Talk Title: Heavy-Tailed Methods in Machine Learning: Heavy-Tail Phenomena and Fractional Langevin Algorithms
Time: Thursday, June 19, 2025, 10:30
Venue: Room 1102, Research Building No. 18
Host: School of Mathematics and Statistics
Audience: Interested faculty and students
Abstract: Heavy tails have been observed empirically in SGD and shown theoretically to help its generalization performance. However, the origins of these heavy tails are not well understood. We claim that, depending on the structure of the Hessian of the loss at the minimum and on the choice of algorithm parameters, the SGD iterates converge to a heavy-tailed stationary distribution. We rigorously prove this claim in the setting of quadratic optimization: we show that even in a simple linear regression problem, with independent and identically distributed data whose distribution has finite moments of all orders, the iterates can be heavy-tailed with infinite variance. We further characterize the behavior of the tails with respect to the algorithm parameters, the dimension, and the curvature. Since heavy tails have been observed in SGD both empirically and theoretically, we approximate SGD by a Lévy-driven SDE to study fractional Langevin algorithms. However, the heavy tails of such perturbations can introduce a bias even when the step size is small, in the sense that the optima of the stationary distribution of the dynamics may not match the optima of the cost function being optimized. As a remedy, we modify the dynamics to retarget the Gibbs distribution. We support our theory with experiments.
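As a rough illustration of the quadratic-optimization claim in the abstract (heavy-tailed iterates from light-tailed data), the sketch below simulates one-dimensional SGD for linear regression with i.i.d. Gaussian data at two step sizes and compares a kurtosis diagnostic. The step sizes, sample sizes, and diagnostic are illustrative choices for this announcement, not taken from the talk itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_iterates(eta, n_steps=200_000, burn_in=20_000):
    """SGD for 1-d linear regression y = theta_true * x + noise,
    one sample per step: theta <- theta - eta * x * (x * theta - y).
    Equivalently, theta follows the stochastic recursion
    theta <- (1 - eta * x**2) * theta + eta * x * y, whose stationary
    law can be heavy-tailed even though x and the noise are Gaussian."""
    theta_true, theta = 1.0, 0.0
    out = np.empty(n_steps - burn_in)
    for k in range(n_steps):
        x = rng.standard_normal()
        y = theta_true * x + rng.standard_normal()
        theta -= eta * x * (x * theta - y)
        if k >= burn_in:
            out[k - burn_in] = theta
    return out

def excess_kurtosis(v):
    """Sample excess kurtosis: near 0 for Gaussian data, large for heavy tails."""
    c = v - v.mean()
    return (c**4).mean() / (c**2).mean() ** 2 - 3.0

kurt_small = excess_kurtosis(sgd_iterates(eta=0.05))  # light-tailed regime
kurt_large = excess_kurtosis(sgd_iterates(eta=1.0))   # heavy-tailed regime
print(f"excess kurtosis: eta=0.05 -> {kurt_small:.2f}, eta=1.0 -> {kurt_large:.2f}")
```

For standard Gaussian inputs, E[(1 - eta*x^2)^2] = 1 - 2*eta + 3*eta^2 exceeds 1 once eta > 2/3, so the stationary variance is infinite there, while E|1 - eta*x^2| < 1 keeps the recursion stable at eta = 1. Accordingly, the empirical excess kurtosis at eta = 1.0 should dwarf the near-Gaussian value at eta = 0.05.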
Speaker Bio: Lingjiong Zhu is a Professor at Florida State University. He received his B.A. from the University of Cambridge in 2008 and his Ph.D. from the Courant Institute of Mathematical Sciences at New York University in 2013, under the supervision of the renowned mathematician S.R.S. Varadhan. After graduating, he worked at Morgan Stanley and the University of Minnesota before joining Florida State University as an Assistant Professor in 2015; he is now a Professor in the Department of Mathematics there and a Thinking Machines Distinguished Scholar. His research spans applied probability, data science, financial engineering, and operations research, with publications in leading journals and conferences in these fields, including Ann Appl Probab, Bernoulli, Financ Stoch, ICML, INFORMS J Comput, J Mach Learn Res, NeurIPS, Oper Res, Prod Oper Manag, SIAM J Financ Math, Stoch Proc Appl, and Rev Econ Stat. His honors include the Courant Institute's Kurt O. Friedrichs Prize for an outstanding dissertation (2013), Florida State University's Developing Scholar Award (2022) and Graduate Faculty Mentor Award (2023), and the MSOM iFORM SIG Best Paper Award (2023).
