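Talk title: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise
Time: 2021-07-22 09:00-11:00
Venue: Tencent Meeting (ID: 749 880 554)
Host: School of Mathematics and Statistics
Audience: Faculty and students of the Department of Statistics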
Abstract: Stochastic gradient descent with momentum (SGDm) is one of the most popular optimization algorithms in deep learning. While there is a rich theory of SGDm for convex problems, the theory is considerably less developed in the context of deep learning, where the problem is non-convex and the gradient noise might exhibit heavy-tailed behavior, as empirically observed in recent studies. In this study, we consider a continuous-time variant of SGDm, known as the underdamped Langevin dynamics (ULD), and investigate its asymptotic properties under heavy-tailed perturbations. Supported by recent studies from statistical physics, we argue both theoretically and empirically that the heavy tails of such perturbations can result in a bias even when the step size is small, in the sense that the optima of the stationary distribution of the dynamics might not match the optima of the cost function to be optimized. As a remedy, we develop a novel framework, which we call fractional ULD (FULD), and prove that FULD targets the so-called Gibbs distribution, whose optima exactly match the optima of the original cost. We support our theory with experiments conducted on a synthetic model and neural networks. This talk is based on joint work with Umut Simsekli, Yee Whye Teh and Mert Gurbuzbalaban.
Speaker bio: Lingjiong Zhu is a University Research Scholar and a professor in the Department of Mathematics at Florida State University, USA. His research interests include financial mathematics and probability theory. He has published extensively in international journals such as Bernoulli, Applied Mathematical Finance, and SIAM Journal on Financial Mathematics.