Title: Continuous-time q-learning for mean-field control problems
Time: Friday, March 22, 2024, 11:00 a.m.
Venue: Room 1102, Research Building No. 18
Host: School of Mathematics and Statistics
Audience: interested faculty and students
Abstract: In this talk, we study q-learning, recently coined as the continuous-time counterpart of Q-learning by Jia and Zhou (2023), for mean-field control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single-agent control problem in Jia and Zhou (2023), the mean-field interaction of agents renders the definition of the q-function more subtle, and we reveal that two distinct q-functions naturally arise: (i) the integrated q-function, the first-order approximation of the integrated Q-function introduced in Gu, Guo, Wei and Xu (2023), which can be learnt via a weak martingale condition involving test policies; and (ii) the essential q-function, which is employed in policy improvement iterations. We show that the two q-functions are related via an integral representation under all test policies. Based on the weak martingale condition and our proposed method for searching test policies, we devise model-free learning algorithms. In two financial application examples, one within the LQ control framework and one beyond it, we obtain the exact parameterization of the optimal value function and q-functions and illustrate our algorithms with simulation experiments. This is joint work with Xiaoli Wei (Harbin Institute of Technology).
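For readers unfamiliar with the framework the talk builds on, the single-agent characterization in Jia and Zhou (2023) can be sketched roughly as follows; the notation ($J$ for the value function, $q$ for the q-function, $\gamma$ for the entropy temperature) is illustrative, and the mean-field versions discussed in the talk additionally involve the population distribution:

```latex
% Sketch (single-agent case, entropy-regularized setting): a pair (J, q)
% with terminal condition J(T, x) = h(x) is characterized by a weak
% martingale condition: for every admissible test policy \pi, the process
J\bigl(s, X_s^{\pi}\bigr)
  + \int_t^s \Bigl( r\bigl(u, X_u^{\pi}, a_u^{\pi}\bigr)
  - q\bigl(u, X_u^{\pi}, a_u^{\pi}\bigr) \Bigr)\,\mathrm{d}u
\quad \text{is a martingale,}
% together with a normalization of the q-function over the action space,
\int_{\mathcal{A}} \exp\bigl( q(t, x, a)/\gamma \bigr)\,\mathrm{d}a = 1,
% in which case the optimal stochastic policy is the Gibbs measure
\pi^{*}(a \mid t, x) \propto \exp\bigl( q(t, x, a)/\gamma \bigr).
```

The martingale condition is "weak" in the sense that it must hold under all test policies, not only the optimal one, which is what makes the condition usable for model-free learning.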
Speaker bio: Xiang Yu is an Associate Professor and doctoral supervisor in the Department of Applied Mathematics at The Hong Kong Polytechnic University. He received his bachelor's degree from Huazhong University of Science and Technology in 2007 and his Ph.D. from the University of Texas at Austin in 2012. From 2012 to 2015 he was a Visiting Assistant Professor in the Department of Mathematics at the University of Michigan, and since 2015 he has been with the Department of Applied Mathematics at The Hong Kong Polytechnic University, first as Assistant Professor and now as Associate Professor. His main research interests include mathematical finance, applied probability, stochastic analysis, and stochastic control and optimization. His recent work concerns path-dependent investment and consumption problems, two-layer mean-field game problems, time-inconsistent optimal stopping problems, optimal dividend problems under new models, and optimal control problems based on machine-learning methods. His research has appeared in journals such as Mathematical Finance, Finance and Stochastics, Annals of Applied Probability, Mathematics of Operations Research, and SIAM Journal on Control and Optimization.