Academic Talk by Professor Changliang Zou, Nankai University

Research Building No. 18, Room 1102

Posted by: Han Wei. Published: 2024-03-25

Title: Optimal subsampling via predictive inference

Time: March 28, 2024 (Thursday), 16:00

Venue: Research Building No. 18, Room 1102

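Hosts: School of Mathematics and Statistics; Key Laboratory of Analytical Mathematics and Applications (Ministry of Education); Fujian Key Laboratory of Analytical Mathematics and Applications; Fujian Provincial University Key Laboratory of Statistics and Artificial Intelligence; Fujian Center for Applied Mathematics (Fujian Normal University)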

Audience: Interested faculty and students


Abstract: In the big data era, subsampling or sub-data selection techniques are often adopted to extract a fraction of informative individuals from massive data. Existing subsampling algorithms focus mainly on obtaining a representative subset that achieves the best estimation accuracy under a given class of models. We consider here a semi-supervised setting in which a small or moderate-sized “labeled” data set is available in addition to a much larger “unlabeled” data set. The goal is to sample from the unlabeled data, within a given budget, informative individuals that are characterized by their unobserved responses. We propose an optimal subsampling procedure that maximizes the diversity of the selected subsample while simultaneously controlling the false selection rate (FSR), allowing us to explore as much reliable information as possible. The key ingredients of our method are the use of predictive inference to quantify the uncertainty of response predictions and a reformulation of the objective as a constrained optimization problem. We show that the proposed method is asymptotically optimal in the sense that the diversity of the subsample converges to its oracle counterpart with FSR control.


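About the speaker: Changliang Zou is a professor in the School of Statistics and Data Science at Nankai University and director of its Institute of Statistics. He received his Ph.D. from Nankai University in 2008 and then joined the faculty there. His work focuses on statistics, its interdisciplinary research with data science, and practical applications. His research interests include statistical inference for high-dimensional data, analysis of large-scale data streams, and change-point and outlier detection. He has published more than twenty papers in leading journals in statistics and machine learning, including Ann. Stat., Biometrika, J. Am. Stat. Asso., Math. Program., and J. Mach. Learn. Res., and has been named an Elsevier Highly Cited Chinese Researcher. He has led projects funded by the National Natural Science Foundation of China, including the Excellent Young Scientists Fund, the Distinguished Young Scholars Fund, a Key Program project, and a Major Program sub-project, as well as a project under the Ministry of Science and Technology's National Key R&D Program. He serves as a member of the Science and Technology Commission of the Ministry of Education, a member of the National Steering Committee for the Master of Applied Statistics program, and vice president of the Chinese Association for Applied Statistics, among other roles. On a scholar ranking list, he places in the top 1% among the most-cited papers in mathematics, and in 2022 and 2023 he was included in Stanford University's list of the world's top 2% most-cited scientists. He is a Fellow of the American Statistical Association (ASA) and of the Institute of Mathematical Statistics (IMS).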