Speaker: Ling Zhou (周岭)
Venue: Room 523, Weizhen Building (惟真楼)
Time: Monday, July 7, 2025, 10:00-11:00
Host: Shurong Zheng (郑术蓉)
Abstract:
Semi-supervised learning (SSL) has attracted growing interest due to its ability to incorporate information from a large set of $N$ unlabeled data to enhance the estimation and prediction performance of supervised learning (SL) based on a smaller set of $n$ labeled data. In this work, we derive the minimax learning rates for both estimation and prediction in SSL in the context of high-dimensional linear regression. The benefit of unlabeled data arises when the covariates contain information that is shared with the regression structure. To systematically examine the contribution of unlabeled data, we consider three distinct covariance structures: the eigenvalue decay model, the spiked covariance model, and the factor analysis model. Our analysis yields several interesting findings: (a) When SL alone cannot achieve accurate estimation or prediction and $N$ is significantly larger than $n$, SSL can outperform SL by achieving higher-order improvements in both estimation and prediction accuracy; (b) Feature-extraction-based SSL generally outperforms imputation-based SSL, except in cases where the dimension $p$ is very large and $N$ is small relative to $n$. In terms of prediction accuracy, the feature-extraction approach consistently provides superior performance. These theoretical findings are further supported by numerical experiments.
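To make the setting concrete, the following is a minimal simulation sketch, not the talk's actual method: under an assumed spiked covariance model, a feature-extraction-style SSL estimator learns the spike direction from the $N$ unlabeled covariates and then runs a low-dimensional regression on the $n$ labeled points, compared against a plain ridge SL baseline. All parameter values (`p`, `n`, `N`, `spike`, `lam`) are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Spiked covariance model: Sigma = spike * v v^T + I, with the
# regression signal beta aligned with the spike direction v.
p, n, N = 200, 50, 5000           # dimension, labeled size, unlabeled size
v = rng.standard_normal(p)
v /= np.linalg.norm(v)            # unit spike direction
spike = 25.0                      # spike eigenvalue strength

def draw_X(m):
    # one shared factor along v plus isotropic noise
    return np.sqrt(spike) * rng.standard_normal((m, 1)) @ v[None, :] \
        + rng.standard_normal((m, p))

beta = v                          # true coefficient vector
X_lab, X_unl = draw_X(n), draw_X(N)
y = X_lab @ beta + 0.5 * rng.standard_normal(n)

# SL baseline: ridge regression using only the n labeled points
lam = 1.0
beta_sl = np.linalg.solve(X_lab.T @ X_lab + lam * np.eye(p), X_lab.T @ y)

# Feature-extraction SSL: estimate the spike direction from the N
# unlabeled points, then regress y on the one extracted feature.
S = X_unl.T @ X_unl / N
_, eigvec = np.linalg.eigh(S)
u = eigvec[:, -1]                 # estimated spike direction
z = X_lab @ u                     # 1-d extracted feature
gamma = (z @ y) / (z @ z)         # scalar least squares (absorbs sign of u)
beta_ssl = gamma * u

err_sl = np.linalg.norm(beta_sl - beta)
err_ssl = np.linalg.norm(beta_ssl - beta)
print(f"SL error: {err_sl:.3f}, SSL error: {err_ssl:.3f}")
```

Because $N \gg n$, the spike direction is estimated far more accurately from the pooled unlabeled covariates than from the labeled sample alone, which is the mechanism behind finding (a): the SSL estimation error is driven by a one-dimensional regression once the shared covariance structure has been extracted.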
About the speaker:
Ling Zhou (周岭) is a professor and doctoral supervisor at the Southwestern University of Finance and Economics. She has received the Zhong Jiaqing Mathematics Award and been selected for a national-level young talent program. She has produced a series of research results in nonparametric statistical theory and applications, data integration, and subgroup learning, published in top international statistics journals such as JASA, AoS, and JRSSB, and in top machine learning venues such as JMLR and ICML. She currently serves as an associate editor of Statistica Sinica and other journals, and is an Elected Member of the International Statistical Institute (ISI). She has led NSFC General and Young Scientists projects and participated as a core member in NSFC Key projects and major Ministry of Science and Technology programs.