Data lying in a high-dimensional ambient space are commonly thought to have a much lower intrinsic dimension. In particular, the data may be concentrated near a lower-dimensional subspace or manifold. There is an immense literature focused on approximating the unknown subspace and the unknown density, and on exploiting such approximations in clustering, data compression, and predictive modeling. Most of this literature approximates subspaces and densities using locally linear, and potentially multi-scale, dictionaries with Gaussian kernels. In this talk, I will propose a simple and general alternative that instead uses pieces of spheres, or spherelets, to locally approximate the unknown subspace. I will also introduce a curved kernel, the Fisher–Gaussian (FG) kernel, which outperforms multivariate Gaussians in many cases. Theory is developed showing that, for many manifolds, spherelets can achieve lower covering numbers and mean squared errors than locally linear approximations, and establishing the posterior consistency of the Dirichlet process mixture of FG kernels. Comparisons with state-of-the-art competitors show gains in the ability to accurately approximate the subspace and the density with fewer components and parameters. Time permitting, I will also present some applications of spherelets, including classification, geodesic distance estimation, and clustering.
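The abstract does not spell out how a single spherelet is fitted. As a minimal sketch, assume the points in one local neighborhood have already been reduced to a (d+1)-dimensional coordinate system (for example by local PCA; that step is omitted here). A sphere can then be fitted by algebraic least squares: the condition ‖x − c‖² = r² rearranges to 2c·x + k = ‖x‖² with k = r² − ‖c‖², which is linear in the unknowns (c, k). This Kåsa-style fit is a standard technique chosen for illustration and may differ from the estimator used in the talk; the names `fit_sphere`, `solve_linear`, and `sphere_residuals` are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_sphere(points):
    """Hypothetical helper: algebraic least-squares sphere fit in any dimension.

    Each point gives one linear equation 2 c.x + k = ||x||^2 in (c, k),
    where k = r^2 - ||c||^2; the system is solved via normal equations.
    """
    dim = len(points[0])
    rows = [[2.0 * xi for xi in p] + [1.0] for p in points]
    rhs = [sum(xi * xi for xi in p) for p in points]
    n = dim + 1
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(n)] for i in range(n)]
    aty = [sum(row[i] * y for row, y in zip(rows, rhs)) for i in range(n)]
    u = solve_linear(ata, aty)
    center, k = u[:dim], u[dim]
    radius = math.sqrt(k + sum(ci * ci for ci in center))
    return center, radius


def sphere_residuals(points, center, radius):
    """Distance from each point to the fitted sphere surface."""
    return [abs(math.dist(p, center) - radius) for p in points]


if __name__ == "__main__":
    # Points on a quarter arc of the circle with center (1, 2) and radius 3.
    arc = [(1 + 3 * math.cos(i * math.pi / 20), 2 + 3 * math.sin(i * math.pi / 20))
           for i in range(11)]
    center, radius = fit_sphere(arc)
    print(center, radius)  # center close to (1, 2), radius close to 3
```

Because the fit is linear in (c, k), it needs no iterative optimization, which keeps a per-neighborhood sphere about as cheap to compute as a local linear fit; the trade-off is that algebraic residuals are not exactly geometric distances when the data are noisy.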
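Didong Li (李帝东) received his PhD in mathematics from Duke University and is currently a postdoctoral researcher in the Department of Computer Science at Princeton University and the Department of Biostatistics at the University of California, Los Angeles. His work focuses mainly on geometric data analysis, information geometry, and manifold learning, and has appeared in journals such as Machine Learning, Journal of the Royal Statistical Society, and Biometrika. In 2019 he received the inaugural IMS Lawrence D. Brown Award.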