Motivated by the geometric median-of-means estimator widely used in machine learning, this paper studies statistical inference for an ultrahigh-dimensional location parameter based on the sample spatial median under a general multivariate model. The inference problems considered include the construction of simultaneous confidence intervals, global tests, and multiple testing with FDR control, as well as the asymptotic relative efficiency of the sample spatial median relative to the sample mean.
To achieve these goals, we derive a novel Bahadur representation of the sample spatial median with a maximum-norm bound on the remainder term, and establish a Gaussian approximation for the sample spatial median over the class of hyperrectangles. In addition, we propose a multiplier bootstrap algorithm to approximate the distribution of the sample spatial median.
The approximations remain valid when the dimension diverges at an exponential rate in the sample size, which facilitates the application of the spatial median in the ultrahigh-dimensional regime. Moreover, we extend the Gaussian and bootstrap approximations to the two-sample problem, in which the difference between two location parameters is of interest. The proposed approaches are further illustrated by simulations and by an analysis of a genomic dataset from a microarray study. This is joint work with Liuhua Peng and Changliang Zou.
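As a rough illustration of the two computational ingredients named in the abstract, the sketch below computes a sample spatial median with the classical Weiszfeld iteration and then runs a generic Gaussian multiplier bootstrap for a maximum-norm critical value. It is not the authors' algorithm: the function names, the tuning constants, and the assumption that the leading term of the Bahadur representation is an average of spatial signs rescaled by a plug-in estimate of E(1/||X - theta||) are all illustrative simplifications.

```python
import numpy as np


def spatial_median(X, tol=1e-8, max_iter=500):
    """Weiszfeld iteration for argmin_theta sum_i ||X_i - theta||_2."""
    theta = X.mean(axis=0)  # start from the sample mean
    for _ in range(max_iter):
        dist = np.linalg.norm(X - theta, axis=1)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        w = 1.0 / dist
        new_theta = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


def multiplier_bootstrap_quantile(X, theta_hat, alpha=0.05, B=500, seed=0):
    """Bootstrap (1 - alpha) quantile of a max-norm statistic.

    Assumption (hypothetical simplification): sqrt(n) * (theta_hat - theta)
    behaves like n^{-1/2} * zeta^{-1} * sum_i U_i, where U_i is the spatial
    sign of X_i - theta and zeta = E(1 / ||X - theta||).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    diff = X - theta_hat
    dist = np.maximum(np.linalg.norm(diff, axis=1), 1e-12)
    U = diff / dist[:, None]         # estimated spatial signs, shape (n, p)
    zeta_hat = np.mean(1.0 / dist)   # plug-in estimate of zeta
    E = rng.standard_normal((B, n))  # Gaussian multipliers, one row per draw
    stats = np.abs(E @ U).max(axis=1) / (np.sqrt(n) * zeta_hat)
    return np.quantile(stats, 1.0 - alpha)


# Toy data: n = 200 observations in dimension p = 50, true location 1.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50)) + 1.0
theta_hat = spatial_median(X)
q = multiplier_bootstrap_quantile(X, theta_hat)

# Simultaneous intervals under the stated assumption: theta_hat_j +/- q / sqrt(n).
half_width = q / np.sqrt(X.shape[0])
lower, upper = theta_hat - half_width, theta_hat + half_width
```

Because the critical value is taken from the maximum over coordinates, the same half-width applies to every coordinate, which is what makes the intervals simultaneous in this sketch.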
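Guanghui Cheng (程光辉) holds a Ph.D. in statistics, received in December 2017 from the School of Mathematics and Statistics, Northeast Normal University, and is currently an associate professor at Guangzhou University. Cheng has served as principal investigator on one national Young Scientists Fund project and one Guangdong Provincial General Program project, and has published a number of papers in leading statistics journals, including AOS, Biometrika, Statistica Sinica, Biometrics, Scandinavian Journal of Statistics, and CSDA.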