We propose a new measure of dependence between a categorical random variable and a random vector of potentially high dimension, named semi-distance correlation. It extends the distance correlation of Székely, Rizzo and Bakirov (2007) to incorporate the information in the categorical random variable. It equals zero if and only if the categorical random variable and the random vector are independent. Two important applications of semi-distance correlation are considered. First, we develop a semi-distance independence test between a categorical random variable and a random vector. Under the null hypothesis, we show that the test statistic converges in distribution to an infinite weighted sum of independent chi-square random variables. Moreover, when the dimension of the random vector tends to infinity, we derive explicit asymptotic normal distributions of the test statistic under both the null and the alternative hypotheses, which allows p-values to be computed efficiently for high-dimensional data. Second, we propose using the semi-distance correlation as a marginal utility between the response and a group of covariates for groupwise variable screening in ultrahigh-dimensional classification problems. The sure screening property is also established. Monte Carlo simulations and a real-data application demonstrate the finite-sample performance of the proposed procedures. A new R package, semidist, has also been developed to calculate the sample semi-distance correlation and to implement the proposed independence test and variable screening procedures.
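For orientation, the classical sample distance correlation of Székely, Rizzo and Bakirov (2007), which the abstract takes as its starting point, can be sketched as follows. This is a minimal illustration of that earlier measure only: it is not the semi-distance correlation and not the API of the semidist package. The function names are hypothetical, and one-hot encoding the categorical variable is an assumption made purely so that the classical measure can be applied to it.

```python
import numpy as np

def _centered_distances(a):
    """Pairwise Euclidean distance matrix, double-centered (rows, columns, grand mean)."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    d = np.sqrt(((a[:, None, :] - a[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Classical sample distance correlation (V-statistic form), a value in [0, 1]."""
    A = _centered_distances(x)
    B = _centered_distances(y)
    dcov2 = (A * B).mean()    # squared sample distance covariance
    dvar_x = (A * A).mean()   # squared sample distance variance of x
    dvar_y = (B * B).mean()   # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))

def one_hot(labels):
    """Hypothetical helper: encode categorical labels as indicator vectors."""
    _, codes = np.unique(labels, return_inverse=True)
    return np.eye(codes.max() + 1)[codes]

# Synthetic data (illustration only): 3 classes, 5-dimensional random vector.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 3, size=n)
x_dep = rng.normal(size=(n, 5)) + 2.0 * y[:, None]  # mean shifts with the class
x_ind = rng.normal(size=(n, 5))                     # generated independently of y

dcor_dep = distance_correlation(x_dep, one_hot(y))
dcor_ind = distance_correlation(x_ind, one_hot(y))
print(f"dependent: {dcor_dep:.3f}, independent: {dcor_ind:.3f}")
```

The V-statistic form used here is biased upward in finite samples, so the value for independent data is small but not exactly zero; in practice a permutation test or an asymptotic approximation such as the one described in the abstract would be used to judge significance.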
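Wei Zhong is Professor, Department Chair, and doctoral supervisor in the Department of Statistics and Data Science at the Wang Yanan Institute for Studies in Economics and the School of Economics, Xiamen University. He received his Ph.D. in statistics from the Pennsylvania State University in 2012 and was exceptionally promoted to associate professor in 2014 and to professor in 2017. In 2018 he was selected for Xiamen University's "Nanqiang Outstanding Young Talents" program (Category A). He is a recipient of the National Natural Science Foundation of China Excellent Young Scientists Fund (2019) and the Fujian Province Distinguished Young Scholars Fund (2019). His research focuses on high-dimensional data analysis, statistical learning algorithms, and econometrics. He has published (including accepted papers) more than 30 articles in leading statistics journals in China and abroad, including Annals of Statistics, Journal of the American Statistical Association, Biometrika, Journal of Econometrics, Journal of Business & Economic Statistics, Biometrics, Annals of Applied Statistics, Statistica Sinica, Scientia Sinica Mathematica, and Acta Mathematica Sinica. He won first prize in Xiamen University's 5th English Teaching Competition in 2016, the grand prize in Xiamen University's 15th Young Teachers' Teaching Skills Competition in 2020, and first prize in the Xiamen University Teaching Innovation Competition in 2021. In 2021 he was named a Fujian Province "Good Youth of Upward and Virtuous Character" (向上向善好青年), and in 2022 he received second prize of the Young Scientist Award for higher education institutions from the Fok Ying Tung Education Foundation of the Ministry of Education.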