School of Mathematics and Statistics, Northeast Normal University
Canonical correlation coefficients of high-dimensional Gaussian vectors: finite rank case

Consider a Gaussian vector $\mathbf{z}=(\mathbf{x}',\mathbf{y}')'$ consisting of two sub-vectors $\mathbf{x}$ and $\mathbf{y}$ with dimensions $p$ and $q$, respectively. Given $n$ independent observations of $\mathbf{z}$, we study the correlation between $\mathbf{x}$ and $\mathbf{y}$ from the perspective of canonical correlation analysis. We investigate the high-dimensional case, in which both $p$ and $q$ are proportional to the sample size $n$. Denote by $\Sigma_{uv}$ the population cross-covariance matrix of random vectors $\mathbf{u}$ and $\mathbf{v}$, and by $S_{uv}$ its sample counterpart. The canonical correlation coefficients between $\mathbf{x}$ and $\mathbf{y}$ are the square roots of the nonzero eigenvalues of the canonical correlation matrix $\Sigma_{xx}^{-1}\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}$. In this paper, we focus on the case where $\Sigma_{xy}$ has finite rank $k$, i.e., there are $k$ nonzero canonical correlation coefficients, whose squares are denoted by $r_1\geq\cdots\geq r_k>0$. We study the sample counterparts of $r_i$, $i=1,\ldots,k$, i.e., the largest $k$ eigenvalues of the sample canonical correlation matrix $S_{xx}^{-1}S_{xy}S_{yy}^{-1}S_{yx}$, denoted by $\lambda_1\geq\cdots\geq\lambda_k$. We show that there exists a threshold $r_c\in(0,1)$ such that, for each $i\in\{1,\ldots,k\}$, when $r_i\leq r_c$, $\lambda_i$ converges almost surely to $d_{+}$, the right edge of the limiting spectral distribution of the sample canonical correlation matrix. When $r_i>r_c$, $\lambda_i$ has an almost sure limit in $(d_{+},1]$, from which $r_i$ can in turn be recovered, thus providing an estimate of $r_i$ in the high-dimensional scenario. We also obtain the limiting distribution of each $\lambda_i$ under appropriate normalization. Specifically, $\lambda_i$ has Gaussian-type fluctuations if $r_i>r_c$ and follows the Tracy-Widom distribution if $r_i<r_c$. Some applications of our results are also discussed.
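As an illustration of the objects defined above, the following is a minimal numerical sketch (not the authors' code) that simulates a finite-rank model and computes the eigenvalues $\lambda_1\geq\lambda_2\geq\cdots$ of $S_{xx}^{-1}S_{xy}S_{yy}^{-1}S_{yx}$. The function names `simulate` and `sample_canonical_eigenvalues` are hypothetical, and the sketch rests on several assumptions not stated in the abstract: $\Sigma_{xx}$ and $\Sigma_{yy}$ are identity matrices, the first $k$ coordinates of $\mathbf{x}$ and $\mathbf{y}$ carry the squared canonical correlations $r_1,\ldots,r_k$, and the sample covariances are computed after centering by the sample mean.

```python
import numpy as np


def simulate(n, p, q, r, seed=0):
    """Draw n observations of (x, y) with squared canonical correlations r.

    Assumed toy model: x ~ N(0, I_p), and for j < k = len(r)
    y_j = sqrt(r_j) * x_j + sqrt(1 - r_j) * e_j with e ~ N(0, I_q),
    so Sigma_xx = I_p, Sigma_yy = I_q and Sigma_xy has rank k.
    Requires k <= min(p, q) and 0 < r_j < 1.
    """
    rng = np.random.default_rng(seed)
    r = np.asarray(r, dtype=float)
    k = r.size
    x = rng.standard_normal((n, p))
    e = rng.standard_normal((n, q))
    y = e.copy()
    y[:, :k] = np.sqrt(r) * x[:, :k] + np.sqrt(1.0 - r) * e[:, :k]
    return x, y


def sample_canonical_eigenvalues(x, y):
    """Eigenvalues of S_xx^{-1} S_xy S_yy^{-1} S_yx in descending order.

    x has shape (n, p) and y has shape (n, q); rows are observations.
    Centering by the sample mean is an assumption of this sketch.
    """
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    s_xx = xc.T @ xc / n
    s_yy = yc.T @ yc / n
    s_xy = xc.T @ yc / n
    # Solve linear systems rather than forming explicit inverses.
    m = np.linalg.solve(s_xx, s_xy) @ np.linalg.solve(s_yy, s_xy.T)
    # The eigenvalues are real in exact arithmetic; drop rounding noise.
    eig = np.linalg.eigvals(m).real
    return np.sort(eig)[::-1]


if __name__ == "__main__":
    x, y = simulate(n=2000, p=200, q=300, r=[0.9, 0.1], seed=0)
    lam = sample_canonical_eigenvalues(x, y)
    print("largest sample eigenvalues:", lam[:4])
```

Varying the entries of `r` while holding `p / n` and `q / n` fixed gives an informal way to look for the transition described in the abstract: for small `r_i` the corresponding sample eigenvalue should stay near the bulk of the spectrum, while for large `r_i` it should separate from it.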

(Joint work with Bao Zhigang, Hu Jiang, and Pan Guangming.)