In this talk, we consider the detection of signal regions associated with disease outcomes in Genome-Wide Association Studies (GWAS). Gene- or region-based methods have become increasingly popular in GWAS as a complement to traditional single-variant analysis. However, these methods test for association between an outcome and the genetic variants in a pre-specified region, e.g., a gene, and therefore do not directly apply to regions that are not defined in advance. Given the vast intergenic regions covered by GWAS and the substantial interest in identifying signal regions for subsequent fine mapping, we propose a computationally efficient method based on a quadratic scan (Q-SCAN) statistic that detects both the existence and the locations of signal regions by scanning the genome continuously. The proposed method accounts for the correlation among genetic variants (linkage disequilibrium) and allows signal regions to contain both signal and neutral variants, as well as signal variants whose effects are in different directions. We study the asymptotic properties of the proposed Q-SCAN statistics, derive an asymptotic threshold that controls the family-wise error rate, and show that under regularity conditions the proposed method consistently selects the true signal regions. We evaluate the finite-sample performance of the proposed method through simulation studies. Our simulation results show that the proposed procedure outperforms existing methods, especially when signal regions contain signal variants whose effects are in different directions, are contaminated with neutral variants, or contain correlated variants. We apply the proposed method to a lung cancer GWAS to identify genetic regions associated with lung cancer risk.
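To make the idea of a quadratic scan concrete, here is a minimal, illustrative sketch; it is not the authors' Q-SCAN implementation. It assumes that each variant j has a score statistic u[j], and that under the null the vector u is approximately multivariate normal with mean zero and an LD-driven covariance matrix sigma. For every window of consecutive variants I, it computes Q(I) = sum of u[j]^2 over I and standardizes it using E[Q] = tr(Sigma_I) and Var[Q] = 2 tr(Sigma_I^2), which hold for a zero-mean Gaussian vector. The function names, the window-length bounds `min_len`/`max_len`, the user-supplied `threshold` (standing in for the asymptotic FWER-controlling threshold, which is not reproduced here), and the greedy selection of non-overlapping windows are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def window_qstat(u: Sequence[float], sigma: Sequence[Sequence[float]],
                 start: int, end: int) -> float:
    """Standardized quadratic statistic for variants start..end-1 (hypothetical helper).

    Assumes u ~ N(0, sigma) under the null, so that for Q = sum_{j in I} u_j^2:
      E[Q]   = tr(Sigma_I)
      Var[Q] = 2 * tr(Sigma_I^2) = 2 * sum_{j,k in I} Sigma_jk^2  (Sigma symmetric)
    """
    idx = range(start, end)
    q = sum(u[j] ** 2 for j in idx)
    mean = sum(sigma[j][j] for j in idx)
    var = 2.0 * sum(sigma[j][k] ** 2 for j in idx for k in idx)
    return (q - mean) / math.sqrt(var)  # assumes Sigma_I is not all zeros


def quadratic_scan(u: Sequence[float], sigma: Sequence[Sequence[float]],
                   min_len: int, max_len: int,
                   threshold: float) -> List[Tuple[int, int, float]]:
    """Scan all windows of min_len..max_len consecutive variants (illustrative only).

    Returns non-overlapping windows (start, end, stat) whose standardized
    statistic exceeds `threshold`, chosen greedily by decreasing statistic.
    """
    p = len(u)
    hits = []
    for length in range(min_len, min(max_len, p) + 1):
        for start in range(p - length + 1):
            stat = window_qstat(u, sigma, start, start + length)
            if stat > threshold:
                hits.append((start, start + length, stat))
    # Greedy pruning: keep the strongest window, drop any window overlapping it, repeat.
    hits.sort(key=lambda h: h[2], reverse=True)
    selected: List[Tuple[int, int, float]] = []
    for s, e, stat in hits:
        if all(e <= s2 or s >= e2 for s2, e2, _ in selected):
            selected.append((s, e, stat))
    return sorted(selected)


if __name__ == "__main__":
    # Toy example: 6 uncorrelated variants; variants 2 and 3 have opposite-sign effects.
    u = [0.1, -0.2, 3.0, -3.0, 0.1, 0.0]
    sigma = [[1.0 if j == k else 0.0 for k in range(6)] for j in range(6)]
    print(quadratic_scan(u, sigma, min_len=1, max_len=3, threshold=3.0))
```

In the toy example, the two signal variants have effects of opposite sign, so a linear (burden-type) sum over the window would cancel to zero, while the quadratic form still flags the window [2, 4). This mirrors the abstract's point about signal variants whose effects are in different directions. A real analysis would replace the identity `sigma` with an LD-based covariance and the ad hoc `threshold` with a properly derived one.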
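Dr. Zilin Li (李子林) received a bachelor's degree in 2011 and a Ph.D. in 2016, both from the Department of Mathematical Sciences at Tsinghua University. Since 2016, Dr. Li has been a postdoctoral visiting researcher in the Department of Biostatistics at the Harvard T.H. Chan School of Public Health, working with Professor Xihong Lin.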