
The symposium features ten active researchers in statistics and data science, who will present and discuss their latest research. The participants will also discuss the future of statistics in the data science era and its implications for the training of graduate students in statistics.

Organizing Committee:

Xuming He (University of Michigan)
Jianhua Guo (Northeast Normal University)
Wei Gao (Northeast Normal University)
Wensheng Zhu (Northeast Normal University)

Symposium Speakers:

Xingdong Feng (Shanghai University of Finance and Economics)
Feifang Hu (George Washington University)
Yuan Jiang (Oregon State University)
Bo Li (University of Illinois at Urbana-Champaign)
Yanyuan Ma (Penn State University)
Tiejun Tong (Hong Kong Baptist University)
Lan Wang (University of Minnesota)
Yingcun Xia (National University of Singapore)
Yong Zeng (University of Missouri at Kansas City)
Jingfei Zhang (University of Miami Business School)


Workshop Program:

Date: December 10, 2018

Place: School of Mathematics and Statistics, Room 104

09:30-11:30    Panel Discussion: future of statistics and data science (research and education)
11:30-13:30    Lunch
13:30-14:05    Yanyuan Ma: On Estimation of General Index Model for Survival Data
14:05-14:40    Lan Wang: Sparse concordance-assisted learning for optimal treatment decision
14:40-15:15    Yingcun Xia: Jackknife approach to the estimation of mutual information
15:15-15:30    Break
15:30-16:05    Bo Li: Detection of local differences between two spatiotemporal random fields
16:05-16:40    Tiejun Tong: Statistical meta-analysis and evidence-based medicine


Date: December 11, 2018

Place: School of Mathematics and Statistics, Room 104

08:45-09:20    Feifang Hu: How to Design Big Comparative Studies?
09:20-09:55    Xingdong Feng: Testing of covariate effects via ridge regression
09:55-10:30    Jingfei Zhang: Semi-parametric Learning of Structured Temporal Point Processes
10:30-10:50    Break
10:50-11:25    Yong Zeng: Bayesian Inference via Filtering Equations for Financial Ultra-High Frequency Data
11:25-12:00    Yuan Jiang: Microbial network estimation using bias-corrected graphical lasso
12:00       Lunch




Speaker: Yanyuan Ma

Affiliation: Penn State University

Title: On Estimation of General Index Model for Survival Data


We propose a general index model for survival data, which generalizes many commonly used semiparametric survival models and belongs to the framework of dimension reduction. Combining the geometric approach in semiparametrics with the martingale treatment in survival data analysis, we devise estimation procedures that are feasible and do not require covariate-independent censoring, an assumption made by many dimension reduction methods for censored survival data. We establish the root-n consistency and asymptotic normality of the proposed estimators and derive the most efficient estimator in this class for the general index model. Numerical experiments demonstrate the empirical performance of the proposed estimators, and an application to a real data set further illustrates the usefulness of the work.


Speaker: Lan Wang

Affiliation: University of Minnesota

Title: Sparse concordance-assisted learning for optimal treatment decision


To find the optimal decision rule, Fan et al. (2016) proposed an innovative concordance-assisted learning algorithm, which is based on the maximum rank correlation estimator. It makes better use of the available information through pairwise comparison. However, the objective function is discontinuous and computationally hard to optimize. In this paper, we consider a convex surrogate loss function to solve this problem. In addition, our algorithm ensures sparsity of the decision rule, which makes it easy to interpret. We derive the L2 error bound of the estimated coefficients in the ultra-high-dimensional setting. Simulation results under various settings and an application to a clinical trial for depression treatment both illustrate that the proposed method can still estimate the optimal treatment regime successfully when the number of covariates is large. (Joint work with Shuhan Liang, Wenbin Lu and Rui Song.)


Speaker: Yingcun Xia

Affiliation: National University of Singapore

Title: Jackknife approach to the estimation of mutual information


Quantifying the dependence between two random variables is a fundamental issue in data analysis, and thus many measures have been proposed. Recent studies have focused on the renowned mutual information (MI) [Reshef DN, et al. (2011) Science 334:1518–1524]. However, “reliably estimating mutual information from finite continuous data remains a significant and unresolved problem” [Kinney JB, Atwal GS (2014) Proc Natl Acad Sci USA 111:3354–3359]. In this paper, we examine the kernel estimation of MI and show that the bandwidths involved should be equalized. We consider a jackknife version of the kernel estimate with equalized bandwidth and allow the bandwidth to vary over an interval. We estimate the MI by the largest value among these kernel estimates and establish the associated theoretical underpinnings.
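
The steps named in this abstract (one shared bandwidth, a leave-one-out kernel estimate, and a maximum over a range of bandwidths) can be sketched as follows. This is an illustration only, not the speaker's implementation: the Gaussian kernel, the rank transform to a uniform scale, the bandwidth grid, and the function names loo_kernel_mi, to_uniform and jackknife_mi are all assumptions made for the example.

```python
import numpy as np

def loo_kernel_mi(u, v, h):
    """Leave-one-out kernel MI estimate using one shared bandwidth h."""
    n = u.size
    norm = h * np.sqrt(2.0 * np.pi)
    # One-dimensional Gaussian kernel matrices; the joint kernel is their product.
    ku = np.exp(-0.5 * ((u[:, None] - u[None, :]) / h) ** 2) / norm
    kv = np.exp(-0.5 * ((v[:, None] - v[None, :]) / h) ** 2) / norm
    # Leave-one-out: each point is excluded from its own density estimate.
    np.fill_diagonal(ku, 0.0)
    np.fill_diagonal(kv, 0.0)
    f_u = ku.sum(axis=1) / (n - 1)
    f_v = kv.sum(axis=1) / (n - 1)
    f_uv = (ku * kv).sum(axis=1) / (n - 1)
    return float(np.mean(np.log(f_uv / (f_u * f_v))))

def to_uniform(x):
    """Map observations to normalized ranks in (0, 1) (assumed preprocessing).

    Population MI is invariant under strictly monotone rescaling of each variable.
    """
    ranks = np.argsort(np.argsort(x))
    return (ranks + 0.5) / x.size

def jackknife_mi(x, y, bandwidths=np.linspace(0.05, 0.5, 10)):
    """Largest leave-one-out estimate over an (assumed) grid of bandwidths."""
    u = to_uniform(np.asarray(x, dtype=float))
    v = to_uniform(np.asarray(y, dtype=float))
    return max(loo_kernel_mi(u, v, h) for h in bandwidths)

rng = np.random.default_rng(0)
x = rng.normal(size=300)
y_dependent = x + 0.3 * rng.normal(size=300)
y_independent = rng.normal(size=300)
print(jackknife_mi(x, y_dependent), jackknife_mi(x, y_independent))
```

On this simulated example the estimate for the dependent pair should be clearly larger than the one for the independent pair.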


Speaker: Bo Li

Affiliation: University of Illinois at Urbana-Champaign

Title: Detection of local differences between two spatiotemporal random fields


Comparing the characteristics of spatiotemporal random fields is often challenging because the data are high-dimensional and dependent. Our goal is to compare the characteristics of the spatiotemporal random fields at each location and adjust for the multiplicity due to multiple comparisons. Our method adopts the two-component mixture model for pointwise p-values. To integrate the dependency in the data, we model the mixing probability in the mixture model as a smooth function over the space and allow the alternative distribution to be spatially varying. Non-parametric and semi-parametric approaches combined with the EM algorithm are carried out to estimate the mixing probability function. A new adaptive multiple testing procedure is introduced to incorporate the spatial dependence to boost power. We study its theoretical properties, including finite-sample false discovery rate (FDR) control, robustness against model misspecification, and asymptotic power, through our simulation study. Our approach is further applied to the mean and covariance comparison between two synthetic climate fields.


Speaker: Tiejun Tong

Affiliation: Department of Mathematics, Hong Kong Baptist University

Title: Statistical meta-analysis and evidence-based medicine


Evidence-based medicine is attracting increasing attention as a way to improve decision making in medical practice. Meta-analysis is a statistical technique widely used in evidence-based medicine for analytically combining the findings from independent clinical trials to provide an overall estimate of a treatment's effectiveness. In this talk, I will first give an overview of the history and the main procedure of conducting a systematic review and meta-analysis. I will then present some statistical challenges in meta-analysis and describe our new developments in this direction. Our proposed methods not only improve significantly on the existing ones but also retain their simplicity. They can also serve as “rules of thumb” and have the potential to be widely applied in evidence-based medicine.
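
For readers unfamiliar with how trial results are combined, the textbook fixed-effect (inverse-variance) pooling rule is sketched below. It is a standard baseline, not the new methods presented in the talk; the function name fixed_effect_pool and the numbers in the usage line are made up for illustration.

```python
import math

def fixed_effect_pool(effects, variances):
    """Inverse-variance weighted average of per-trial effect estimates.

    Each trial is weighted by 1 / variance, so more precise trials count more.
    Returns the pooled estimate and its standard error.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    standard_error = math.sqrt(1.0 / total)
    return pooled, standard_error

# Two hypothetical trials with equal precision: the pooled effect is their mean.
print(fixed_effect_pool([0.2, 0.4], [0.01, 0.01]))
```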


Speaker: Feifang Hu

Affiliation: Department of Statistics, George Washington University

Title: How to Design Big Comparative Studies?


Covariate balance is one of the most important concerns for successful comparative studies, such as causal inference, online A/B testing and clinical trials, because it reduces bias and improves the accuracy of inference. However, chance imbalances may still exist in traditional randomized experiments, and they increase substantially with big data. To address this issue, we propose a method that allocates the units sequentially and adaptively, using information on the current level of imbalance and the incoming unit's covariates. With a large number of covariates or a large number of units, the proposed method shows substantial advantages over the traditional methods in terms of covariate balance and computational time, making it well suited to the era of big data. Furthermore, the proposed method improves the accuracy of the estimated average treatment effect by achieving minimum variance asymptotically. Numerical studies and real data analysis provide further evidence of the advantages of the proposed method.
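
The idea of allocating units one at a time according to the current imbalance can be illustrated with a simple biased-coin rule. This is a sketch under stated assumptions, not the method presented in the talk: the imbalance measure (squared norm of the difference in covariate sums between arms), the coin probability q = 0.75, and the names adaptive_allocate and imbalance are all chosen for the example.

```python
import numpy as np

def imbalance(covariates, arms):
    """Squared norm of (treatment covariate sum minus control covariate sum)."""
    x = np.asarray(covariates, dtype=float)
    signs = np.where(np.asarray(arms) == 1, 1.0, -1.0)
    return float(np.sum((signs @ x) ** 2))

def adaptive_allocate(covariates, q=0.75, seed=0):
    """Assign units sequentially; favor the arm that lowers imbalance w.p. q."""
    rng = np.random.default_rng(seed)
    x = np.asarray(covariates, dtype=float)
    n, p = x.shape
    diff = np.zeros(p)            # running treatment-minus-control covariate sum
    arms = np.empty(n, dtype=int)
    for i in range(n):
        if_treated = np.sum((diff + x[i]) ** 2)
        if_control = np.sum((diff - x[i]) ** 2)
        if if_treated < if_control:
            p_treat = q
        elif if_treated > if_control:
            p_treat = 1.0 - q
        else:
            p_treat = 0.5         # tie: fair coin
        arms[i] = int(rng.random() < p_treat)
        diff += x[i] if arms[i] == 1 else -x[i]
    return arms

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 3))
print(imbalance(x, adaptive_allocate(x)), imbalance(x, rng.integers(0, 2, size=200)))
```

The first number (adaptive allocation) is typically much smaller than the second (complete randomization), although any single draw is random.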


Speaker: Xingdong Feng

Affiliation: Shanghai University of Finance and Economics

Title: Testing of covariate effects via ridge regression


We revisit ridge estimation in high-dimensional regression models. We obtain an upper bound on the mean squared error of the ridge estimator of the regression coefficients, and we propose an efficient method for testing the covariate effects based on random matrix theory.

The proposal is valid under both low- and high-dimensional models, and it performs well not only for sparse alternatives but also for non-sparse ones. Numerical examples are used to assess the finite-sample performance of the proposed method.
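
As background, the standard closed-form ridge estimator is sketched below; it remains well defined when the number of covariates exceeds the sample size, provided the penalty is positive. The testing procedure from the talk is not reproduced here, and the function name ridge_estimate and the penalty value are assumptions made for the example.

```python
import numpy as np

def ridge_estimate(X, y, lam):
    """Solve (X'X + lam * I) beta = X'y for the ridge coefficients."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    # For lam > 0 the matrix is positive definite even if p > n.
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# High-dimensional toy example: 5 observations, 10 covariates.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))
y = rng.normal(size=5)
print(ridge_estimate(X, y, lam=1.0).shape)
```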


Speaker: Jingfei Zhang

Affiliation: University of Miami Business School

Title: Semi-parametric Learning of Structured Temporal Point Processes


We propose a general framework of using multi-level log-Gaussian Cox processes to model repeatedly observed point processes with complex structures; data of this type have become increasingly available in various areas, including medical research, social sciences, economics and finance, due to technological advancement. A novel nonparametric approach is developed to efficiently and consistently estimate the covariance kernels of the latent Gaussian processes at all levels. To predict the functional principal component scores, we propose a consistent estimation procedure by maximizing the conditional likelihoods of superpositions of point processes. We further extend our procedure to the bivariate point process case in which potential correlations between the processes can be assessed. Asymptotic properties of the proposed estimators are investigated, and the effectiveness of our procedures is illustrated through a simulation study and an application to a stock trading dataset.


Speaker: Yong Zeng

Affiliation: University of Missouri at Kansas City

Title: Bayesian Inference via Filtering Equations for Financial Ultra-High Frequency Data


We propose a general partially observed framework of Markov processes with marked point process observations for ultra-high frequency (UHF) transaction price data, allowing for other observable economic or market factors. We develop the corresponding Bayesian inference via filtering equations to quantify parameter and model uncertainty. Specifically, we derive filtering equations to characterize the evolution of statistical foundations such as likelihoods, posteriors, Bayes factors and posterior model probabilities. Given the computational challenge, we provide a convergence theorem, enabling us to employ the Markov chain approximation method to construct consistent, easily parallelizable, recursive algorithms. The algorithms calculate the fundamental statistical characteristics and are capable of implementing the Bayesian inference in real time for streaming UHF data, via parallel computing for sophisticated models. The general theory is illustrated by specific models built for U.S. Treasury Notes transactions data from GovPX and by the Heston stochastic volatility model for stock transactions data. This talk is based on joint work with B. Bundick, X. Hu, D. Kuipers and J. Yin.


Speaker: Yuan Jiang

Affiliation: Oregon State University

Title: Microbial network estimation using bias-corrected graphical lasso


With the increasing availability of microbiome 16S data, network estimation has become a useful approach to studying the interactions between microbial taxa. Network estimation on a set of variables is frequently explored using graphical models, in which the relationship between two variables is modeled via their conditional dependency given the other variables. In recent years, various methods for sparse inverse covariance estimation have been proposed to estimate graphical models in the high-dimensional setting, including graphical lasso. However, current methods do not address the compositional count nature of microbiome data, where abundances of microbial taxa are not directly measured, but are reflected by the observed counts in an error-prone manner. Adding to the challenge is that the sum of the counts within each sample, termed “sequencing depth”, is an experimental technicality, which carries no scientific information but can vary drastically across samples. To address these issues, we develop a new approach to network estimation, which models the microbiome data using a multinomial log-normal distribution with the finite sequencing depth explicitly incorporated. We propose to improve the empirical covariance estimator via a computationally simple procedure that corrects the bias arising from the heterogeneity in sequencing depth. We then build our inverse covariance estimator on graphical lasso. We will show the advantage of our method in comparison to current approaches for inverse covariance estimation under a variety of simulation scenarios. We will also illustrate the use of our method in an application to a human microbiome data set.


For any information about the symposium, please contact Shuang Wu (E-mail: wus687@nenu.edu.cn).

About the local organizer:

Northeast Normal University (abbreviated NENU) is one of the six national normal universities in the People's Republic of China, located in Changchun, Jilin province. The university was listed as a Project 211 university for its academic development. Recently it was awarded the Chinese Ministry of Education’s Double First Class status in six disciplines, including statistics. The School of Mathematics and Statistics at NENU has a highly ranked program in statistics and houses a Key Laboratory of Applied Statistics of the Chinese Ministry of Education.
