
Statistica Sinica 24 (2014), 1633-1654

VARIABLE SELECTION FOR HIGH DIMENSIONAL
MULTIVARIATE OUTCOMES
Tamar Sofer, Lee Dicker and Xihong Lin
University of Washington, Rutgers University and Harvard School of Public Health

Abstract: We consider variable selection for high-dimensional multivariate regression using penalized likelihoods when the number of outcomes and the number of covariates might be large. To account for within-subject correlation, we consider variable selection when a working precision matrix is used and when the precision matrix is jointly estimated using a two-stage procedure. We show that under suitable regularity conditions, penalized regression coefficient estimators are consistent for model selection for an arbitrary working precision matrix, and have the oracle properties and are efficient when the true precision matrix is used or when it is consistently estimated using sparse regression. We develop an efficient computation procedure for estimating regression coefficients using the coordinate descent algorithm in conjunction with sparse precision matrix estimation using the graphical LASSO (GLASSO) algorithm. We develop the Bayesian Information Criterion (BIC) for estimating the tuning parameter and show that BIC is consistent for model selection. We evaluate finite sample performance for the proposed method using simulation studies and illustrate its application using the type II diabetes gene expression pathway data.

Key words and phrases: BIC, consistency, correlation, efficiency, model selection, multiple outcomes, oracle estimator.
