Wei Pan, Guanghua Xiao and Xiaohong Huang (2006). Using input dependent weights for model combination and model selection with multiple sources of data. Vol. 16, No. 2

Statistica Sinica 16(2006), 523-540

USING INPUT DEPENDENT WEIGHTS FOR MODEL COMBINATION AND MODEL SELECTION WITH MULTIPLE SOURCES OF DATA

Wei Pan, Guanghua Xiao and Xiaohong Huang

University of Minnesota

Abstract: As large amounts of genomic and proteomic data from various sources accumulate, the importance of integrative analyses of multiple sources of data has been increasingly recognized. A natural approach is to combine multiple models, each built on one source of data. A challenge, however, is to account for the different local information contents of different sources of data: the appropriate weight on each candidate model (and thus on each source of data) may depend on the input for which a prediction is to be made, which suggests that the constant weights used in most existing approaches may not be optimal. Here we propose an input-dependent weighting (IDW) scheme in which the weight on each model is the probability that the model gives a correct prediction for the given input. The weights can be estimated by regression on training data. We apply IDW to discriminating human heart failure etiology using two sources of gene expression data, and to gene function prediction by a combined analysis of gene expression and protein-protein interaction data. These applications demonstrate that IDW may perform better than some standard approaches. Input-dependent weights can also be used as a criterion for model selection.
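The abstract describes IDW only in outline. One possible reading of it for a two-class problem is sketched below in Python with NumPy. Everything specific here is an assumption for illustration, not the authors' procedure: the names fit_logistic, cv_correct, fit_idw and predict_idw are hypothetical; logistic regression fitted by plain gradient descent stands in for both the per-source classifier and the weight regression; the 0/1 "this model was correct" responses are obtained by 5-fold cross-validation; each weight model uses only its own source's features; the combined prediction is a weighted average of the per-source class-1 probabilities; and the data are simulated.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, n_iter=2000, l2=1e-3):
    """L2-penalized logistic regression fitted by batch gradient descent.

    Returns a coefficient vector whose first entry is the intercept."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ beta, -30, 30)))
        # Penalize the slopes only, not the intercept.
        grad = Xb.T @ (p - y) / len(y) + l2 * np.r_[0.0, beta[1:]]
        beta -= lr * grad
    return beta


def predict_proba(beta, X):
    """Fitted probability that the response equals 1."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-np.clip(Xb @ beta, -30, 30)))


def cv_correct(X, y, n_folds=5, seed=0):
    """Out-of-fold 0/1 indicator of whether the base model classifies each case correctly."""
    folds = np.random.default_rng(seed).permutation(len(y)) % n_folds
    correct = np.zeros(len(y))
    for f in range(n_folds):
        held_out = folds == f
        beta = fit_logistic(X[~held_out], y[~held_out])
        pred = (predict_proba(beta, X[held_out]) > 0.5).astype(float)
        correct[held_out] = (pred == y[held_out]).astype(float)
    return correct


def fit_idw(sources, y):
    """Fit one base model and one weight model per source (rows aligned with y)."""
    base = [fit_logistic(X, y) for X in sources]
    # Weight model k regresses "model k was correct" on source k's own features.
    weight_models = [fit_logistic(X, cv_correct(X, y)) for X in sources]
    return base, weight_models


def predict_idw(base, weight_models, sources_new):
    """Return combined class labels and, per case, the index of the highest-weight model."""
    w = np.column_stack([predict_proba(g, X) for g, X in zip(weight_models, sources_new)])
    p = np.column_stack([predict_proba(b, X) for b, X in zip(base, sources_new)])
    w_norm = w / w.sum(axis=1, keepdims=True)  # input-dependent weights summing to 1
    combined = (w_norm * p).sum(axis=1)        # weighted average of P(y = 1 | x)
    selected = w.argmax(axis=1)                # per-input model selection
    return (combined > 0.5).astype(int), selected


# Hypothetical simulated data: source 1 carries signal only when its first
# feature is positive; source 2 carries a moderate signal everywhere.
rng = np.random.default_rng(1)
n = 400
y = rng.integers(0, 2, n)
sign = 2 * y - 1
X1 = rng.normal(size=(n, 3))
X1[:, 1] += np.where(X1[:, 0] > 0, 2.0 * sign, 0.0)
X2 = rng.normal(size=(n, 3)) + 0.8 * sign[:, None]

train, test = np.arange(300), np.arange(300, n)
base, weight_models = fit_idw([X1[train], X2[train]], y[train])
yhat, chosen = predict_idw(base, weight_models, [X1[test], X2[test]])
print("IDW test accuracy:", (yhat == y[test]).mean())
print("share of test cases where source 1 is selected:", (chosen == 0).mean())
```

Taking the argmax of the estimated weights, as predict_idw does for `selected`, is one way to use input-dependent weights for per-input model selection instead of combination. The cross-validation step is a deliberate choice in this sketch: regressing in-sample correctness would overstate how often each model is right.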

Key words and phrases: Classification, microarray data, model mixing, partial least squares, prediction.