Ying Xu, Dong Xu and Victor Olman (2002). A practical method for interpretation of threading scores: an application of neural network. Vol. 12, No. 1.

Statistica Sinica 12(2002), 159-177

A PRACTICAL METHOD FOR INTERPRETATION OF THREADING SCORES: AN APPLICATION OF NEURAL NETWORK

Ying Xu, Dong Xu and Victor Olman

Oak Ridge National Laboratory

Abstract: Protein threading has become a popular technique for protein fold recognition and structure prediction. However, it remains a challenging and unsolved problem to assess the significance or reliability of a threading prediction result. The lack of an effective mechanism for such an assessment has greatly limited further applications of threading on a genome scale. We have developed a practical method for assessing the reliability of a threading result, using a neural network approach. As a key goal of threading is to separate true sequence-fold pairs (a pair of proteins that share the same structural fold) from false sequence-fold pairs, we have examined the distribution of true pairs against the many times more numerous false pairs in the parameter space, and discovered that the vast majority of the true pairs fall into a continuous region without any false ones, providing the basis for pattern recognition using a neural network. We have trained a neural network to capture the shape of this "true" region. Our preliminary results are quite encouraging and show that our approach is more effective in assessing prediction reliability than another neural network-based approach employed in the popular threading program GenThreader. This preliminary study also indicates that our current neural network is too simple to accurately capture the overall shape of the true region, and points to directions for further investigation of this highly important and challenging problem. This neural network-based assessment capability has been implemented in our threading program PROSPECT and was used during the CASP4 predictions. Our successful performance in CASP4 suggests that even though this trained neural network is far from perfect, it is fairly effective.

Key words and phrases: Confidence assessment, fold recognition, neural network, protein structure prediction, threading.