Abstract
Identifying the number and precise locations of multiple change points
in long sequences is a critical issue in statistics and machine learning.
However, accurate change point detection can be compromised by the presence of
local trends in the sequence when using the conventional parametric piecewise-constant model. In this paper, we introduce an adaptive Neyman test to assess
the presence of local trends. Subsequently, we develop a novel change point detection procedure based on a partially linear model that incorporates these local
trends. Furthermore, we extend the proposed testing and estimation methods to
multidimensional cases, facilitating the identification of common change points in
array-based data. Our methods are straightforward to implement, and we evaluate their numerical performance through simulations and the analysis of SNP
genotyping data.
Information
| Field | Value |
|---|---|
| Preprint No. | SS-2024-0355 |
| Manuscript ID | SS-2024-0355 |
| Complete Authors | Shengji Jia, Chunming Zhang, Yiming Tang |
| Corresponding Authors | Yiming Tang |
| Emails | jstangyiming@163.com |
Acknowledgments
The authors thank the Associate Editor and three reviewers for their careful review and helpful suggestions. Jia’s research was partially supported
by the National Natural Science Foundation of China, Grant 12501374, and
the Shanghai Natural Science Foundation, Grant 25ZR1402404. The research
of Zhang was supported by U.S. National Science Foundation grants
DMS-2013486 and DMS-1712418, and by the University of Wisconsin-Madison
Office of the Vice Chancellor for Research and Graduate Education with
Supplementary Materials
The online Supplementary Material includes the conditions and proofs of
the theoretical results, additional simulations, and an additional real data
analysis.