Abstract
Estimating a nonparametric regression function with discontinuities is fundamental in many applied fields, but it becomes challenging when the number of jumps (discontinuities) is large and unknown. We propose a new jump detection method, the consecutive screening and multiple testing (SaMT) algorithm. It simultaneously estimates the unknown number of jump points and their locations in a flexible nonparametric regression model while guaranteeing the desired accuracy. Initial jump candidates are obtained by a consecutive screening procedure combined with local linear smoothing. To assess the significance of each jump candidate, we develop a novel test based on profile likelihood inference. The final set of jump points is then selected by a multiple testing procedure. This step eliminates from the candidate set irrelevant jump points whose large variations are caused by heteroscedastic errors. Moreover, we extend the SaMT algorithm to detect common jump points shared across multiple aligned sequences. The proposed method is easy to implement, is flexible in bandwidth and threshold selection, and outperforms existing approaches in simulations and real-data applications.
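The abstract above describes the SaMT pipeline only at a high level. As a rough illustration of the general screen-then-test idea, the following standard-library Python sketch works in three steps:

- It estimates the left and right limits of the curve at each interior design point with one-sided local linear fits.
- It keeps as candidates the local maxima of the estimated jump size that exceed a threshold.
- It applies the Benjamini-Hochberg step-up procedure to the candidates' p-values.

This is a minimal sketch, not the authors' implementation. Several ingredients are assumptions of this sketch only:

- the Epanechnikov kernel;
- the window-based local-maximum rule;
- a constant noise variance estimated by the Rice difference estimator;
- a plain Gaussian z-test standing in for the paper's profile likelihood test.

All function names (`one_sided_fit`, `jump_statistics`, `screen`, `benjamini_hochberg`, `detect_jumps`) are hypothetical.

```python
import math
import random


def epanechnikov(u):
    # Epanechnikov kernel, zero outside (-1, 1).
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def one_sided_fit(xs, ys, x0, h, side):
    """Local linear fit of y ~ a + b*(x - x0) on one side of x0.

    side="left" uses x < x0, side="right" uses x >= x0; kernel weights vanish
    beyond distance h. Returns (a_hat, sum of squared equivalent weights),
    or None if the weighted design is degenerate.
    """
    s0 = s1 = s2 = t0 = t1 = q0 = q1 = q2 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        if (side == "left") != (d < 0):
            continue  # point lies on the other side of x0
        w = epanechnikov(d / h)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
        q0 += w * w
        q1 += w * w * d
        q2 += w * w * d * d
    det = s0 * s2 - s1 * s1
    if det <= 1e-10 * s0 * s2:
        return None
    a_hat = (s2 * t0 - s1 * t1) / det
    # y_j enters a_hat with weight w_j * (s2 - s1 * d_j) / det; sum the squares.
    var_factor = (s2 * s2 * q0 - 2.0 * s1 * s2 * q1 + s1 * s1 * q2) / det ** 2
    return a_hat, var_factor


def jump_statistics(xs, ys, h):
    """(index, right-limit minus left-limit estimate, variance factor) per point."""
    out = []
    for i, x0 in enumerate(xs):
        if x0 - h < xs[0] or x0 + h > xs[-1]:
            continue  # skip points whose windows would run off the design
        left = one_sided_fit(xs, ys, x0, h, "left")
        right = one_sided_fit(xs, ys, x0, h, "right")
        if left is not None and right is not None:
            out.append((i, right[0] - left[0], left[1] + right[1]))
    return out


def screen(stats, xs, h, tau):
    """Keep points where |D| exceeds tau and is maximal within +/- h."""
    cands = []
    for i, d, v in stats:
        if abs(d) < tau:
            continue
        nearby = [abs(d2) for j, d2, _ in stats if abs(xs[j] - xs[i]) <= h]
        if abs(d) >= max(nearby):
            cands.append((i, d, v))
    return cands


def benjamini_hochberg(pvals, q):
    """Benjamini-Hochberg step-up: indices (into pvals) that are rejected."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    k_max = 0
    for rank, k in enumerate(order, start=1):
        if pvals[k] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])


def detect_jumps(xs, ys, h, tau, q=0.05):
    n = len(ys)
    # Rice difference-based estimate of a constant noise variance (assumption).
    sigma2 = sum((ys[k + 1] - ys[k]) ** 2 for k in range(n - 1)) / (2 * (n - 1))
    cands = screen(jump_statistics(xs, ys, h), xs, h, tau)
    # Two-sided Gaussian p-value for each candidate's standardized jump size.
    pvals = [math.erfc(abs(d) / math.sqrt(2.0 * sigma2 * v)) for _, d, v in cands]
    return [xs[cands[k][0]] for k in benjamini_hochberg(pvals, q)]


if __name__ == "__main__":
    random.seed(1)
    n = 400
    xs = [(i + 0.5) / n for i in range(n)]
    # Hypothetical test curve: smooth trend plus jumps of +2.0 at 0.3 and -1.5 at 0.7.
    ys = [math.sin(2 * math.pi * x) + (2.0 if x >= 0.3 else 0.0)
          - (1.5 if x >= 0.7 else 0.0) + random.gauss(0.0, 0.2) for x in xs]
    print(detect_jumps(xs, ys, h=0.05, tau=0.8))  # detected jump locations
```

Because the sketch assumes homoscedastic noise, it does not reproduce the abstract's treatment of heteroscedastic errors. The bandwidth `h`, threshold `tau`, and level `q` are illustrative choices, not the tuning rules of SaMT.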
Information
| Preprint No. | SS-2023-0216 |
|---|---|
| Manuscript ID | SS-2023-0216 |
| Complete Authors | Shengji Jia, Chunming Zhang |
| Corresponding Authors | Chunming Zhang |
| Emails | cmzhang@stat.wisc.edu |
Acknowledgments
We thank the editor, associate editor, and two reviewers for their constructive comments. Jia was supported by the Shanghai Natural Science Foundation, grant 25ZR1402404. Zhang was supported by U.S. National Science Foundation grants DMS-2013486 and DMS-1712418, as well as by funding from the University of Wisconsin-Madison Office of the Vice Chancellor for Research and Graduate Education provided by the Wisconsin Alumni Research Foundation.
Supplementary Materials
The online Supplementary Material contains all the technical conditions, complete proofs of the main theoretical results, and an additional simulation study.