Abstract

We propose a mode-based parametric regression for truncated data, where the dependent variable is subject to left truncation by another random variable. We construct a kernel mode-based objective function with a constant bandwidth for estimation and suggest a modified mode expectation-maximization algorithm to estimate the model numerically. The asymptotic normality of the proposed estimator is derived under mild conditions. To construct confidence intervals for the resulting estimator efficiently, we develop a mode-based empirical likelihood method and show that the empirical log-likelihood ratio asymptotically follows a chi-square distribution. Furthermore, by combining the kernel mode-based objective function with the SCAD penalty, we introduce a variable selection procedure for the parameters and establish its oracle property. Monte Carlo simulations and a real data analysis of the housing market are presented to show the finite sample performance of the developed estimation and variable selection procedures.
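To make the estimation idea concrete, the following is a minimal sketch of the standard mode expectation-maximization (MEM) iteration for a linear modal regression with a Gaussian kernel and a constant bandwidth h, in the spirit of Yao and Li (2014). It is not the modified algorithm of this paper, whose details are not given in the abstract. The function names, the simulated data, and the optional `weights` argument (a hypothetical placeholder for whatever truncation adjustment is used) are assumptions made for illustration only.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def kernel_mode_objective(X, y, beta, h, weights=None):
    """Weighted kernel mode-based objective: sum_i w_i * phi_h(y_i - x_i' beta)."""
    w = np.ones(len(y)) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sum(w * gaussian_kernel((y - X @ beta) / h)) / h)


def mem_modal_regression(X, y, h, weights=None, beta0=None,
                         max_iter=500, tol=1e-8):
    """Plain MEM iteration for linear modal regression (illustrative sketch).

    `weights` is a hypothetical hook for observation weights; with
    weights=None every observation is treated equally.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones(len(y)) if weights is None else np.asarray(weights, dtype=float)
    # Start from ordinary least squares unless an initial value is supplied.
    if beta0 is None:
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
    else:
        beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        # E-step: normalized kernel weights at the current residuals.
        k = w * gaussian_kernel((y - X @ beta) / h)
        pi = k / k.sum()
        # M-step: with a Gaussian kernel this is weighted least squares.
        XtW = X.T * pi
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Simulated example: the conditional mode is 1 + 2x, while 20% of the
# errors are shifted upward, which pulls the conditional mean away.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
shift = np.where(rng.random(n) < 0.2, 5.0, 0.0)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n) + shift
X = np.column_stack([np.ones(n), x])

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_mode = mem_modal_regression(X, y, h=0.5)
```

With a Gaussian kernel the M-step has a closed form, so each iteration costs one weighted least squares solve, and the kernel objective cannot decrease from one iteration to the next. The result can depend on the starting value, so in practice several initial values are usually tried.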

Information

Preprint No.: SS-2023-0288
Manuscript ID: SS-2023-0288
Complete Authors: Tao Wang, Weixin Yao
Corresponding Authors: Tao Wang
Emails: taow@uvic.ca

References

  1. Amemiya, T. (1973). Regression Analysis When the Dependent Variable is Truncated Normal. Econometrica, 41, 997-1016.
  2. Chen, S. X. and Van Keilegom, I. (2009). A Review on Empirical Likelihood Methods for Regression. TEST, 18, 415-447.
  3. Fan, J. and Li, R. (2001). Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties. Journal of the American Statistical Association, 96, 1348-1360.
  4. Hausman, J. A. and Wise, D. A. (1977). Social Experimentation, Truncated Distributions, and Efficient Estimation. Econometrica, 45, 919-938.
  5. He, S. and Yang, G. L. (1998). Estimation of the Truncation Probability in the Random Truncation Model. The Annals of Statistics, 26, 1011-1027.
  6. He, S. and Yang, G. L. (2003). Estimation of Regression Parameters with Left Truncated Data. Journal of Statistical Planning and Inference, 117, 99-122.
  7. Kemp, G. C. R. and Santos Silva, J. M. C. (2012). Regression towards the Mode. Journal of Econometrics, 170, 92-101.
  8. Lee, M. J. (1989). Mode Regression. Journal of Econometrics, 42, 337-349.
  9. Lee, M. J. (1993). Quadratic Mode Regression. Journal of Econometrics, 57, 1-19.
  10. Owen, A. B. (1988). Empirical Likelihood Ratio Confidence Intervals for A Single Functional. Biometrika, 75, 237-249.
  11. Owen, A. B. (1990). Empirical Likelihood Ratio Confidence Regions. The Annals of Statistics, 18, 90-120.
  12. Stute, W. (1993). Almost Sure Representations of the Product-Limit Estimator for Truncated Data. The Annals of Statistics, 21, 146-156.
  13. Su, Y.-R. and Wang, J.-L. (2012). Modeling Left-Truncated and Right-Censored Survival Data with Longitudinal Covariates. The Annals of Statistics, 40, 1465-1488.
  14. Ullah, A., Wang, T., and Yao, W. (2021). Modal Regression for Fixed Effects Panel Data. Empirical Economics, 60, 261-308.
  15. Ullah, A., Wang, T., and Yao, W. (2022). Nonlinear Modal Regression for Dependent Data with Application for Predicting COVID-19. Journal of the Royal Statistical Society Series A, 185, 1424-1453.
  16. Ullah, A., Wang, T., and Yao, W. (2023). Semiparametric Partially Linear Varying Coefficient Modal Regression. Journal of Econometrics, 1001-1026.
  17. Wang, M. C. (1989). A Semiparametric Model for Randomly Truncated Data. Journal of the American Statistical Association, 84, 742-748.
  18. Wang, K. and Li, S. (2021). Robust Distributed Modal Regression for Massive Data. Computational Statistics & Data Analysis, 160, 107225.
  19. Wang, T. (2024). Nonlinear Kernel Mode-Based Regression for Dependent Data. Journal of Time Series Analysis, 45, 189-213.
  20. Woodroofe, M. (1985). Estimating a Distribution Function with Truncated Data. The Annals of Statistics, 13, 163-177.
  21. Yao, W. and Li, L. (2014). A New Regression Model: Modal Linear Regression. Scandinavian Journal of Statistics, 41, 656-671.
  22. Zhou, W. (2011). A Weighted Quantile Regression for Randomly Truncated Data. Computational Statistics & Data Analysis, 55, 554-566.
  23. Zhou, Y. and Yip, P. S. (1999). A Strong Representation of the Product-Limit Estimator for Left Truncated and Right Censored Data. Journal of Multivariate Analysis, 69, 261-280.

a. Department of Economics and Department of Mathematics and Statistics (by courtesy), University of Victoria, Victoria, BC V8W 2Y2, Canada.

Acknowledgments

We are deeply grateful to the Co-Editor Yi-Hau Chen, the Associate Editor, and two anonymous referees for their constructive comments, which led to substantial improvements in the paper. We would also like to thank Bo Honoré, Aman Ullah, and the seminar participants at UC Riverside, the University of Washington, and the University of Iowa for their helpful comments. Tao Wang’s research is supported by an SSHRC-IDG grant (430-2023-00149) and a UVic-SSHRC Explore grant (2023-2024), and Weixin Yao’s research is supported by an NSF grant (DMS-2210272).

Supplementary Materials

The supplementary file contains additional numerical and technical results.


Supplementary materials are available for download.