Probit Time-to-Event Regression for Misclassified Group Testing Data

Lijun Fang, Tao Hu, Shuwei Li, Lianming Wang, Christopher S. McMahan and Joshua M. Tebbs

doi:10.5705/ss.202024.0099

Abstract

Group testing has been used extensively to reduce screening costs in epidemiological studies involving low-prevalence diseases. This testing strategy involves combining specimens (e.g., blood, urine, swabs, etc.) from several individuals to form a pool and then testing the pooled specimen for infection. When the endpoint of interest is a time-to-event outcome, for example, the time until infection or disease, and pools are measured only once, the resulting data are called group-tested current status data (Petito and Jewell, 2016). In this paper, we propose a new type of regression analysis for these data using a semiparametric probit model, an alternative to the proportional hazards model in survival analysis. A sieve maximum likelihood estimation approach is developed that approximates the model’s nonparametric nuisance function by using logarithmic monotone splines, and an efficient expectation-maximization algorithm is proposed. Asymptotic properties of the resulting estimators are investigated by using empirical process techniques and sieve estimation theory. Numerical results from simulation studies suggest our estimation methods perform nominally, even when pools are possibly misclassified due to assay error, and can outperform individual testing when the number of assays (tests) is fixed. We illustrate our work by estimating a time-to-event regression model for chlamydial infection using group testing data from a large public health laboratory in Iowa.

Key words and phrases: Current status data, EM algorithm, Maximum likelihood estimation, Pooled testing, Sieve estimation
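To make the setup in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of how a pool's probability of testing positive could be computed under a semiparametric probit model with assay error. It assumes the usual probit form F(t | x) = Φ(α(t) + x'β) with α increasing, independent individuals within a pool, and an assay sensitivity and specificity that do not depend on pool size. The function names, the choice α(t) = log t, and the values of `se` and `sp` are all hypothetical and chosen only for illustration.

```python
import math


def probit_cdf(t, x, beta, alpha):
    """Event-time CDF under a probit model: Phi(alpha(t) + x'beta).

    alpha is any increasing function supplied by the caller (an assumption
    here; in the paper it is an unknown function approximated by splines).
    """
    eta = alpha(t) + sum(xj * bj for xj, bj in zip(x, beta))
    # Standard normal CDF written with the error function (stdlib only).
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def pool_positive_prob(times, covariates, beta, alpha, se=0.95, sp=0.98):
    """Probability that one pooled specimen tests positive.

    times[i] is the observation time of individual i in the pool and
    covariates[i] is that individual's covariate vector. se and sp are
    illustrative sensitivity and specificity values, not estimates.
    """
    # Probability that no individual in the pool has had the event yet.
    all_negative = 1.0
    for t, x in zip(times, covariates):
        all_negative *= 1.0 - probit_cdf(t, x, beta, alpha)
    # A truly positive pool is detected with probability se; a truly
    # negative pool gives a false positive with probability 1 - sp.
    return se * (1.0 - all_negative) + (1.0 - sp) * all_negative


# Hypothetical pool of two individuals with one covariate each.
p = pool_positive_prob(
    times=[1.0, 2.0],
    covariates=[[0.0], [1.0]],
    beta=[0.5],
    alpha=math.log,
)
```

With a perfect assay (`se = sp = 1`) and a pool of size one, the expression reduces to the individual current status probability F(t | x), which is one way to see how individual testing arises as a special case of this formulation.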

Information

Preprint No.	SS-2024-0099
Manuscript ID	SS-2024-0099
Complete Authors	Lijun Fang, Tao Hu, Shuwei Li, Lianming Wang, Christopher S. McMahan, Joshua M. Tebbs
Corresponding Authors	Shuwei Li
Emails	seslishuw@gzhu.edu.cn

References

Abdalhamid, B., Bilder, C. R., McCutchen, E. L., Hinrichs, S. H., Koepsell, S. A. and Iwen, P. C. (2020). Assessment of specimen pooling to conserve SARS CoV-2 testing resources. American Journal of Clinical Pathology 6, 715–718.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–723.
Baruch, J., Suanes, A., Piaggio, J. M. and Gil, A. D. (2020). Analytic sensitivity of an ELISA test on pooled sera samples for detection of bovine brucellosis in eradication stages in Uruguay. Frontiers in Veterinary Science 7, 1–5.
Berger, T., Mandell, J. W. and Subrahmanya, P. (2000). Maximally efficient two-stage screening. Biometrics 56, 833–840.
Chatterjee, A. and Bandyopadhyay, T. (2020). Regression models for group testing: Identifiability and asymptotics. Journal of Statistical Planning and Inference 204, 141–152.
Chen, P., Tebbs, J. M. and Bilder, C. R. (2009). Group testing regression models with fixed and random effects. Biometrics 65, 1270–1278.
Chiou, S., Kang, S. and Yan, J. (2015). Rank-based estimating equations with general weight for accelerated failure time models: An induced smoothing approach. Statistics in Medicine 34, 1495–1510.
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society, Series B 34, 187–220.
Delaigle, A. and Hall, P. (2012). Nonparametric regression with homogeneous group testing data. Annals of Statistics 40, 131–158.
Delaigle, A. and Hall, P. (2015). Nonparametric methods for group testing data, taking dilution into account. Biometrika 102, 871–887.
Delaigle, A., Hall, P. and Wishart, J. (2014). New approaches to non- and semi-parametric regression for univariate and multivariate group testing data. Biometrika 101, 567–585.
Delaigle, A. and Meister, A. (2011). Nonparametric regression analysis for group testing data. Journal of the American Statistical Association 106, 640–650.
Dorfman, R. (1943). The detection of defective members of large populations. Annals of Mathematical Statistics 14, 436–440.
Du, M., Hu, T. and Sun, J. (2019). Semiparametric probit model for informative current status data. Statistics in Medicine 38, 2219–2227.
Fang, L., Li, S., Sun, L. and Song, X. (2023). Semiparametric probit regression model with misclassified current status data. Statistics in Medicine 42, 4440–4457.
Farrington, C. P. (1992). Estimating prevalence by group testing using generalized linear models. Statistics in Medicine 11, 1591–1597.
Gaydos, C. A., Quinn, T. C., Willis, D., Weissfeld, A., Hook, E., Martin, D. H., Ferrero, D. and Schachter, J. (2003). Performance of the APTIMA Combo 2 Assay for detection of Chlamydia trachomatis and Neisseria gonorrhoeae in female urine and endocervical swab specimens. Journal of Clinical Microbiology 41, 304–309.
Heffernan, A. L., Aylward, L. L., Toms, L. M. L., Sly, P. D., Macleod, M. and Mueller, J. F. (2014). Pooled biological specimens for human biomonitoring of environmental chemicals: Opportunities and limitations. Journal of Exposure Science and Environmental Epidemiology 24, 225–232.
Hou, P., Tebbs, J. M., McMahan, C. S. and Bilder, C. R. (2017). Hierarchical group testing for multiple infections. Biometrics 73, 656–665.
Huang, J. and Rossini, A. J. (1997). Sieve estimation for the proportional-odds failure-time regression model with interval censoring. Journal of the American Statistical Association 92, 960–967.
Huang, X. and Tebbs, J. M. (2009). On latent-variable model misspecification in structural measurement error models for binary response. Biometrics 65, 710–718.
Huang, Y. T. and Cai, T. (2016). Mediation analysis for survival data using semiparametric probit models. Biometrics 72, 563–574.
Hughes-Oliver, J. M. and Swallow, W. H. (1994). A two-stage adaptive group-testing procedure for estimating small proportions. Journal of the American Statistical Association 89, 982–993.
Jin, Z., Lin, D. Y., Wei, L. J. and Ying, Z. (2003). Rank-based inference for the accelerated failure time model. Biometrika 90, 341–353.
Land, J. A., Van Bergen, J. E. A. M., Morre, S. A. and Postma, M. J. (2010). Epidemiology of Chlamydia trachomatis infection in women and the cost-effectiveness of screening. Human Reproduction Update 16, 189–204.
LeFevre, M. L. (2014). Screening for chlamydia and gonorrhea: US Preventive Services Task Force recommendation statement. Annals of Internal Medicine 161, 902–910.
Lewis, J. L., Lockary, V. M. and Kobic, S. (2012). Cost savings and increased efficiency using a stratified specimen pooling strategy for Chlamydia trachomatis and Neisseria gonorrhoeae. Sexually Transmitted Diseases 39, 46–48.
Li, S., Hu, T., Wang, L., McMahan, C. S. and Tebbs, J. M. (2024). Regression analysis of group-tested current status data. Biometrika 111, 1047–1061.
Li, S., Hu, T., Wang, P. and Sun, J. (2017). Regression analysis of current status data in the presence of dependent censoring with applications to tumorigenicity experiments. Computational Statistics and Data Analysis 110, 75–86.
Lin, X. and Wang, L. (2010). A semiparametric probit model for case 2 interval-censored failure time data. Statistics in Medicine 29, 972–981.
Low, N. (2007). Screening programmes for chlamydial infection: When will we ever learn? BMJ 334, 725–728.
Lu, M., Zhang, Y. and Huang, J. (2007). Estimation of the mean function with panel count data using monotone polynomial splines. Biometrika 94, 705–718.
McMahan, C. S., Tebbs, J. M. and Bilder, C. R. (2013). Regression models for group testing data with pool dilution effects. Biostatistics 14, 284–298.
McMahan, C. S., Tebbs, J. M., Hanson, T. E. and Bilder, C. R. (2017). Bayesian regression for group testing data. Biometrics 73, 1443–1452.
Mester, P., Witte, A. K., Robben, C., Streit, E., Fister, S., Schoder, D. and Rossmanith, P. (2017). Optimization and evaluation of the qPCR-based pooling strategy DEP-pooling in dairy production for the detection of Listeria monocytogenes. Food Control 82, 298–304.
Neopane, P., Nypaver, J., Shrestha, R. and Beqaj, S. (2022). Performance evaluation of TaqMan SARS-CoV-2, Flu A/B, RSV RT-PCR multiplex assay for the detection of respiratory viruses. Infection and Drug Resistance 15, 5411–5423.
Petito, L. C. and Jewell, N. P. (2016). Misclassified group-tested current status data. Biometrika 103, 801–815.
Pilcher, C. D., Westreich, D. and Hudgens, M. G. (2020). Group testing for SARS-CoV-2 to enable rapid scale-up of testing and real-time surveillance of incidence. Journal of Infectious Diseases 222, 903–909.
Ramsay, J. O. (1988). Monotone regression splines in action. Statistical Science 3, 425–441.
Saá, P., Proctor, M., Foster, G., Krysztof, D., Winton, C., Linnen, J. M., Gao, K., Brodsky, J. P., Limberger, R. J., Dodd, R. Y. and Stramer, S. L. (2018). Investigational testing for Zika virus among US blood donors. New England Journal of Medicine 378, 1778–1788.
Shen, X. and Wong, W. H. (1994). Convergence rate of sieve estimates. Annals of Statistics 22, 580–615.
Shiboski, S. C. (1998). Generalized additive models for current status data. Lifetime Data Analysis 4, 29–50.
Stramer, S. L., Krysztof, D. E., Brodsky, J. P., Fickett, T. A., Reynolds, B., Dodd, R. Y. and Kleinman, S. H. (2013). Comparative analysis of triplex nucleic acid test assays in United States blood donors. Transfusion 53, 2525–2537.
Vansteelandt, S., Goetghebeur, E. and Verstraeten, T. (2000). Regression models for disease prevalence with diagnostic tests on pools of serum samples. Biometrics 56, 1126–1133.
Wang, D., McMahan, C. S., Gallagher, C. M. and Kulasekera, K. B. (2014). Semiparametric group testing regression models. Biometrika 101, 587–598.
Westreich, D. J., Hudgens, M. G., Fiscus, S. A. and Pilcher, C. D. (2008). Optimizing screening for acute human immunodeficiency virus infection with pooled nucleic acid amplification tests. Journal of Clinical Microbiology 46, 1785–1792.
Wu, H. and Wang, L. (2019). Normal frailty probit model for clustered interval-censored failure time data. Biometrical Journal 61, 827–840.
Xie, M. (2001). Regression analysis of group testing samples. Statistics in Medicine 20, 1957–1969.
Xie, M., Tatsuoka, K., Sacks, J. and Young, S. S. (2001). Group testing with blockers and synergism. Journal of the American Statistical Association 96, 92–102.
Zeng, D., Gao, F. and Lin, D. Y. (2017). Maximum likelihood estimation for semiparametric regression models with multivariate interval-censored data. Biometrika 104, 505–525.
Zeng, D. and Lin, D. Y. (2007). Efficient estimation for the accelerated failure time model. Journal of the American Statistical Association 102, 1387–1396.
Zeng, D., Mao, L. and Lin, D. Y. (2016). Maximum likelihood estimation for semiparametric transformation models with interval-censored data. Biometrika 103, 253–271.
Zhang, Y., Hua, L. and Huang, J. (2010). A spline-based semiparametric maximum likelihood estimation method for the Cox model with interval-censored data. Scandinavian Journal of Statistics 37, 338–354.

Acknowledgments

We are grateful to two anonymous referees who provided helpful comments on an earlier version of this article. This research is supported by Grants 12471251 and 12171328 from the National Natural Science Foundation of China and Grant Z210003 from the Beijing Natural Science Foundation. Authors at institutions in the United States are supported by the National Institutes of Health and the National Science Foundation.

Supplementary Materials

The online Supplementary Material contains conditional expectation derivations, detailed proofs for Theorems 1–3, a second simulation study, and an analysis of the Iowa SHL data under the PH model. R code for data analysis is available at https://github.com/lishuwstat/GTEMProbit.

