Abstract

Consider a group of individuals (subjects) participating in the same psychological tests with numerous questions (items) at different times, where the choices of each item have an implicit ordering. The observed responses can be recorded in multiple response matrices over time, named multi-layer ordinal categorical data, where layers refer to time points. Assuming that each subject has a common mixed membership shared across all layers, enabling it to be affiliated with multiple latent classes with varying weights, the objective of the grade of membership (GoM) analysis is to estimate these mixed memberships from the data. When the test is conducted only once, the data becomes traditional single-layer ordinal categorical data. The GoM model is a popular choice for describing single-layer categorical data with a latent mixed membership structure. However, GoM cannot handle multi-layer ordinal categorical data. In this work, we propose a new model, multi-layer GoM, which extends GoM to multi-layer ordinal categorical data. To estimate the common mixed memberships, we propose a new approach, GoM-DSoG, based on a debiased sum of Gram matrices. We establish GoM-DSoG’s per-subject convergence rate under the multi-layer GoM model. Our theoretical results suggest that fewer no-responses, more subjects, more items, and more layers are beneficial for GoM analysis. We also propose an approach to select the number of latent classes. Extensive experimental studies verify the theoretical findings and show GoM-DSoG’s superiority over its competitors, as well as the accuracy of our method in determining the number of latent classes.
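The abstract names the ingredients of GoM-DSoG but not its steps. As a rough illustration only, the Python sketch below shows one way a spectral estimator built on a debiased sum of Gram matrices could look. Everything specific in it is an assumption of this sketch rather than the paper's procedure: the debiasing is taken to mean removing each layer's Gram-matrix diagonal, the pure subjects are located with a successive projections step, the function names are hypothetical, and the simulated responses are binomial with M + 1 ordered categories.

```python
import numpy as np


def debiased_sum_of_gram(layers):
    """Sum the Gram matrices R @ R.T over layers, with each diagonal removed.

    Assumption: the bias of layer l sits on the diagonal of its Gram matrix,
    whose i-th entry is sum_j R_l[i, j] ** 2, so it is simply subtracted.
    """
    n = np.asarray(layers[0]).shape[0]
    total = np.zeros((n, n))
    for r in layers:
        r = np.asarray(r, dtype=float)
        gram = r @ r.T
        total += gram - np.diag(np.diag(gram))
    return total


def successive_projection(u, k):
    """Pick k row indices of u by repeatedly taking the largest residual row."""
    residual = np.array(u, dtype=float)
    picked = []
    for _ in range(k):
        idx = int(np.argmax(np.sum(residual ** 2, axis=1)))
        picked.append(idx)
        direction = residual[idx] / np.linalg.norm(residual[idx])
        # Project every row onto the orthogonal complement of the picked row.
        residual = residual - np.outer(residual @ direction, direction)
    return picked


def estimate_memberships(layers, k):
    """Hypothetical spectral estimate of an n-by-k mixed membership matrix."""
    s = debiased_sum_of_gram(layers)
    eigvals, eigvecs = np.linalg.eigh(s)
    top = np.argsort(np.abs(eigvals))[-k:]  # k leading eigenvalues in magnitude
    u = eigvecs[:, top]
    corners = successive_projection(u, k)   # rows treated as pure subjects
    z = np.maximum(u @ np.linalg.inv(u[corners]), 0.0)
    row_sums = z.sum(axis=1, keepdims=True)
    safe = np.where(row_sums > 0, row_sums, 1.0)
    # Rows that clip to all zeros fall back to uniform weights.
    return np.where(row_sums > 0, z / safe, 1.0 / k)


# Simulated usage: 60 subjects, 40 items, 3 classes, 5 layers, categories 0..4.
rng = np.random.default_rng(0)
n, j, k, m, num_layers = 60, 40, 3, 4, 5
pi = rng.dirichlet(np.ones(k), size=n)
pi[:k] = np.eye(k)  # make sure each class has at least one pure subject
layers = [
    rng.binomial(m, pi @ rng.uniform(0.1, 0.9, size=(k, j)))
    for _ in range(num_layers)
]
est = estimate_memberships(layers, k)
```

Each row of `est` is a nonnegative weight vector summing to one. The columns are recovered only up to a relabeling of the latent classes, so a comparison with a known membership matrix would first need to match columns.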

Information

Preprint No.: SS-2024-0276
Manuscript ID: SS-2024-0276
Complete Authors: Huan Qing
Corresponding Authors: Huan Qing
Emails: qinghuan@u.nus.edu

Acknowledgments

H.Q. was supported by the Scientific Research Foundation of Chongqing University of Technology (Grant No. 2024ZDR003) and the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJQN202401168).

Supplementary Materials

All technical details and the MATLAB code for GoM-DSoG can be found in the Supplementary Material.

