Abstract
Given data vectors X1, . . . Xn ∈Rr, where Xi is a noisy observation of X∗
i , and
X∗
n are contained in an unknown simplex with K vertices, vertex hunting (VH) is the
problem of estimating the vertices of the true simplex. VH is a building block of several algorithms in hyperspectral remote sensing, soft clustering, topic modeling, and network mixed
membership estimation. The popular VH algorithms are susceptible to outliers, whose estimation errors are governed by maxi ∥Xi −X∗
i ∥. We propose a robust VH algorithm that properly
shrinks estimated vertices towards the interior of data cloud, so as to mitigate the effect of
outliers. The level of shrinkage is determined by maximizing a pseudo likelihood and has no
tuning parameter. We show that, when the barycentric coordinates of X∗
n come from a
Dirichlet distribution, the proposed method has a faster rate of convergence than several popular
VH algorithms.
Information
| Preprint No. | SS-2023-0159 |
|---|---|
| Manuscript ID | SS-2023-0159 |
| Complete Authors | Dieyi Chen, Tracy Ke, Shuyi Zhang |
| Corresponding Authors | Shuyi Zhang |
| Emails | syzhang@fem.ecnu.edu.cn |
References
- Airoldi, E. M., D. M. Blei, S. E. Fienberg, and E. P. Xing (2008). Mixed membership stochastic blockmodels. Journal of Machine Learning Research 9, 1981–2014.
- Ara´ujo, M. C. U., T. C. B. Saldanha, R. K. H. Galvao, T. Yoneyama, H. C. Chame, and V. Visani (2001). The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. Chemometrics and Intelligent Laboratory Systems 57(2), 65–73.
- Arora, S., R. Ge, and A. Moitra (2012). Learning topic models – going beyond SVD. In 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, pp. 1–10.
- Bioucas-Dias, J. M., A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, and J. Chanussot (2012). Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 5(2), 354–379.
- Blei, D. M., A. Y. Ng, and M. I. Jordan (2003). Latent dirichlet allocation. Journal of Machine Learning Research 3, 993–1022.
- Craig, M. D. (1994). Minimum-volume transforms for remotely sensed data. IEEE Transactions on Geoscience and Remote Sensing 32(3), 542–552.
- Cutler, A. and L. Breiman (1994). Archetypal analysis. Technometrics 36(4), 338–347.
- Deng, Q., D. Ramsk¨old, B. Reinius, and R. Sandberg (2014). Single-cell rna-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343(6167), 193–196.
- Dozat, T. (2016). Incorporating nesterov momentum into Adam. In Proceedings of the 4th International Conference on Learning Representations, pp. 1–4.
- Eugster, M. J. A. and F. Leisch (2009). From spider-man to hero — archetypal analysis in R. Journal of Statistical Software 30(8), 1–23.
- Javadi, H. and A. Montanari (2019). Nonnegative matrix factorization via archetypal analysis. Journal of the American Statistical Association, 1–22.
- Ji, P. and J. Jin (2016). Coauthorship and citation networks for statisticians (with discussion). The Annals of Applied Statistics 10, 1779–1812.
- Jin, J., Z. T. Ke, and S. Luo (2023). Mixed membership estimation for social networks. Journal of Econometrics.
- Ke, Z. T. and M. Wang (2022). Using svd for topic modeling. Journal of the American Statistical Association, 1–16. Van Dijk, D., R. Sharma, J. Nainys, K. Yim, P. Kathail, A. J. Carr, C. Burdziak, K. R. Moon, C. L. Chaffer,
- D. Pattabiraman, et al. (2018). Recovering gene interactions from single-cell data using data diffusion. Cell 174(3), 716–729.
- Winter, M. E. (1999). N-FINDR: An algorithm for fast autonomous spectral end-member determination in hyperspectral data. In SPIE’s International Symposium on Optical Science, Engineering, and Instrumentation, pp. 266–275. Harvard University
Acknowledgments
The authors are grateful to the anonymous referees, the associate editor and the editor
for their helpful comments and suggestions. Zhang’s research is partially supported
b N ti
l N t
l S i
F
d ti
f Chi
G
t 92358303 N ti
l K
R&D
Program of China Grants 2021YFA1000100, 2021YFA1000101 and 2021YFA1000104,
Science and Technology Commission of Shanghai Municipality Grant 23JS1400500,
and National Natural Science Foundation of China Grants 72331005, 72571102 and
- Ke’s research is partially supported by Sloan Research Grant FG-2023-19970.
Supplementary Materials
are available in the attached file which contains a summary of
major notations, useful lemmas, proofs of Proposition 1, Theorems 1 – 4 and Corollary
1.