On Robust Clustering of Event Stream Data

Yuecheng Zhang, Guanhua Fang and Wen Yu

doi:10.5705/ss.202024.0336

Abstract

Clustering of event stream data is of great importance in many application

scenarios, including but not limited to, e-commerce, electronic health, online testing,

mobile music service, etc. Existing clustering algorithms fail to take outlier data into

consideration and are implemented without theoretical guarantees. In this paper, we

propose a robust temporal point processes clustering framework that works under

mild assumptions and meanwhile addresses several important issues in the event

stream clustering problem.

Specifically, we introduce a computationally efficient

model-free distance function to quantify the dissimilarity between different event

streams so that the outliers can be detected and the good initial clusters could be

obtained. We further propose a classification algorithm incorporated with a Catoni’s

influence function for robust estimation and fine-tuning of clusters. We also establish

the theoretical results including algorithmic convergence, estimation error bound,

outlier detection, etc. Simulation results corroborate our theoretical findings and

real data applications show the effectiveness of our proposed methodology.

Key words and phrases: Catoni estimator, Event stream, Initialization, Outliers

Information

Preprint No.	SS-2024-0336
Manuscript ID	SS-2024-0336
Complete Authors	Yuecheng Zhang, Guanhua Fang, Wen Yu
Corresponding Authors	Guanhua Fang
Emails	fanggh@fudan.edu.cn

References

Arthur, D. and S. Vassilvitskii (2007). K-means++ the advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pp. 1027–1035.
Berndt, D. J. and J. Clifford (1994). Using dynamic time warping to find patterns in time series. In Proceedings of the 3rd international conference on knowledge discovery and data mining, pp. 359–370.
Bhatt, S., G. Fang, P. Li, and G. Samorodnitsky (2022, 17–23 Jul). Minimax m-estimation under adversarial contamination. In Proceedings of the 39th International Conference on Machine Learning, Volume 162 of Proceedings of Machine Learning Research, pp. 1906–1924. PMLR.
Blei, D. M., A. Kucukelbir, and J. D. McAuliffe (2017). Variational inference: A review for statisticians. Journal of the American statistical Association 112(518), 859–877.
Bradley, P. S. and U. M. Fayyad (1998). Refining initial points for k-means clustering. In ICML, Volume 98, pp. 91–99. Citeseer.
Brown, E. N., R. Barbieri, V. Ventura, R. E. Kass, and L. M. Frank (2002). The time-rescaling theorem and its application to neural spike train data analysis. Neural computation 14(2), 325–346.
Bubeck, S., N. Cesa-Bianchi, and G. Lugosi (2013). Bandits with heavy tail. IEEE Transactions on Information Theory 59(11), 7711–7717.
Cai, B., J. Zhang, and Y. Guan (2024). Latent network structure learning from high-dimensional multivariate point processes. Journal of the American Statistical Association 119(545), 95–108.
Cao, J., X. Lin, X. Cong, S. Guo, H. Tang, T. Liu, and B. Wang (2021). Deep structural point process for learning temporal interaction networks. In Machine Learning and Knowledge Discovery in Databases. Research Track: European Conference, ECML PKDD 2021, Bilbao, Spain, September 13–17, 2021,
Proceedings, Part I 21, pp. 305–320. Springer.
Carneiro, M. J. T. et al. (2012). Towards the discovery of temporal patterns in music listening using last. fm profiles. Dissertation.
Casella, G. and R. L. Berger (2024). Statistical inference (Second ed.). Chapman and Hall/CRC.
Catoni, O. (2012). Challenging the empirical mean and empirical variance: a deviation study. In Annales de l’IHP Probabilit´es et statistiques, Volume 48, pp. 1148–1185.
Daley, D. J., D. Vere-Jones, et al. (2003). An introduction to the theory of point processes: volume I: elementary theory and methods. Springer.
De Boor, C. (1972). On calculating with b-splines. Journal of Approximation theory 6(1), 50–62.
Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum likelihood from incomplete data via the em algorithm. Journal of the royal statistical society: series B (methodological) 39(1), 1–22.
Deshpande, A., P. Kacham, and R. Pratap (2020). Robust k-means++. In Conference on Uncertainty in Artificial Intelligence, pp. 799–808. PMLR.
Diebolt, J. and E. H. S. Ip (1996). Stochastic EM: Method and application. In Markov Chain Monte Carlo in Practice, pp. 259–273. London: Chapman and Hall. Cited in: Gad, A. M., & Ahmed, A. E.
(2006). Analysis of longitudinal data with intermittent missing values using the stochastic EM algorithm. Computational Statistics & Data Analysis, 50(9), 2252-2271.
Eiter, T. and H. Mannila (1994). Computing discrete Fr´echet distance. Technical Report CD-TR 94/64, Christian Doppler Laboratory for Expert Systems, TU Vienna, Vienna, Austria. Cited in: “Walking Your Frog Fast in 4 LoC”. arXiv:2404.05708 [cs.CG]. 2024.
Enguehard, J., D. Busbridge, A. Bozson, C. Woodcock, and N. Hammerla (2020). Neural temporal point processes for modelling electronic health records. In Machine Learning for Health, pp. 85–113. PMLR.
Fang, G., P. Li, and G. Samorodnitsky (2023). Empirical risk minimization for losses without variance. arXiv preprint arXiv:2309.03818.
Fang, G., G. Xu, H. Xu, X. Zhu, and Y. Guan (2024). Group network hawkes process. Journal of the American Statistical Association 119(547), 2328–2344.
Fraley, C. and A. E. Raftery (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American statistical Association 97(458), 611–631.
Georgogiannis, A. (2016). Robust k-means: a theoretical revisit. In Advances in Neural Information Processing Systems, Volume 29, pp. 2891–2899.
Gupta, M., J. Gao, C. C. Aggarwal, and J. Han (2013). Outlier detection for temporal data: A survey. IEEE Transactions on Knowledge and data Engineering 26(9), 2250–2267.
Hosseini, S. A., K. Alizadeh, A. Khodadadi, A. Arabzadeh, M. Farajtabar, H. Zha, and H. R. Rabiee
(2017). Recurrent poisson factorization for temporal recommendation. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 847–855.
Hsu, D. and S. Sabato (2016). Loss minimization and parameter estimation with heavy tails. The Journal of Machine Learning Research 17(1), 543–582.
Huber, P. J. (1992). Robust estimation of a location parameter. In Breakthroughs in statistics: Methodology and distribution, pp. 492–518. Springer.
Huber, P. J. (2004). Robust statistics, Volume 523. John Wiley & Sons.
Lehmann, E. L. and G. Casella (2006). Theory of point estimation. Springer Science & Business Media.
Lugosi, G. and S. Mendelson (2021). Robust multivariate mean estimation: the optimality of trimmed mean. The Annals of Statistics 49, 393–410.
Luo, D., H. Xu, H. Zha, J. Du, R. Xie, X. Yang, and W. Zhang (2014). You are what you watch and when you watch: Inferring household structures from iptv viewing data. IEEE Transactions on Broadcasting 60(1), 61–72.
Luo, D., H. Xu, Y. Zhen, X. Ning, H. Zha, X. Yang, and W. Zhang (2015). Multi-task multi-dimensional hawkes processes for modeling event sequences. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI’15, pp. 3685–3691. AAAI Press.
Manning, C. D., P. Raghavan, and H. Sch¨utze (2008). Introduction to Information Retrieval. Cambridge University Press.
McLachlan, G. J., S. X. Lee, and S. I. Rathnayake (2019). Finite mixture models. Annual review of statistics and its application 6, 355–378.
Pei, T., X. Gong, S.-L. Shaw, T. Ma, and C. Zhou (2013). Clustering of temporal event processes. International Journal of Geographical Information Science 27(3), 484–510.
Peng, J. and H.-G. M¨uller (2008). Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions. The Annals of Applied Statistics 2(3), 1056–1077.
Pillow, J. (2009). Time-rescaling methods for the estimation and assessment of non-poisson neural encoding models. In Advances in Neural Information Processing Systems, Volume 22, pp. 1473–1481.
Prasad, A., S. Balakrishnan, and P. Ravikumar (2020). A robust univariate mean estimator is all you need. In International Conference on Artificial Intelligence and Statistics, pp. 4034–4044. PMLR.
Sani, M. F., S. J. van Zelst, and W. M. van der Aalst (2019). Repairing outlier behaviour in event logs using contextual behaviour. Enterprise Modelling and Information Systems Architectures 14, 5–1.
Wang, D., X. Zhang, Y. Wan, D. Yu, G. Xu, and S. Deng (2021). Modeling sequential listening behaviors with attentive temporal point process for next and next new music recommendation. IEEE Transactions on Multimedia 24, 4170–4182.
Xu, H., G. Fang, Y. Chen, J. Liu, and Z. Ying (2018). Latent class analysis of recurrent events in problemsolving items. Applied Psychological Measurement 42(6), 478–498.
Xu, H. and H. Zha (2017). A dirichlet mixture model of hawkes processes for event sequence clustering. In Advances in Neural Information Processing Systems, Volume 30, pp. 1354–1363.
Xu, L., J. A. Duan, and A. Whinston (2014). Path to purchase: A mutually exciting point process model for online advertising and conversion. Management Science 60(6), 1392–1412.
Yan, J. (2019). Recent advance in temporal point process: from machine learning perspective. Shanghai Jiao Tong University Technical Report.
Yin, L., G. Xu, H. Sang, and Y. Guan (2021). Row-clustering of a point process-valued matrix. Advances in Neural Information Processing Systems 34, 20028–20039.
Zhang, Y., J. Yan, X. Zhang, J. Zhou, and X. Yang (2022). Learning mixture of neural temporal point processes for multi-dimensional event sequence clustering. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, Vienna, Austria, pp. 23–29.
`Oscar Celma (2010). lastfm music recommendation dataset. https://doi.org/10.5281/zenodo.6090214.

Acknowledgments

. The authors would like to thank the Associate Editor and the

two anonymous referees for their constructive suggestions and comments, which

helped improve the quality of the paper. Guanhua Fang is partly supported by the

National Natural Science Foundation of China (Grant No. 12301376) and Shanghai Educational Development Foundation (Grant No. 23CGA02). Wen Yu is par-

tially supported by the National Natural Science Foundation of China (Grant No.

12071088).

Supplementary Materials

The online material contains technical proofs, more

simulation results, explanations and discussions.

Supplementary materials are available for download.

[1] Arthur, D. and S. Vassilvitskii (2007). K-means++ the advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pp. 1027–1035.

[2] Berndt, D. J. and J. Clifford (1994). Using dynamic time warping to find patterns in time series. In Proceedings of the 3rd international conference on knowledge discovery and data mining, pp. 359–370.

[3] Bhatt, S., G. Fang, P. Li, and G. Samorodnitsky (2022, 17–23 Jul). Minimax m-estimation under adversarial contamination. In Proceedings of the 39th International Conference on Machine Learning, Volume 162 of Proceedings of Machine Learning Research, pp. 1906–1924. PMLR.

[4] Blei, D. M., A. Kucukelbir, and J. D. McAuliffe (2017). Variational inference: A review for statisticians. Journal of the American statistical Association 112(518), 859–877.

[5] Bradley, P. S. and U. M. Fayyad (1998). Refining initial points for k-means clustering. In ICML, Volume 98, pp. 91–99. Citeseer.

[6] Brown, E. N., R. Barbieri, V. Ventura, R. E. Kass, and L. M. Frank (2002). The time-rescaling theorem and its application to neural spike train data analysis. Neural computation 14(2), 325–346.

[7] Bubeck, S., N. Cesa-Bianchi, and G. Lugosi (2013). Bandits with heavy tail. IEEE Transactions on Information Theory 59(11), 7711–7717.

[8] Cai, B., J. Zhang, and Y. Guan (2024). Latent network structure learning from high-dimensional multivariate point processes. Journal of the American Statistical Association 119(545), 95–108.

[9] Cao, J., X. Lin, X. Cong, S. Guo, H. Tang, T. Liu, and B. Wang (2021). Deep structural point process for learning temporal interaction networks. In Machine Learning and Knowledge Discovery in Databases. Research Track: European Conference, ECML PKDD 2021, Bilbao, Spain, September 13–17, 2021,

[10] Proceedings, Part I 21, pp. 305–320. Springer.

[11] Carneiro, M. J. T. et al. (2012). Towards the discovery of temporal patterns in music listening using last. fm profiles. Dissertation.

[12] Casella, G. and R. L. Berger (2024). Statistical inference (Second ed.). Chapman and Hall/CRC.

[13] Catoni, O. (2012). Challenging the empirical mean and empirical variance: a deviation study. In Annales de l’IHP Probabilit´es et statistiques, Volume 48, pp. 1148–1185.

[14] Daley, D. J., D. Vere-Jones, et al. (2003). An introduction to the theory of point processes: volume I: elementary theory and methods. Springer.

[15] De Boor, C. (1972). On calculating with b-splines. Journal of Approximation theory 6(1), 50–62.

[16] Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum likelihood from incomplete data via the em algorithm. Journal of the royal statistical society: series B (methodological) 39(1), 1–22.

[17] Deshpande, A., P. Kacham, and R. Pratap (2020). Robust k-means++. In Conference on Uncertainty in Artificial Intelligence, pp. 799–808. PMLR.

[18] Diebolt, J. and E. H. S. Ip (1996). Stochastic EM: Method and application. In Markov Chain Monte Carlo in Practice, pp. 259–273. London: Chapman and Hall. Cited in: Gad, A. M., & Ahmed, A. E.

[19] (2006). Analysis of longitudinal data with intermittent missing values using the stochastic EM algorithm. Computational Statistics & Data Analysis, 50(9), 2252-2271.

[20] Eiter, T. and H. Mannila (1994). Computing discrete Fr´echet distance. Technical Report CD-TR 94/64, Christian Doppler Laboratory for Expert Systems, TU Vienna, Vienna, Austria. Cited in: “Walking Your Frog Fast in 4 LoC”. arXiv:2404.05708 [cs.CG]. 2024.

[21] Enguehard, J., D. Busbridge, A. Bozson, C. Woodcock, and N. Hammerla (2020). Neural temporal point processes for modelling electronic health records. In Machine Learning for Health, pp. 85–113. PMLR.

[22] Fang, G., P. Li, and G. Samorodnitsky (2023). Empirical risk minimization for losses without variance. arXiv preprint arXiv:2309.03818.

[23] Fang, G., G. Xu, H. Xu, X. Zhu, and Y. Guan (2024). Group network hawkes process. Journal of the American Statistical Association 119(547), 2328–2344.

[24] Fraley, C. and A. E. Raftery (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American statistical Association 97(458), 611–631.

[25] Georgogiannis, A. (2016). Robust k-means: a theoretical revisit. In Advances in Neural Information Processing Systems, Volume 29, pp. 2891–2899.

[26] Gupta, M., J. Gao, C. C. Aggarwal, and J. Han (2013). Outlier detection for temporal data: A survey. IEEE Transactions on Knowledge and data Engineering 26(9), 2250–2267.

[27] Hosseini, S. A., K. Alizadeh, A. Khodadadi, A. Arabzadeh, M. Farajtabar, H. Zha, and H. R. Rabiee

[28] (2017). Recurrent poisson factorization for temporal recommendation. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 847–855.

[29] Hsu, D. and S. Sabato (2016). Loss minimization and parameter estimation with heavy tails. The Journal of Machine Learning Research 17(1), 543–582.

[30] Huber, P. J. (1992). Robust estimation of a location parameter. In Breakthroughs in statistics: Methodology and distribution, pp. 492–518. Springer.

[31] Huber, P. J. (2004). Robust statistics, Volume 523. John Wiley & Sons.

[32] Lehmann, E. L. and G. Casella (2006). Theory of point estimation. Springer Science & Business Media.

[33] Lugosi, G. and S. Mendelson (2021). Robust multivariate mean estimation: the optimality of trimmed mean. The Annals of Statistics 49, 393–410.

[34] Luo, D., H. Xu, H. Zha, J. Du, R. Xie, X. Yang, and W. Zhang (2014). You are what you watch and when you watch: Inferring household structures from iptv viewing data. IEEE Transactions on Broadcasting 60(1), 61–72.

[35] Luo, D., H. Xu, Y. Zhen, X. Ning, H. Zha, X. Yang, and W. Zhang (2015). Multi-task multi-dimensional hawkes processes for modeling event sequences. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI’15, pp. 3685–3691. AAAI Press.

[36] Manning, C. D., P. Raghavan, and H. Sch¨utze (2008). Introduction to Information Retrieval. Cambridge University Press.

[37] McLachlan, G. J., S. X. Lee, and S. I. Rathnayake (2019). Finite mixture models. Annual review of statistics and its application 6, 355–378.

[38] Pei, T., X. Gong, S.-L. Shaw, T. Ma, and C. Zhou (2013). Clustering of temporal event processes. International Journal of Geographical Information Science 27(3), 484–510.

[39] Peng, J. and H.-G. M¨uller (2008). Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions. The Annals of Applied Statistics 2(3), 1056–1077.

[40] Pillow, J. (2009). Time-rescaling methods for the estimation and assessment of non-poisson neural encoding models. In Advances in Neural Information Processing Systems, Volume 22, pp. 1473–1481.

[41] Prasad, A., S. Balakrishnan, and P. Ravikumar (2020). A robust univariate mean estimator is all you need. In International Conference on Artificial Intelligence and Statistics, pp. 4034–4044. PMLR.

[42] Sani, M. F., S. J. van Zelst, and W. M. van der Aalst (2019). Repairing outlier behaviour in event logs using contextual behaviour. Enterprise Modelling and Information Systems Architectures 14, 5–1.

[43] Wang, D., X. Zhang, Y. Wan, D. Yu, G. Xu, and S. Deng (2021). Modeling sequential listening behaviors with attentive temporal point process for next and next new music recommendation. IEEE Transactions on Multimedia 24, 4170–4182.

[44] Xu, H., G. Fang, Y. Chen, J. Liu, and Z. Ying (2018). Latent class analysis of recurrent events in problemsolving items. Applied Psychological Measurement 42(6), 478–498.

[45] Xu, H. and H. Zha (2017). A dirichlet mixture model of hawkes processes for event sequence clustering. In Advances in Neural Information Processing Systems, Volume 30, pp. 1354–1363.

[46] Xu, L., J. A. Duan, and A. Whinston (2014). Path to purchase: A mutually exciting point process model for online advertising and conversion. Management Science 60(6), 1392–1412.

[47] Yan, J. (2019). Recent advance in temporal point process: from machine learning perspective. Shanghai Jiao Tong University Technical Report.

[48] Yin, L., G. Xu, H. Sang, and Y. Guan (2021). Row-clustering of a point process-valued matrix. Advances in Neural Information Processing Systems 34, 20028–20039.

[49] Zhang, Y., J. Yan, X. Zhang, J. Zhou, and X. Yang (2022). Learning mixture of neural temporal point processes for multi-dimensional event sequence clustering. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, Vienna, Austria, pp. 23–29.

[50] `Oscar Celma (2010). lastfm music recommendation dataset. https://doi.org/10.5281/zenodo.6090214.