Abstract
Estimating conditional density functions is a fundamental problem in statistics. This task
is crucial for understanding the underlying relationships between variables and for making informed
predictions in various applications. In this paper, we introduce a novel deep nonparametric approach
for estimating conditional density functions from data. Our method leverages the flexibility and expressiveness of deep neural networks to model the conditional density without imposing restrictive
parametric assumptions. We formulate conditional density estimation as a nonparametric least squares problem, which allows us to harness the approximation power of deep neural networks in a principled manner.
We demonstrate that our proposed approach achieves
the minimax optimal convergence rate for conditional density estimation. Additionally, we show that
the convergence rate can be further improved for high-dimensional data satisfying a low-dimensional
manifold assumption. To validate the performance of our approach, we conduct extensive numerical
evaluations on both simulated and real-world datasets. These experiments reveal that our method consistently outperforms several established techniques, highlighting its superior accuracy and robustness
in diverse scenarios.
Key words and phrases: Conditional density estimation, Deep neural networks, Optimal convergence rate
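To make the least squares formulation concrete: for a conditional density $p(y\mid x)$, the population risk $\mathcal{R}(f)=\mathbb{E}_X\!\int f(X,y)^2\,dy-2\,\mathbb{E}_{(X,Y)}[f(X,Y)]$ differs from $\mathbb{E}_X\!\int \{f(X,y)-p(y\mid X)\}^2\,dy$ only by a constant not depending on $f$, so its minimizer over square-integrable $f$ is the true conditional density. The sketch below trains a ReLU network on the empirical version of this risk. It is a minimal illustration, not the paper's implementation: the architecture, the uniform Monte Carlo approximation of the inner integral, and the bounded support $[y_{\mathrm{lo}}, y_{\mathrm{hi}}]$ for $Y$ are assumptions made here for concreteness.

```python
# Minimal sketch (not the authors' code) of a nonparametric least squares
# objective for conditional density estimation:
#   R(f) = E_X int f(X, y)^2 dy - 2 E_{(X,Y)} f(X, Y),
# whose minimizer over square-integrable f is the true p(y | x).
import torch
import torch.nn as nn

class CondDensityNet(nn.Module):
    """ReLU network f(x, y) intended to approximate p(y | x)."""
    def __init__(self, x_dim, width=64, depth=3):
        super().__init__()
        layers, in_dim = [], x_dim + 1  # input is the pair (x, y)
        for _ in range(depth):
            layers += [nn.Linear(in_dim, width), nn.ReLU()]
            in_dim = width
        layers.append(nn.Linear(in_dim, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def ls_risk(f, x, y, y_lo=-3.0, y_hi=3.0, n_mc=128):
    """Empirical least squares risk:
    (1/n) * sum_i [ int f(x_i, u)^2 du - 2 * f(x_i, y_i) ],
    with the integral approximated by uniform Monte Carlo on [y_lo, y_hi]
    (an assumption: Y is taken to be scalar with bounded support)."""
    n = x.shape[0]
    u = y_lo + (y_hi - y_lo) * torch.rand(n, n_mc, 1)          # MC nodes for y
    x_rep = x.unsqueeze(1).expand(-1, n_mc, -1)                # pair each x_i with its nodes
    sq_int = (y_hi - y_lo) * f(x_rep, u).pow(2).mean(dim=1)    # approx. int f(x_i, u)^2 du
    return (sq_int - 2.0 * f(x, y)).mean()

# Toy usage: X ~ Uniform[-1, 1], Y | X = x ~ N(sin(2x), 0.5^2).
torch.manual_seed(0)
n, x_dim = 2000, 1
x = 2 * torch.rand(n, x_dim) - 1
y = torch.sin(2 * x) + 0.5 * torch.randn(n, 1)
f = CondDensityNet(x_dim)
opt = torch.optim.Adam(f.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    ls_risk(f, x, y).backward()
    opt.step()
```

Note that a least squares fit is not constrained to be a valid density; in practice one would truncate negative values of the fitted function and renormalize over $y$.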
Information
| Manuscript ID | SS-2025-0144 |
|---|---|
| Authors | Chenxuan He, Yuan Gao, Liping Zhu, Jian Huang |
| Corresponding Author | Liping Zhu |
| Email | zhu.liping@ruc.edu.cn |
Acknowledgments
We are grateful to the Editor, Associate Editor, and three anonymous reviewers for their
valuable comments and suggestions, which significantly improved the quality of this article.
Liping Zhu’s work was supported by the National Key R&D Program of China
(2023YFA1008702), the National Natural Science Foundation of China (12225113), and the
Public Computing Cloud, Renmin University of China. Jian Huang’s work was supported
by the National Natural Science Foundation of China (72331005) and research grants
from The Hong Kong Polytechnic University (P0046811, P0042888, P0045417, P0045931).
Supplementary Materials
The Supplementary Material contains additional simulation results and provides proofs for
each result stated in the paper.