
Statistica Sinica 35 (2025), 831-852

REINFORCEMENT LEARNING VIA NONPARAMETRIC
SMOOTHING IN A CONTINUOUS-TIME
STOCHASTIC SETTING WITH NOISY DATA

Chenyang Jiang1, Bowen Hu1, Yazhen Wang1 and Shang Wu*2

1University of Wisconsin-Madison and 2Fudan University

Abstract: Reinforcement learning was developed mainly for discrete-time Markov decision processes. This paper establishes a novel learning approach, based on temporal-difference learning and nonparametric smoothing, for solving reinforcement learning problems in a continuous-time setting with noisy data, where the true model to be learned is governed by an ordinary differential equation and the data are generated from a stochastic differential equation regarded as a noisy version of that ordinary differential equation. Continuous-time temporal-difference learning developed for deterministic models is unstable, and in fact diverges, when applied to data generated from stochastic models. Furthermore, because the observed data contain measurement errors or noise, a new reinforcement learning framework is needed to handle learning problems with noisy data. We show that the proposed approach performs robustly when learning deterministic functions from noisy data generated by stochastic models governed by stochastic differential equations. An asymptotic theory is established for the proposed approach, and a numerical study is carried out to solve a pendulum reinforcement learning problem and examine the finite-sample performance of the proposed method.

Key words and phrases: HJB equation, Markov decision process, nonparametric smoothing, ordinary differential equation, policy, reinforcement learning, reward function, stochastic differential equation, value function.
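To make the idea concrete, the following is a minimal illustrative sketch (not the authors' algorithm) of the general recipe described in the abstract: simulate noisy data from a stochastic differential equation whose drift is the deterministic model of interest, denoise the trajectory with a nonparametric kernel smoother, and then run a continuous-time temporal-difference update on the smoothed path. The one-dimensional dynamics, reward, features, bandwidth, and learning rate below are all assumptions chosen for illustration only.

    import numpy as np

    # Hypothetical 1-D example: true ODE dx/dt = -x, observed through an SDE
    rng = np.random.default_rng(0)
    T, dt, sigma = 10.0, 0.01, 0.3          # horizon, step size, diffusion level
    beta = 0.5                               # discount rate
    n = int(T / dt)
    t = np.arange(n) * dt

    # Simulate noisy data with Euler-Maruyama (the SDE "noisy version" of the ODE)
    x = np.empty(n)
    x[0] = 1.0
    for k in range(n - 1):
        x[k + 1] = x[k] - x[k] * dt + sigma * np.sqrt(dt) * rng.standard_normal()

    # Nonparametric smoothing: Nadaraya-Watson kernel regression of x on t
    def kernel_smooth(t, y, bandwidth):
        w = np.exp(-0.5 * ((t[:, None] - t[None, :]) / bandwidth) ** 2)
        return (w @ y) / w.sum(axis=1)

    x_smooth = kernel_smooth(t, x, bandwidth=0.2)

    # Continuous-time TD(0) with a linear value approximation V(x) = theta^T phi(x)
    def phi(x):                              # simple polynomial features (assumed)
        return np.array([1.0, x, x ** 2])

    def reward(x):                           # quadratic running reward (assumed)
        return -x ** 2

    theta = np.zeros(3)
    alpha = 0.05                             # learning rate
    for k in range(n - 1):
        v_now = theta @ phi(x_smooth[k])
        v_next = theta @ phi(x_smooth[k + 1])
        # Continuous-time TD error: r + dV/dt - beta * V, with dV/dt by finite difference
        delta = reward(x_smooth[k]) + (v_next - v_now) / dt - beta * v_now
        theta += alpha * delta * phi(x_smooth[k]) * dt

    print("estimated value-function coefficients:", theta)

Running the TD update on the raw trajectory x instead of x_smooth illustrates the instability discussed in the abstract: the finite-difference term (V(x_{k+1}) - V(x_k))/dt is dominated by the diffusion noise, whereas smoothing first recovers a trajectory close to the deterministic ODE on which continuous-time temporal-difference learning behaves well.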
