## Reinforcement Learning using interacting particle systems

Background

Reinforcement learning (RL) is a sub-field of machine learning. In the popular literature, RL is often referred to simply as artificial intelligence. A computer playing chess is an early textbook application of an RL algorithm. Self-driving cars are another application where RL algorithms are needed. In recent years, companies such as DeepMind and OpenAI (which introduced ChatGPT) have invested heavily in the development and deployment of these algorithms.

While RL has enjoyed unprecedented attention in the media, the mathematics of RL is still in its infancy. This is one reason why, for example, autonomous cars are still not as "intelligent" as a human driver.

Ongoing work

We are interested in the problem of reinforcement learning (RL) in continuous-time and continuous (Euclidean) state-space settings. A special case is the linear quadratic Gaussian (LQG) problem where the dynamics model is a linear system, the cost terms are quadratic, and the distributions – of the random initial condition and the noise – are Gaussian.
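For concreteness, one common way to write the LQG problem is given below. The symbols (A, B, σ, Q, R, Q_T, m_0, Σ_0) are notation assumed here for illustration; the papers listed on this page may use different conventions.

```latex
% Linear dynamics driven by the control U and a Wiener process W,
% with a Gaussian initial condition
dX_t = \bigl(A X_t + B U_t\bigr)\,dt + \sigma\,dW_t,
\qquad X_0 \sim \mathcal{N}(m_0, \Sigma_0),

% Quadratic running and terminal costs, with Q, Q_T \succeq 0 and R \succ 0
J(U) = \mathbb{E}\Bigl[\int_0^T \tfrac12\bigl(X_t^{\top} Q X_t + U_t^{\top} R U_t\bigr)\,dt
       + \tfrac12 X_T^{\top} Q_T X_T\Bigr].
```

The control objective is to choose U to minimize J(U).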

The LQG problem has a rich and storied history in modern control theory, going back to the very origin of the subject. Two issues make the LQG and related problems a topic of recent research interest: (i) in high dimensions, the matrix-valued nature of the optimality equations (differential or algebraic Riccati equations) means that the cost of any algorithm that solves them grows at least as O(d^2) in the dimension d of the state space; and (ii) the model parameters may not be explicitly available to write down the Riccati equation, let alone solve it. The latter is a concern, e.g., when the model exists only in the form of a black-box numerical simulator.
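To illustrate what the classical model-based route requires, here is a minimal sketch that solves the algebraic Riccati equation A^T P + P A - P B R^{-1} B^T P + Q = 0 for a small spring-mass-damper chain and forms the feedback gain K = R^{-1} B^T P. Everything in it is an assumption made for illustration: the function names, the chain model and its parameter values, the choice Q = I and R = I, and the use of the standard Hamiltonian-eigenvector method. It is not the simulation-based algorithm discussed on this page, and it needs the matrices A and B explicitly, which is exactly what a black-box simulator does not provide.

```python
import numpy as np


def spring_mass_damper(n_masses, stiffness=1.0, damping=0.5, mass=1.0):
    """Hypothetical chain of masses attached to a wall, force applied to the last mass.

    The state is (positions, velocities), so the state dimension is d = 2 * n_masses.
    """
    n = n_masses
    # Nearest-neighbour coupling: wall on the left, free end on the right.
    coupling = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    coupling[-1, -1] = 1.0
    A = np.block([
        [np.zeros((n, n)), np.eye(n)],
        [-(stiffness / mass) * coupling, -(damping / mass) * coupling],
    ])
    B = np.zeros((2 * n, 1))
    B[-1, 0] = 1.0 / mass
    return A, B


def solve_are(A, B, Q, R):
    """Stabilizing solution P of A'P + PA - P B R^{-1} B' P + Q = 0.

    Uses the eigenvectors of the 2d x 2d Hamiltonian matrix that span its
    stable invariant subspace. P is a d x d matrix, i.e. d^2 unknowns.
    """
    d = A.shape[0]
    G = B @ np.linalg.solve(R, B.T)
    H = np.block([[A, -G], [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    # Keep the d eigenvectors whose eigenvalues have the most negative real part.
    stable = eigvecs[:, np.argsort(eigvals.real)[:d]]
    X1, X2 = stable[:d, :], stable[d:, :]
    P = np.real(X2 @ np.linalg.inv(X1))
    return 0.5 * (P + P.T)  # symmetrize to remove rounding error


def lqr_gain(B, R, P):
    """Feedback gain K such that the control law is u = -K x."""
    return np.linalg.solve(R, B.T @ P)


if __name__ == "__main__":
    A, B = spring_mass_damper(n_masses=2)
    Q, R = np.eye(A.shape[0]), np.eye(B.shape[1])
    P = solve_are(A, B, Q, R)
    K = lqr_gain(B, R, P)
    residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T) @ P + Q
    print("state dimension d:", A.shape[0])
    print("entries in P:", P.size)
    print("max Riccati residual:", np.max(np.abs(residual)))
    print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```

The unknown P has d^2 entries, which illustrates issue (i), and the solver consumes A and B directly, which illustrates issue (ii).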

In a recent paper, we introduced a simulation-based algorithm to solve the RL problem in LQG settings. The algorithm is based on a clever construction of an ensemble Kalman filter (EnKF) to approximate the optimal control law for the LQG problem. Because it relies only on simulations, it avoids the need to explicitly solve the Riccati equation. Assuming full-state feedback, it directly yields the optimal control input.

We have extended these results to the case of nonlinear dynamics. We utilize the well-known duality, proposed by Fleming and Mitter in 1982, between the Hamilton-Jacobi-Bellman (HJB) equation of optimal control and the Zakai equation of filtering, which arises through the log transform. This allows us to propose a particle system that approximates the solution of the HJB equation and can be simulated using only a black-box model of the dynamics, without explicit knowledge of the model parameters.
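To make the log transform concrete, here is a schematic version for a stochastic control problem with a quadratic control cost. The notation (drift a, noise matrix σ, running cost c, terminal cost g, value function v) is assumed here for illustration, and the setting treated in the papers listed on this page may differ in its details.

```latex
% Assumed setting: the control enters through the same channel as the noise
dX_t = \bigl(a(X_t) + \sigma(X_t)\,U_t\bigr)\,dt + \sigma(X_t)\,dW_t,
\qquad
J(U) = \mathbb{E}\Bigl[\int_0^T \bigl(c(X_t) + \tfrac12 |U_t|^2\bigr)\,dt + g(X_T)\Bigr].

% HJB equation for the value function v(t,x), with v(T,x) = g(x);
% the minimizing control is u^* = -\sigma^{\top}\nabla v
-\partial_t v = a\cdot\nabla v
  + \tfrac12 \operatorname{tr}\bigl(\sigma\sigma^{\top}\nabla^2 v\bigr)
  - \tfrac12 \bigl|\sigma^{\top}\nabla v\bigr|^2 + c.

% Log transform: substituting v = -\log\varphi, so that
% \nabla\varphi = -\varphi\,\nabla v and
% \nabla^2\varphi = \varphi\,(\nabla v\,\nabla v^{\top} - \nabla^2 v),
% cancels the quadratic term and leaves a linear equation
-\partial_t \varphi = a\cdot\nabla\varphi
  + \tfrac12 \operatorname{tr}\bigl(\sigma\sigma^{\top}\nabla^2\varphi\bigr)
  - c\,\varphi,
\qquad \varphi(T,x) = e^{-g(x)}.
```

The transformed equation is linear in φ and of the same type as the equations that arise in nonlinear filtering, which is what makes particle approximations a natural tool here.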

Shown below are the results of applying our method to some example problems. The cart-pole example is to stabilize an inverted pendulum on a cart. The plot shows a comparison of the optimal solution and the EnKF-based approximate solution (N is the number of particles). In the spring-mass-damper example, the plot shows the mean-squared error (MSE) in approximating the optimal solution as a function of the number of particles (N) and the state dimension (d).

Control of an inverted pendulum on cart

Control of spring mass damper system

Resources on Dynamics and Optimal Control

Prof. Liberzon's book

Prof. Tedrake's course

Course notes by Prof. Belabbas

Course notes by Prof. Raginsky

Publications and preprints

Joshi, A.A., Taghvaei, A., Mehta, P.G., and Meyn, S.P. "Controlled interacting particle algorithms for simulation-based reinforcement learning" (arXiv), Systems & Control Letters, Volume 170, 2022. Special issue dedicated to the memory of Ari Arapostathis.

Mehta, P.G., and Meyn, S.P. "Q-learning and Pontryagin's maximum principle", IEEE Conference on Decision and Control, Shanghai, 2009.