Thompson Sampling Efficiently Learns to Control Diffusion Processes
- URL: http://arxiv.org/abs/2206.09977v1
- Date: Mon, 20 Jun 2022 19:42:49 GMT
- Title: Thompson Sampling Efficiently Learns to Control Diffusion Processes
- Authors: Mohamad Kazem Shirani Faradonbeh, Mohamad Sadegh Shirani Faradonbeh,
Mohsen Bayati
- Abstract summary: We show that a Thompson sampling algorithm learns optimal actions fast, incurring regret that grows only as the square root of time, and also stabilizes the system within a short time period.
To the best of our knowledge, this is the first such result for Thompson sampling in a diffusion process control problem.
Our theoretical analysis involves characterization of a certain optimality manifold that ties the local geometry of the drift parameters to the optimal control of the diffusion process.
- Score: 4.254099382808599
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion processes that evolve according to linear stochastic differential
equations are an important family of continuous-time dynamic decision-making
models. Optimal policies are well-studied for them, under full certainty about
the drift matrices. However, little is known about data-driven control of
diffusion processes with uncertain drift matrices as conventional discrete-time
analysis techniques are not applicable. In addition, while the task can be
viewed as a reinforcement learning problem involving an
exploration-exploitation trade-off, ensuring system stability is a fundamental component of
designing optimal policies. We establish that the popular Thompson sampling
algorithm learns optimal actions fast, incurring regret that grows only as the
square root of time, and also stabilizes the system within a short time period. To the best of
our knowledge, this is the first such result for Thompson sampling in a
diffusion process control problem. We validate our theoretical results through
empirical simulations with real parameter matrices from two settings of
airplane and blood glucose control. Moreover, we observe that Thompson sampling
significantly improves (worst-case) regret compared to state-of-the-art
algorithms, suggesting that Thompson sampling explores in a more guarded fashion.
Our theoretical analysis involves characterization of a certain optimality
manifold that ties the local geometry of the drift parameters to the optimal
control of the diffusion process. We expect this technique to be of broader
interest.
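As a concrete illustration of the learning loop the abstract describes, here is a minimal Thompson sampling sketch for a scalar linear diffusion, assuming a Gaussian posterior over the drift parameters, Euler-Maruyama simulation, and illustrative priors and episode lengths; none of these specifics are taken from the paper.

```python
# A minimal Thompson-sampling sketch for a scalar linear diffusion
# dx = (a*x + b*u) dt + dW with quadratic cost q*x^2 + r*u^2.
# Priors, episode lengths, and the discretization are illustrative only.
import numpy as np
from scipy.linalg import solve_continuous_are

rng = np.random.default_rng(0)
a_true, b_true = 0.5, 1.0            # unknown drift (a > 0: open-loop unstable)
q, r, dt, T_ep = 1.0, 1.0, 0.01, 10.0

precision = np.eye(2)                # Gaussian posterior over theta = (a, b)
S = np.zeros(2)                      # running sum of regressor * increment
theta_mean = np.zeros(2)
x = 0.0

for episode in range(20):
    # 1) Thompson step: sample a drift hypothesis from the posterior.
    a_s, b_s = rng.multivariate_normal(theta_mean, np.linalg.inv(precision))
    if abs(b_s) < 1e-3:
        b_s = 1e-3                   # keep the sampled model controllable

    # 2) Act optimally for the sampled model: continuous-time Riccati equation.
    P = solve_continuous_are(np.array([[a_s]]), np.array([[b_s]]),
                             np.array([[q]]), np.array([[r]]))
    k = P[0, 0] * b_s / r            # scalar LQR gain, u = -k*x

    # 3) Roll out with Euler-Maruyama; update the posterior from the
    #    discretized increments dx ~ N((a*x + b*u) dt, dt).
    for _ in range(int(T_ep / dt)):
        u = -k * x
        dx = (a_true * x + b_true * u) * dt + np.sqrt(dt) * rng.standard_normal()
        z = np.array([x, u])
        precision += np.outer(z, z) * dt
        S += z * dx
        theta_mean = np.linalg.solve(precision, S)
        x += dx

print("posterior mean of (a, b):", theta_mean, "truth:", (a_true, b_true))
```

Note that exploration here comes entirely from posterior sampling rather than injected noise, which is consistent with the guarded exploration behavior the abstract highlights.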
Related papers
- Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z)
- Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems [10.404992912881601]
We study reinforcement learning for a class of continuous-time linear-quadratic (LQ) control problems for diffusions.
We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an actor-critic algorithm to learn the optimal policy parameter directly.
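For contrast with the model-based sketch above, here is a minimal model-free sketch on the same kind of scalar LQ diffusion: the feedback gain of u = -k*x is improved from simulated episode costs via two-point finite differences. This mirrors only the model-free flavor of the entry, not the authors' actor-critic algorithm, and all constants are made up.

```python
# Generic model-free policy search on a scalar LQ diffusion (a sketch,
# not the paper's actor-critic method): improve the gain k of u = -k*x
# using zeroth-order gradient estimates of the episodic cost.
import numpy as np

rng = np.random.default_rng(1)
a, b, q, r, dt, T = 0.3, 1.0, 1.0, 1.0, 0.01, 5.0

def episode_cost(k):
    """Simulate dx = (a*x + b*u) dt + dW under u = -k*x; return the LQ cost."""
    x, cost = 1.0, 0.0
    for _ in range(int(T / dt)):
        u = -k * x
        cost += (q * x**2 + r * u**2) * dt
        x += (a * x + b * u) * dt + np.sqrt(dt) * rng.standard_normal()
    return cost

k, lr, delta = 1.0, 0.01, 0.1
for _ in range(300):
    # Finite-difference estimate of dJ/dk from two noisy rollouts.
    grad = (episode_cost(k + delta) - episode_cost(k - delta)) / (2 * delta)
    k -= lr * grad
print("learned gain:", k)            # the optimal gain is roughly 1.34 here
```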
arXiv Detail & Related papers (2024-07-24T12:26:21Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
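A toy sketch of the pattern this entry studies: learn a stochastic Gaussian policy with REINFORCE, then deploy only its mean. The one-step objective and all constants below are invented for illustration.

```python
# Learn stochastic, deploy deterministic: REINFORCE on a Gaussian
# policy N(m, sigma^2); only the mean m is deployed at the end.
import numpy as np

rng = np.random.default_rng(2)
def reward(u):
    return -(u - 2.0) ** 2           # unknown to the learner; optimum at u = 2

m, sigma, lr = 0.0, 0.5, 0.02        # sigma is the exploration level to tune
for _ in range(5000):
    u = m + sigma * rng.standard_normal()        # explore stochastically
    # REINFORCE: grad_m log N(u; m, sigma^2) = (u - m) / sigma^2
    m += lr * reward(u) * (u - m) / sigma**2
print("deployed deterministic action:", m)       # approaches 2
```

The exploration level sigma is the quantity the entry proposes to tune, trading the sample efficiency of learning against the quality of the deployed deterministic policy.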
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Data-driven rules for multidimensional reflection problems [1.0742675209112622]
We study a multivariate singular control problem for reversible diffusions with controls of reflection type.
For given diffusion dynamics, assuming the optimal domain to be strongly star-shaped, we propose a gradient descent algorithm based on polytope approximations to numerically determine a cost-minimizing domain.
Finally, we investigate data-driven solutions when the diffusion dynamics are unknown to the controller.
arXiv Detail & Related papers (2023-11-11T18:36:17Z)
- Low-rank extended Kalman filtering for online learning of neural networks from streaming data [71.97861600347959]
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream.
The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior matrix.
In contrast to methods based on variational inference, our method is fully deterministic, and does not require step-size tuning.
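A naive sketch of the covariance structure this entry describes, maintaining a posterior covariance Sigma ≈ W W^T + diag(d) through scalar Kalman updates. Unlike the paper's method, this version forms the full covariance and re-truncates it by eigendecomposition, so it only illustrates the representation, not the efficient update; all dimensions are arbitrary.

```python
# Low-rank-plus-diagonal posterior covariance Sigma = W W^T + diag(d),
# updated by scalar Kalman steps and re-truncated to rank L.
# (This naive version forms Sigma explicitly; the paper avoids that.)
import numpy as np

rng = np.random.default_rng(3)
P, L = 50, 5                          # parameter dimension, retained rank
mean = np.zeros(P)
W = 0.1 * rng.standard_normal((P, L))
d = np.ones(P)

def update(mean, W, d, H, y, R=0.1):
    """One scalar-observation Kalman step on Sigma = W W^T + diag(d)."""
    Sigma = W @ W.T + np.diag(d)
    S = H @ Sigma @ H + R             # innovation variance (H is a 1-D row)
    K = Sigma @ H / S                 # Kalman gain
    mean = mean + K * (y - H @ mean)
    Sigma = Sigma - np.outer(K, H @ Sigma)
    # Re-project: keep the top-L eigendirections, push the rest into diag.
    evals, evecs = np.linalg.eigh(Sigma)
    top = evals.argsort()[-L:]
    W = evecs[:, top] * np.sqrt(np.clip(evals[top], 0, None))
    d = np.clip(np.diag(Sigma) - (W ** 2).sum(axis=1), 1e-6, None)
    return mean, W, d

theta_true = rng.standard_normal(P)
for _ in range(200):
    H = rng.standard_normal(P)        # linearized observation row
    y = H @ theta_true + np.sqrt(0.1) * rng.standard_normal()
    mean, W, d = update(mean, W, d, H, y)
print("estimation error:", np.linalg.norm(mean - theta_true))
```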
arXiv Detail & Related papers (2023-05-31T03:48:49Z)
- Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo [104.9535542833054]
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL).
We instead directly sample the Q function from its posterior distribution using Langevin Monte Carlo.
Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
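A minimal sketch of the sampling idea in this entry, shown on a linear bandit rather than a Q function: approximate posterior samples are drawn with a few unadjusted Langevin steps per round, and the action is chosen greedily for the sample. Step sizes and problem sizes are illustrative.

```python
# Thompson sampling with unadjusted Langevin dynamics: replace the exact
# posterior draw by a few noisy gradient steps on the log posterior.
import numpy as np

rng = np.random.default_rng(4)
d, n_arms = 5, 10
theta_true = rng.standard_normal(d)
arms = rng.standard_normal((n_arms, d))

X, y = [], []                        # past action features and rewards
theta = np.zeros(d)                  # warm-started Langevin chain state

for t in range(300):
    eta = 0.5 / (len(X) + 2)         # shrink steps as the posterior sharpens
    for _ in range(30):              # a few Langevin steps per round
        grad = -theta                # gradient of the N(0, I) log prior
        if X:
            Xa, ya = np.asarray(X), np.asarray(y)
            grad += Xa.T @ (ya - Xa @ theta)   # unit-variance Gaussian likelihood
        theta = theta + eta * grad + np.sqrt(2 * eta) * rng.standard_normal(d)
    a = arms[np.argmax(arms @ theta)]          # act greedily on the sample
    X.append(a)
    y.append(a @ theta_true + rng.standard_normal())

best = np.max(arms @ theta_true)
print("final-round regret:", best - arms[np.argmax(arms @ theta)] @ theta_true)
```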
arXiv Detail & Related papers (2023-05-29T17:11:28Z)
- Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits [17.11922027966447]
This work provides theoretical guarantees of Thompson sampling in high dimensional and sparse contextual bandits.
For faster computation, we use a spike-and-slab prior to model the unknown parameter, and variational inference instead of MCMC.
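A toy sketch of spike-and-slab posterior sampling for a Thompson-style draw; under the orthogonal design assumed here the per-coordinate posterior is exact, whereas the paper uses variational inference to handle general designs. All constants are illustrative.

```python
# Exact per-coordinate spike-and-slab posterior sampling under an
# orthogonal design: prior beta_j ~ w*N(0, tau^2) + (1-w)*delta_0,
# observation b_j = beta_j + N(0, s^2).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
d, s, tau, w = 20, 0.5, 1.0, 0.1     # dim, noise sd, slab sd, prior incl. prob
beta_true = np.zeros(d); beta_true[:3] = [2.0, -1.5, 1.0]   # sparse truth
b = beta_true + s * rng.standard_normal(d)    # per-coordinate sufficient stats

# Posterior inclusion probability of each coordinate.
num = w * norm.pdf(b, 0, np.sqrt(tau**2 + s**2))
p_incl = num / (num + (1 - w) * norm.pdf(b, 0, s))

# One posterior sample of beta (a "Thompson draw").
z = rng.random(d) < p_incl
slab_mean = b * tau**2 / (tau**2 + s**2)
slab_sd = np.sqrt(tau**2 * s**2 / (tau**2 + s**2))
beta_sample = np.where(z, slab_mean + slab_sd * rng.standard_normal(d), 0.0)
print("inclusion probs (first 5):", p_incl[:5].round(2))
print("sampled beta   (first 5):", beta_sample[:5].round(2))
```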
arXiv Detail & Related papers (2022-11-11T02:23:39Z)
- Stochastic optimal well control in subsurface reservoirs using reinforcement learning [0.0]
We present a case study of a model-free reinforcement learning framework that solves an optimal control problem under a predefined parameter uncertainty distribution.
In principle, RL algorithms are capable of learning optimal action policies to maximize a numerical reward signal.
We present numerical results using two state-of-the-art RL algorithms, proximal policy optimization (PPO) and advantage actor-critic (A2C) on two subsurface flow test cases.
arXiv Detail & Related papers (2022-07-07T17:34:23Z)
- How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates a large diffusion time T to ensure that the forward dynamics bring the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)
- Reinforcement Learning Policies in Continuous-Time Linear Systems [0.0]
We present online policies that learn optimal actions fast by carefully randomizing the parameter estimates.
We prove sharp stability results for inexact system dynamics and tightly specify the infinitesimal regret caused by sub-optimal actions.
Our analysis sheds light on fundamental challenges in continuous-time reinforcement learning and suggests a useful cornerstone for similar problems.
arXiv Detail & Related papers (2021-09-16T00:08:50Z)
- Probabilistic robust linear quadratic regulators with Gaussian processes [73.0364959221845]
Probabilistic models such as Gaussian processes (GPs) are powerful tools to learn unknown dynamical systems from data for subsequent use in control design.
We present a novel controller synthesis for linearized GP dynamics that yields robust controllers with respect to a probabilistic stability margin.
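A hedged sketch of the idea in this entry for a one-dimensional system: fit a GP to dynamics samples, extract a distribution over the linearization slope at the equilibrium, and design an LQR gain for a high-confidence worst case. The paper's synthesis is more sophisticated; this only mirrors the concept, and every constant below is made up.

```python
# Learn x_dot = f(x) + u with a GP, linearize at x = 0 with uncertainty,
# and pick an LQR gain for a pessimistic (most unstable) slope.
import numpy as np
from scipy.linalg import solve_continuous_are
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(6)
f = lambda x: 0.8 * x - 0.1 * x**3            # unknown drift; b = 1 known
X = rng.uniform(-2, 2, 30).reshape(-1, 1)
Y = f(X).ravel() + 0.05 * rng.standard_normal(30)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                              alpha=0.05**2).fit(X, Y)

# Distribution of the slope a = f'(0) via finite differences of posterior samples.
eps = 1e-3
pts = np.array([[-eps], [eps]])
samples = gp.sample_y(pts, n_samples=200, random_state=0)   # shape (2, 200)
slopes = (samples[1] - samples[0]) / (2 * eps)
a_worst = np.quantile(slopes, 0.95)           # high-confidence worst-case slope

# LQR gain for the worst-case linearization (q = r = 1).
P = solve_continuous_are(np.array([[a_worst]]), np.array([[1.0]]),
                         np.array([[1.0]]), np.array([[1.0]]))
k = P[0, 0]                                   # u = -k*x
print(f"slope mean {slopes.mean():.2f}, worst case {a_worst:.2f}, gain {k:.2f}")
```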
arXiv Detail & Related papers (2021-05-17T08:36:18Z)