Actively Learning Reinforcement Learning: A Stochastic Optimal Control
Approach
- URL: http://arxiv.org/abs/2309.10831v3
- Date: Mon, 26 Feb 2024 21:51:13 GMT
- Title: Actively Learning Reinforcement Learning: A Stochastic Optimal Control
Approach
- Authors: Mohammad S. Ramadan, Mahmoud A. Hayajnh, Michael T. Tolley, Kyriakos
G. Vamvoudakis
- Abstract summary: We propose a framework towards achieving two intertwined objectives: (i) equipping reinforcement learning with active exploration and deliberate information gathering, such that it regulates state and parameter uncertainties resulting from modeling mismatches and noisy sensing; and (ii) overcoming the huge computational cost of stochastic optimal control.
We approach both objectives by using reinforcement learning to attain the stochastic optimal control law.
- Score: 3.7728340443952577
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we propose a framework towards achieving two intertwined
objectives: (i) equipping reinforcement learning with active exploration and
deliberate information gathering, such that it regulates state and parameter
uncertainties resulting from modeling mismatches and noisy sensing; and (ii)
overcoming the huge computational cost of stochastic optimal control. We
approach both objectives by using reinforcement learning to attain the
stochastic optimal control law. On one hand, we avoid the curse of
dimensionality that prohibits the direct solution of the stochastic dynamic
programming equation. On the other hand, the resulting stochastic-control-
inspired reinforcement learning agent exhibits the behavior of a dual
controller, namely caution and probing, that is, regulating the state estimate
together with its estimation quality. Unlike exploration and exploitation,
caution and probing are employed automatically by the controller in real time,
even after the learning process is concluded. We demonstrate the proposed
approach on a numerical example of a model that belongs to an emerging class
in system identification. We show how, for the dimensionality of the
stochastic version of this model, Dynamic Programming is prohibitive, Model
Predictive Control requires an expensive nonlinear optimization, and a Linear
Quadratic Regulator under the certainty equivalence assumption leads to poor
performance and filter divergence, all in contrast to our approach, which is
shown to be computationally convenient, stabilizing, and of acceptable
performance.
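As a concrete illustration of the hyperstate formulation above, the following
is a minimal Python sketch; it is our illustration under stated assumptions,
not the authors' implementation. The plant is a scalar linear system with an
unknown input gain b (a hypothetical choice, not the paper's benchmark model),
an extended Kalman filter tracks the augmented state (x, b), and the
reinforcement learning agent acts on the belief (estimate plus covariance).
The reward is the expected quadratic cost under the belief, so caution
(penalizing the variance of the regulated state) and probing (the control
input governs how quickly b is identified) emerge automatically rather than
being hand-tuned.

```python
import numpy as np

class BeliefStateRegulator:
    """Hyperstate (belief-state) environment: a sketch of the dual-control
    setting in the abstract, not the authors' code. Scalar plant with an
    unknown input gain b (illustrative assumption):
        x_{k+1} = a*x_k + b*u_k + w_k,    y_k = x_k + v_k.
    An extended Kalman filter tracks the augmented state z = (x, b); the
    agent acts on the belief (z_hat, P), so the expected quadratic cost
    couples regulating the estimate (caution) with shrinking its
    covariance (probing)."""

    def __init__(self, a=0.9, b_true=1.5, q=1.0, r=0.1,
                 w_var=0.01, v_var=0.1, seed=0):
        self.a, self.b_true, self.q, self.r = a, b_true, q, r
        self.W = np.diag([w_var, 0.0])   # process noise; b is constant
        self.v_var = v_var
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.x = self.rng.normal(0.0, 1.0)     # true state
        self.z_hat = np.array([0.0, 1.0])      # belief mean (x_hat, b_hat)
        self.P = np.eye(2)                     # belief covariance
        return self._belief()

    def _belief(self):
        # hyperstate fed to the policy: mean and upper-triangular P
        return np.concatenate([self.z_hat, self.P[np.triu_indices(2)]])

    def step(self, u):
        # true system
        w = self.rng.normal(0.0, np.sqrt(self.W[0, 0]))
        self.x = self.a * self.x + self.b_true * u + w
        y = self.x + self.rng.normal(0.0, np.sqrt(self.v_var))

        # EKF predict on z = (x, b); dF/db = u is the dual (probing) effect
        x_hat, b_hat = self.z_hat
        F = np.array([[self.a, u],
                      [0.0,    1.0]])
        self.z_hat = np.array([self.a * x_hat + b_hat * u, b_hat])
        self.P = F @ self.P @ F.T + self.W

        # EKF update with y = [1, 0] @ z + v
        H = np.array([[1.0, 0.0]])
        S = (H @ self.P @ H.T).item() + self.v_var
        K = (self.P @ H.T) / S                 # Kalman gain, shape (2, 1)
        self.z_hat = self.z_hat + (K * (y - self.z_hat[0])).ravel()
        self.P = (np.eye(2) - K @ H) @ self.P

        # expected stage cost under the belief:
        #   E[q*x^2 + r*u^2 | belief] = q*(x_hat^2 + P_xx) + r*u^2,
        # penalizing both the estimate and its variance (caution + probing)
        cost = self.q * (self.z_hat[0] ** 2 + self.P[0, 0]) + self.r * u ** 2
        return self._belief(), -cost
```

Any off-the-shelf policy-gradient agent trained on this environment would
approximate the stochastic optimal control law without solving the dynamic
programming equation over the hyperstate. For contrast, a certainty-equivalence
baseline simply ignores the covariance entries of the belief, which is the
failure mode the abstract attributes to the LQR comparison:

```python
# Hypothetical certainty-equivalence rollout: treats b_hat as the true gain.
env = BeliefStateRegulator()
belief = env.reset()
for _ in range(50):
    x_hat, b_hat = belief[0], belief[1]
    u = -env.q * env.a * b_hat * x_hat / (env.r + env.q * b_hat ** 2)  # myopic CE law
    belief, reward = env.step(u)
```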
Related papers
- Stable Inverse Reinforcement Learning: Policies from Control Lyapunov Landscapes [4.229902091180109]
We propose a novel, stability-certified IRL approach to learning control Lyapunov functions (CLFs) from demonstration data.
By exploiting closed-form expressions for associated control policies, we are able to efficiently search the space of CLFs.
We present a theoretical analysis of the optimality properties provided by the CLF and evaluate our approach using both simulated and real-world data.
arXiv Detail & Related papers (2024-05-14T16:40:45Z) - Risk-Sensitive Stochastic Optimal Control as Rao-Blackwellized Markovian
Score Climbing [3.9410617513331863]
Optimal control of dynamical systems is a crucial challenge in sequential decision-making.
Control-as-inference approaches have had considerable success, providing a viable risk-sensitive framework to address the exploration-exploitation dilemma.
This paper introduces a novel perspective by framing risk-sensitive control as Markovian score climbing under samples drawn from a conditional particle filter.
arXiv Detail & Related papers (2023-12-21T16:34:03Z) - Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level
Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time-step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z) - Incorporating Recurrent Reinforcement Learning into Model Predictive
Control for Adaptive Control in Autonomous Driving [11.67417895998434]
Model Predictive Control (MPC) is attracting tremendous attention in the autonomous driving task as a powerful control technique.
In this paper, we reformulate the problem as a Partially Observed Markov Decision Process (POMDP).
We then learn a recurrent policy continually adapting the parameters of the dynamics model via Recurrent Reinforcement Learning (RRL) for optimal and adaptive control.
arXiv Detail & Related papers (2023-01-30T22:11:07Z) - Adaptive Robust Model Predictive Control via Uncertainty Cancellation [25.736296938185074]
We propose a learning-based robust predictive control algorithm that compensates for significant uncertainty in the dynamics.
We optimize over a class of nonlinear feedback policies inspired by certainty equivalent "estimate-and-cancel" control laws.
arXiv Detail & Related papers (2022-12-02T18:54:23Z) - Active Learning of Discrete-Time Dynamics for Uncertainty-Aware Model
Predictive Control [49.60520501097199]
We present a self-supervised learning approach that actively models the dynamics of nonlinear robotic systems.
Our approach showcases high resilience and generalization capabilities by consistently adapting to unseen flight conditions.
arXiv Detail & Related papers (2022-10-23T00:45:05Z) - Adaptive Model Predictive Control by Learning Classifiers [26.052368583196426]
We propose an adaptive MPC variant that automatically estimates control and model parameters.
We leverage recent results showing that Bayesian optimization (BO) can be formulated as density ratio estimation.
This is then integrated into a model predictive path integral control framework yielding robust controllers for a variety of challenging robotics tasks.
arXiv Detail & Related papers (2022-03-13T23:22:12Z) - Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z) - Probabilistic robust linear quadratic regulators with Gaussian processes [73.0364959221845]
Probabilistic models such as Gaussian processes (GPs) are powerful tools to learn unknown dynamical systems from data for subsequent use in control design.
We present a novel controller synthesis for linearized GP dynamics that yields robust controllers with respect to a probabilistic stability margin.
arXiv Detail & Related papers (2021-05-17T08:36:18Z) - Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI (control as hybrid inference) which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z) - Adaptive Control and Regret Minimization in Linear Quadratic Gaussian
(LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.