Enabling Efficient, Reliable Real-World Reinforcement Learning with
Approximate Physics-Based Models
- URL: http://arxiv.org/abs/2307.08168v2
- Date: Mon, 6 Nov 2023 15:15:38 GMT
- Title: Enabling Efficient, Reliable Real-World Reinforcement Learning with
Approximate Physics-Based Models
- Authors: Tyler Westenbroek, Jacob Levy, David Fridovich-Keil
- Abstract summary: We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data.
In this paper we introduce a novel policy gradient-based policy optimization framework.
We show that our approach can learn precise control strategies reliably and with only minutes of real-world data.
- Score: 10.472792899267365
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We focus on developing efficient and reliable policy optimization strategies
for robot learning with real-world data. In recent years, policy gradient
methods have emerged as a promising paradigm for training control policies in
simulation. However, these approaches often remain too data inefficient or
unreliable to train on real robotic hardware. In this paper we introduce a
novel policy gradient-based policy optimization framework which systematically
leverages a (possibly highly simplified) first-principles model and enables
learning precise control policies with limited amounts of real-world data. Our
approach $1)$ uses the derivatives of the model to produce sample-efficient
estimates of the policy gradient and $2)$ uses the model to design a low-level
tracking controller, which is embedded in the policy class. Theoretical
analysis provides insight into how the presence of this feedback controller
overcomes key limitations of stand-alone policy gradient methods, while
hardware experiments with a small car and quadruped demonstrate that our
approach can learn precise control strategies reliably and with only minutes of
real-world data.
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, convergence (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - Model predictive control-based value estimation for efficient reinforcement learning [6.8237783245324035]
We design an improved reinforcement learning method based on model predictive control that models the environment through a data-driven approach.
Based on the learned environment model, it performs multi-step prediction to estimate the value function and optimize the policy.
The method demonstrates higher learning efficiency, faster convergent speed of strategies tending to the local optimal value, and less sample capacity space required by experience replay buffers.
arXiv Detail & Related papers (2023-10-25T13:55:14Z) - Enforcing the consensus between Trajectory Optimization and Policy
Learning for precise robot control [75.28441662678394]
Reinforcement learning (RL) and trajectory optimization (TO) present strong complementary advantages.
We propose several improvements on top of these approaches to learn global control policies quicker.
arXiv Detail & Related papers (2022-09-19T13:32:09Z) - Fully Decentralized Model-based Policy Optimization for Networked
Systems [23.46407780093797]
This work aims to improve data efficiency of multi-agent control by model-based learning.
We consider networked systems where agents are cooperative and communicate only locally with their neighbors.
In our method, each agent learns a dynamic model to predict future states and broadcast their predictions by communication, and then the policies are trained under the model rollouts.
arXiv Detail & Related papers (2022-07-13T23:52:14Z) - Model Generation with Provable Coverability for Offline Reinforcement
Learning [14.333861814143718]
offline optimization with dynamics-aware policy provides a new perspective for policy learning and out-of-distribution generalization.
But due to the limitation under the offline setting, the learned model could not mimic real dynamics well enough to support reliable out-of-distribution exploration.
We propose an algorithm to generate models optimizing their coverage for the real dynamics.
arXiv Detail & Related papers (2022-06-01T08:34:09Z) - Policy Search for Model Predictive Control with Application to Agile
Drone Flight [56.24908013905407]
We propose a policy-search-for-model-predictive-control framework for MPC.
Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies.
Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world.
arXiv Detail & Related papers (2021-12-07T17:39:24Z) - Learning Robust Controllers Via Probabilistic Model-Based Policy Search [2.886634516775814]
We investigate whether controllers learned in such a way are robust and able to generalize under small perturbations of the environment.
We show that enforcing a lower bound to the likelihood noise in the Gaussian Process dynamics model regularizes the policy updates and yields more robust controllers.
arXiv Detail & Related papers (2021-10-26T11:17:31Z) - Evaluating model-based planning and planner amortization for continuous
control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z) - Guided Uncertainty-Aware Policy Optimization: Combining Learning and
Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z) - Neural Lyapunov Model Predictive Control: Learning Safe Global
Controllers from Sub-optimal Examples [4.777323087050061]
In many real-world and industrial applications, it is typical to have an existing control strategy, for instance, execution from a human operator.
The objective of this work is to improve upon this unknown, safe but suboptimal policy by learning a new controller that retains safety and stability.
The proposed algorithm alternatively learns the terminal cost and updates the MPC parameters according to a stability metric.
arXiv Detail & Related papers (2020-02-21T16:57:38Z) - Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.