Multiplicative Controller Fusion: Leveraging Algorithmic Priors for
Sample-efficient Reinforcement Learning and Safe Sim-To-Real Transfer
- URL: http://arxiv.org/abs/2003.05117v3
- Date: Mon, 27 Jul 2020 07:02:39 GMT
- Title: Multiplicative Controller Fusion: Leveraging Algorithmic Priors for
Sample-efficient Reinforcement Learning and Safe Sim-To-Real Transfer
- Authors: Krishan Rana, Vibhavari Dasagi, Ben Talbot, Michael Milford and Niko Sünderhauf
- Abstract summary: We present a novel approach to model-free reinforcement learning that can leverage existing sub-optimal solutions.
During training, our gated fusion approach enables the prior to guide the initial stages of exploration.
We show the efficacy of our Multiplicative Controller Fusion approach on the task of robot navigation.
- Score: 18.50206483493784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning-based approaches often outperform hand-coded algorithmic solutions
for many problems in robotics. However, learning long-horizon tasks on real
robot hardware can be intractable, and transferring a learned policy from
simulation to reality is still extremely challenging. We present a novel
approach to model-free reinforcement learning that can leverage existing
sub-optimal solutions as an algorithmic prior during training and deployment.
During training, our gated fusion approach enables the prior to guide the
initial stages of exploration, increasing sample-efficiency and enabling
learning from sparse long-horizon reward signals. Importantly, the policy can
learn to improve beyond the performance of the sub-optimal prior since the
prior's influence is annealed gradually. During deployment, the policy's
uncertainty provides a reliable strategy for transferring a simulation-trained
policy to the real world by falling back to the prior controller in uncertain
states. We show the efficacy of our Multiplicative Controller Fusion approach
on the task of robot navigation and demonstrate safe transfer from simulation
to the real world without any fine-tuning. The code for this project is made
publicly available at https://sites.google.com/view/mcf-nav/home
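The following is a minimal sketch of the gated multiplicative-fusion idea, assuming both the learned policy and the algorithmic prior output diagonal Gaussian action distributions; the linear gate schedule and the stand-in distribution parameters are illustrative assumptions, not the paper's exact formulation. The exponent-weighted product of two Gaussians is itself Gaussian, with the weighted precisions adding:

```python
import numpy as np

def fuse_gaussians(mu_pi, sig_pi, mu_prior, sig_prior, alpha):
    """Exponent-weighted product of diagonal Gaussians.

    N(mu_pi, sig_pi^2)^alpha * N(mu_prior, sig_prior^2)^(1 - alpha) is,
    up to normalisation, Gaussian: the weighted precisions simply add.
    """
    prec = alpha / sig_pi**2 + (1.0 - alpha) / sig_prior**2
    mu = (alpha * mu_pi / sig_pi**2
          + (1.0 - alpha) * mu_prior / sig_prior**2) / prec
    return mu, np.sqrt(1.0 / prec)

rng = np.random.default_rng(0)
anneal_steps = 1000
for step in range(2000):
    alpha = min(1.0, step / anneal_steps)   # illustrative linear gate schedule
    mu_pi, sig_pi = rng.normal(), 0.5       # stand-ins for the policy head
    mu_prior, sig_prior = 0.0, 0.3          # stand-in algorithmic prior
    mu, sig = fuse_gaussians(mu_pi, sig_pi, mu_prior, sig_prior, alpha)
    action = rng.normal(mu, sig)            # act from the fused distribution

# Deployment idea: if the policy's uncertainty (e.g. ensemble variance)
# spikes, shrink alpha so the fused action falls back toward the prior.
```

Because the fused density is a product, a confident component dominates and a high-variance component contributes little, which is what lets an uncertain policy defer to the prior controller in unfamiliar states.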
Related papers
- Single-Shot Learning of Stable Dynamical Systems for Long-Horizon Manipulation Tasks [48.54757719504994]
This paper focuses on improving task success rates while reducing the amount of training data needed.
Our approach introduces a novel method that segments long-horizon demonstrations into discrete steps defined by waypoints and subgoals.
We validate our approach through both simulation and real-world experiments, demonstrating effective transfer from simulation to physical robotic platforms.
arXiv Detail & Related papers (2024-10-01T19:49:56Z)
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
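As a toy illustration of analytic policy gradients through differentiable dynamics (a hand-written 1-D point mass rather than the paper's simulator; all constants are placeholders), the sketch below backpropagates a rollout cost through the transition function to update a small controller:

```python
import torch

def step(state, action, dt=0.1):
    """Toy differentiable dynamics: 1-D point mass, Euler integration."""
    pos, vel = state
    vel = vel + dt * action
    pos = pos + dt * vel
    return torch.stack([pos, vel])

policy = torch.nn.Linear(2, 1)                 # tiny linear controller
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for epoch in range(200):
    state = torch.tensor([1.0, 0.0])           # start away from the origin
    loss = 0.0
    for t in range(50):                        # differentiable rollout
        action = policy(state).squeeze(-1)
        state = step(state, action)
        loss = loss + state[0] ** 2 + 0.01 * action ** 2
    opt.zero_grad()
    loss.backward()                            # gradients flow through dynamics
    opt.step()
```

Because the cost gradient flows through the environment transitions themselves, the controller receives a much denser learning signal than a score-function (REINFORCE-style) estimator would.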
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- Sample-efficient Imitative Multi-token Decision Transformer for Real-world Driving [18.34685506480288]
We propose Sample-efficient Imitative Multi-token Decision Transformer (SimDT)
SimDT introduces multi-token prediction, an online imitative learning pipeline, and prioritized experience replay to sequence-modelling reinforcement learning.
Results exceed popular imitation and reinforcement learning algorithms in both open-loop and closed-loop settings on the Waymax benchmark.
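Of the three components listed, prioritized experience replay is the most self-contained; the sketch below is generic proportional prioritization in the style of Schaul et al., not SimDT's own implementation:

```python
import numpy as np

class PrioritizedReplay:
    """Generic proportional prioritized replay (illustrative, not SimDT's code)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.prios = [], []

    def add(self, transition):
        # New samples enter with the current maximum priority.
        self.prios.append(max(self.prios, default=1.0))
        self.data.append(transition)
        if len(self.data) > self.capacity:
            self.data.pop(0)
            self.prios.pop(0)

    def sample(self, batch_size, rng=None):
        rng = rng or np.random.default_rng()
        p = np.asarray(self.prios) ** self.alpha
        p /= p.sum()                            # sampling probabilities
        idx = rng.choice(len(self.data), size=batch_size, p=p)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, idx, td_errors):
        # Replayed transitions are re-prioritized by their TD error.
        for i, e in zip(idx, td_errors):
            self.prios[i] = abs(float(e)) + self.eps
```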
arXiv Detail & Related papers (2024-06-18T14:27:14Z)
- Towards Real-World Efficiency: Domain Randomization in Reinforcement Learning for Pre-Capture of Free-Floating Moving Targets by Autonomous Robots [0.0]
We introduce a deep reinforcement learning-based control approach to address the intricate challenge of the robotic pre-grasping phase under microgravity conditions.
Our methodology incorporates an off-policy reinforcement learning framework, employing the soft actor-critic technique to enable the gripper to proficiently approach a free-floating moving object.
For effective learning of the pre-grasping approach task, we developed a reward function that offers the agent clear and insightful feedback.
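The abstract does not spell out the reward, so the following is purely a hypothetical illustration of the kind of shaped feedback a pre-grasp approach task typically uses, combining approach progress with axis alignment; every name and weight here is invented for the example:

```python
import numpy as np

def pregrasp_reward(gripper_pos, target_pos, gripper_axis, target_axis,
                    prev_dist, w_prog=1.0, w_align=0.5):
    """Hypothetical shaped reward -- NOT the paper's function, illustrative only."""
    dist = float(np.linalg.norm(target_pos - gripper_pos))
    progress = prev_dist - dist                       # reward for closing the gap
    align = float(np.dot(gripper_axis, target_axis))  # reward for axis alignment
    return w_prog * progress + w_align * align, dist  # return dist for next step
```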
arXiv Detail & Related papers (2024-06-10T16:54:51Z)
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our insight is to utilize offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
- DiAReL: Reinforcement Learning with Disturbance Awareness for Robust Sim2Real Policy Transfer in Robot Control [0.0]
Delayed Markov decision processes fulfill the Markov property by augmenting the state space of agents with a finite time window of recently committed actions.
We introduce a disturbance-augmented Markov decision process in delayed settings as a novel representation to incorporate disturbance estimation in training on-policy reinforcement learning algorithms.
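A minimal sketch of the delayed-MDP state augmentation, written as a generic gymnasium wrapper that appends a fixed window of recent actions to the observation (Box spaces assumed; the paper's disturbance estimator is not reproduced):

```python
from collections import deque

import gymnasium as gym
import numpy as np

class ActionWindowWrapper(gym.Wrapper):
    """Append the last k actions to the observation (delayed-MDP augmentation)."""

    def __init__(self, env, k=3):
        super().__init__(env)
        self.k = k
        act_dim = int(np.prod(env.action_space.shape))
        low = np.concatenate([env.observation_space.low,
                              np.full(k * act_dim, -np.inf)])
        high = np.concatenate([env.observation_space.high,
                               np.full(k * act_dim, np.inf)])
        self.observation_space = gym.spaces.Box(low, high, dtype=np.float64)
        self.window = deque(maxlen=k)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.window.clear()
        for _ in range(self.k):                 # pad with zero actions
            self.window.append(np.zeros(self.env.action_space.shape))
        return self._augment(obs), info

    def step(self, action):
        obs, rew, terminated, truncated, info = self.env.step(action)
        self.window.append(np.asarray(action, dtype=np.float64))
        return self._augment(obs), rew, terminated, truncated, info

    def _augment(self, obs):
        return np.concatenate([np.asarray(obs).ravel()]
                              + [a.ravel() for a in self.window])
```

Augmenting the state this way restores the Markov property when actions take effect with delay, since the transition then depends only on the augmented state.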
arXiv Detail & Related papers (2023-06-15T10:11:38Z)
- Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm [4.128216503196621]
We propose an On-policy Model-based Safe Deep RL algorithm in which we learn the transition dynamics of the environment in an online manner.
We show that our algorithm is more sample efficient and results in lower cumulative hazard violations as compared to constrained model-free approaches.
arXiv Detail & Related papers (2022-10-14T06:53:02Z)
- Don't Start From Scratch: Leveraging Prior Data to Automate Robotic Reinforcement Learning [70.70104870417784]
Reinforcement learning (RL) algorithms hold the promise of enabling autonomous skill acquisition for robotic systems.
In practice, real-world robotic RL typically requires time-consuming data collection and frequent human intervention to reset the environment.
In this work, we study how these challenges can be tackled by effective utilization of diverse offline datasets collected from previously seen tasks.
arXiv Detail & Related papers (2022-07-11T08:31:22Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
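For context, the core AWAC actor update is advantage-weighted maximum likelihood on dataset actions; the PyTorch sketch below treats `policy.log_prob`, `q_net`, and `v_net` as assumed interfaces (AWAC itself estimates the value as the expected Q under the current policy):

```python
import torch

def awac_actor_loss(policy, q_net, v_net, states, actions, lam=1.0):
    """Sketch of the AWAC objective: -E[ log pi(a|s) * exp(A(s,a)/lam) ]."""
    with torch.no_grad():                                # weights carry no gradient
        adv = q_net(states, actions) - v_net(states)     # advantage estimate
        weights = torch.exp(adv / lam).clamp(max=100.0)  # clip for stability
    log_prob = policy.log_prob(states, actions)          # assumed interface
    return -(weights * log_prob).mean()
```

Because the weights are detached, the gradient flows only through the log-probability term, so the update stays close to supervised learning on the offline data while still preferring high-advantage actions.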
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
- Online Constrained Model-based Reinforcement Learning [13.362455603441552]
A key requirement is the ability to handle continuous state and action spaces while remaining within a limited time and resource budget.
We propose a model-based approach that combines Gaussian Process regression and Receding Horizon Control.
We test our approach on a cart-pole swing-up environment and demonstrate the benefits of online learning on an autonomous racing task.
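A minimal sketch of the named combination: fit a Gaussian Process to observed transitions, then plan by receding-horizon random shooting through the learned model. The toy 1-D dynamics, cost, and horizon below are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def true_dynamics(x, u):                       # unknown to the controller
    return 0.9 * x + 0.5 * np.sin(u)

# Collect random transitions and fit the GP model x' = f(x, u).
X = rng.uniform(-2, 2, size=(50, 2))           # columns: state, action
y = true_dynamics(X[:, 0], X[:, 1])
gp = GaussianProcessRegressor().fit(X, y)

def plan(x0, horizon=5, n_samples=100):
    """Receding-horizon control by random shooting through the GP model."""
    best_u, best_cost = 0.0, np.inf
    for _ in range(n_samples):
        u_seq = rng.uniform(-1, 1, size=horizon)
        x, cost = x0, 0.0
        for u in u_seq:
            x = gp.predict(np.array([[x, u]]))[0]
            cost += x**2 + 0.1 * u**2          # drive the state to zero
        if cost < best_cost:
            best_u, best_cost = float(u_seq[0]), cost
    return best_u                              # execute only the first action

x = 1.5
for t in range(10):
    u = plan(x)
    x = true_dynamics(x, u) + 0.01 * rng.normal()
```

Executing only the first action of the best sampled sequence and replanning at every step is what makes the controller receding-horizon; refitting the GP as new transitions arrive realizes the online-learning aspect.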
arXiv Detail & Related papers (2020-04-07T15:51:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.