Bayesian Controller Fusion: Leveraging Control Priors in Deep
Reinforcement Learning for Robotics
- URL: http://arxiv.org/abs/2107.09822v3
- Date: Mon, 3 Apr 2023 05:32:36 GMT
- Title: Bayesian Controller Fusion: Leveraging Control Priors in Deep
Reinforcement Learning for Robotics
- Authors: Krishan Rana, Vibhavari Dasagi, Jesse Haviland, Ben Talbot, Michael
Milford and Niko Sünderhauf
- Abstract summary: We present a hybrid control strategy that combines the strengths of traditional hand-crafted controllers and model-free deep reinforcement learning (RL).
BCF thrives in the robotics domain, where reliable but suboptimal control priors exist for many tasks, but RL from scratch remains unsafe and data-inefficient.
We show BCF's applicability to the zero-shot sim-to-real setting and its ability to deal with out-of-distribution states in the real world.
- Score: 17.660913275007317
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Bayesian Controller Fusion (BCF): a hybrid control strategy that
combines the strengths of traditional hand-crafted controllers and model-free
deep reinforcement learning (RL). BCF thrives in the robotics domain, where
reliable but suboptimal control priors exist for many tasks, but RL from
scratch remains unsafe and data-inefficient. By fusing uncertainty-aware
distributional outputs from each system, BCF arbitrates control between them,
exploiting their respective strengths. We study BCF on two real-world robotics
tasks involving navigation in a vast and long-horizon environment, and a
complex reaching task that involves manipulability maximisation. For both these
domains, simple handcrafted controllers exist that can solve the task at hand
in a risk-averse manner but do not necessarily exhibit the optimal solution
given limitations in analytical modelling, controller miscalibration and task
variation. As exploration is naturally guided by the prior in the early stages
of training, BCF accelerates learning, while substantially improving beyond the
performance of the control prior, as the policy gains more experience. More
importantly, given the risk aversion of the control prior, BCF ensures safe
exploration and deployment, where the control prior naturally dominates the
action distribution in states unknown to the policy. We additionally show BCF's
applicability to the zero-shot sim-to-real setting and its ability to deal with
out-of-distribution states in the real world. BCF is a promising approach
towards combining the complementary strengths of deep RL and traditional
robotic control, surpassing what either can achieve independently. The code and
supplementary video material are made publicly available at
https://krishanrana.github.io/bcf.
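The fusion step can be pictured concretely. The following is a minimal sketch, assuming (as one plausible reading of the abstract, not the paper's exact formulation) that the policy and the control prior each output an independent Gaussian per action dimension and that the composite distribution is their normalised product, so whichever source is more certain dominates that dimension; the two-dimensional action and all numbers are illustrative only.

```python
import numpy as np

def fuse_gaussians(mu_policy, sigma_policy, mu_prior, sigma_prior):
    """Fuse two independent Gaussian action distributions per dimension.

    The normalised product of two Gaussians is another Gaussian whose mean
    is an inverse-variance weighted average, so the more certain source
    (smaller sigma) pulls the composite action towards its own mean.
    """
    var_policy = np.square(sigma_policy)
    var_prior = np.square(sigma_prior)
    var_fused = (var_policy * var_prior) / (var_policy + var_prior)
    mu_fused = var_fused * (mu_policy / var_policy + mu_prior / var_prior)
    return mu_fused, np.sqrt(var_fused)

# Hypothetical 2-D action (e.g. linear and angular velocity): the policy is
# confident in the first dimension but uncertain in the second, so the
# risk-averse prior dominates the second dimension of the fused action.
mu, sigma = fuse_gaussians(
    mu_policy=np.array([0.6, 0.4]), sigma_policy=np.array([0.1, 1.0]),
    mu_prior=np.array([0.5, -0.2]), sigma_prior=np.array([0.3, 0.1]),
)
action = np.random.default_rng(0).normal(mu, sigma)  # sampled control command
```

Under such a weighting, a policy that reports high variance in states it has not encountered is automatically overruled by the prior, which is consistent with the safe-exploration behaviour described in the abstract.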
Related papers
- A comparison of RL-based and PID controllers for 6-DOF swimming robots: hybrid underwater object tracking [8.362739554991073]
We present an exploration and assessment of employing a centralized deep Q-network (DQN) controller as a substitute for PID controllers.
Our primary focus is illustrating this transition through the specific case of underwater object tracking.
Our experiments, conducted within a Unity-based simulator, validate the effectiveness of a centralized RL agent over separate PID controllers.
arXiv Detail & Related papers (2024-01-29T23:14:15Z)
- Reaching the Limit in Autonomous Racing: Optimal Control versus Reinforcement Learning [66.10854214036605]
A central question in robotics is how to design a control system for an agile mobile robot.
We show that a neural network controller trained with reinforcement learning (RL) outperformed optimal control (OC) methods in this setting.
Our findings allowed us to push an agile drone to its maximum performance, achieving a peak acceleration greater than 12 times the gravitational acceleration and a peak velocity of 108 kilometers per hour.
arXiv Detail & Related papers (2023-10-17T02:40:27Z)
- Safe Neural Control for Non-Affine Control Systems with Differentiable Control Barrier Functions [58.19198103790931]
This paper addresses the problem of safety-critical control for non-affine control systems.
It has been shown that optimizing quadratic costs subject to state and control constraints can be sub-optimally reduced to a sequence of quadratic programs (QPs) by using Control Barrier Functions (CBFs); a standard instance of this construction is sketched after the related-papers list below.
We incorporate higher-order CBFs into neural ordinary differential equation-based learning models as differentiable CBFs to guarantee safety for non-affine control systems.
arXiv Detail & Related papers (2023-09-06T05:35:48Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Skip Training for Multi-Agent Reinforcement Learning Controller for Industrial Wave Energy Converters [94.84709449845352]
Recent Wave Energy Converters (WEC) are equipped with multiple legs and generators to maximize energy generation.
Traditional controllers have shown limitations in capturing complex wave patterns, and controllers must efficiently maximize energy capture.
This paper introduces a Multi-Agent Reinforcement Learning controller (MARL), which outperforms the traditionally used spring damper controller.
arXiv Detail & Related papers (2022-09-13T00:20:31Z)
- Zero-Shot Uncertainty-Aware Deployment of Simulation Trained Policies on Real-World Robots [17.710172337571617]
Deep reinforcement learning (RL) agents tend to make errors when deployed in the real world due to mismatches between the training and execution environments.
We propose a novel uncertainty-aware deployment strategy that combines the strengths of deep RL policies and traditional handcrafted controllers.
We show promising results on two real-world continuous control tasks, where BCF outperforms both the standalone policy and controller.
arXiv Detail & Related papers (2021-12-10T02:13:01Z)
- Optimization Algorithm for Feedback and Feedforward Policies towards Robot Control Robust to Sensing Failures [1.7970523486905976]
We propose a new optimization problem for optimizing both the feedback (FB) and feedforward (FF) policies simultaneously.
In numerical simulations and a robot experiment, we verified that the proposed method can stably optimize the composed policy even with a learning law that differs from traditional RL.
arXiv Detail & Related papers (2021-04-01T10:41:42Z)
- Learning a Contact-Adaptive Controller for Robust, Efficient Legged Locomotion [95.1825179206694]
We present a framework that synthesizes robust controllers for a quadruped robot.
A high-level controller learns to choose from a set of primitives in response to changes in the environment.
A low-level controller utilizes an established control method to robustly execute the primitives.
arXiv Detail & Related papers (2020-09-21T16:49:26Z)
- Optimal PID and Antiwindup Control Design as a Reinforcement Learning Problem [3.131740922192114]
We focus on the interpretability of DRL control methods.
In particular, we view linear fixed-structure controllers as shallow neural networks embedded in the actor-critic framework.
arXiv Detail & Related papers (2020-05-10T01:05:26Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
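The "sequence of quadratic programs (QPs) by using Control Barrier Functions (CBFs)" mentioned in the Safe Neural Control entry above refers to the standard CBF safety-filter construction. The sketch below is an illustrative scalar, control-affine instance with a closed-form QP solution; the paper itself targets non-affine systems, and every function and value here is hypothetical.

```python
import numpy as np

def cbf_qp_filter(x, u_ref, f, g, h, grad_h, alpha=1.0):
    """One CBF-QP safety-filter step for a control-affine system
    x_dot = f(x) + g(x) * u with a scalar input u.

    Solves  min_u (u - u_ref)^2  s.t.  Lf_h(x) + Lg_h(x) * u >= -alpha * h(x).
    With a single affine constraint the QP reduces to the projection below.
    """
    Lf_h = float(grad_h(x) @ f(x))   # drift contribution to h_dot
    Lg_h = float(grad_h(x) @ g(x))   # control contribution to h_dot
    bound = -alpha * h(x) - Lf_h     # constraint reads: Lg_h * u >= bound
    if Lg_h > 0:
        return max(u_ref, bound / Lg_h)
    if Lg_h < 0:
        return min(u_ref, bound / Lg_h)
    return u_ref                     # constraint does not depend on u

# Hypothetical 1-D example: keep x >= 0 under dynamics x_dot = u.
f = lambda x: np.array([0.0])
g = lambda x: np.array([1.0])
h = lambda x: x[0]                   # barrier function; safe set is {x >= 0}
grad_h = lambda x: np.array([1.0])

# The nominal command u_ref = -1.0 would push the state towards the unsafe
# set, so the filter clips it to the least-restrictive safe value (-0.2 here).
u_safe = cbf_qp_filter(np.array([0.2]), u_ref=-1.0, f=f, g=g, h=h, grad_h=grad_h)
```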
This list is automatically generated from the titles and abstracts of the papers on this site.