Bayesian Controller Fusion: Leveraging Control Priors in Deep
Reinforcement Learning for Robotics
- URL: http://arxiv.org/abs/2107.09822v3
- Date: Mon, 3 Apr 2023 05:32:36 GMT
- Title: Bayesian Controller Fusion: Leveraging Control Priors in Deep
Reinforcement Learning for Robotics
- Authors: Krishan Rana, Vibhavari Dasagi, Jesse Haviland, Ben Talbot, Michael
Milford and Niko Sünderhauf
- Abstract summary: We present a hybrid control strategy that combines the strengths of traditional hand-crafted controllers and model-free deep reinforcement learning (RL).
BCF thrives in the robotics domain, where reliable but suboptimal control priors exist for many tasks, but RL from scratch remains unsafe and data-inefficient.
We show BCF's applicability to the zero-shot sim-to-real setting and its ability to deal with out-of-distribution states in the real world.
- Score: 17.660913275007317
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Bayesian Controller Fusion (BCF): a hybrid control strategy that
combines the strengths of traditional hand-crafted controllers and model-free
deep reinforcement learning (RL). BCF thrives in the robotics domain, where
reliable but suboptimal control priors exist for many tasks, but RL from
scratch remains unsafe and data-inefficient. By fusing uncertainty-aware
distributional outputs from each system, BCF arbitrates control between them,
exploiting their respective strengths. We study BCF on two real-world robotics
tasks involving navigation in a vast and long-horizon environment, and a
complex reaching task that involves manipulability maximisation. For both these
domains, simple handcrafted controllers exist that can solve the task at hand
in a risk-averse manner but do not necessarily exhibit the optimal solution
given limitations in analytical modelling, controller miscalibration and task
variation. As exploration is naturally guided by the prior in the early stages
of training, BCF accelerates learning, while substantially improving beyond the
performance of the control prior, as the policy gains more experience. More
importantly, given the risk aversion of the control prior, BCF ensures safe
exploration and deployment, where the control prior naturally dominates the
action distribution in states unknown to the policy. We additionally show BCF's
applicability to the zero-shot sim-to-real setting and its ability to deal with
out-of-distribution states in the real world. BCF is a promising approach
towards combining the complementary strengths of deep RL and traditional
robotic control, surpassing what either can achieve independently. The code and
supplementary video material are made publicly available at
https://krishanrana.github.io/bcf.
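The fusion step can be pictured concretely. The following is a minimal sketch, assuming (as one plausible reading of the abstract, not the paper's exact formulation) that the policy and the control prior each output an independent Gaussian per action dimension and that the composite distribution is their normalised product, so whichever source is more certain dominates that dimension; the two-dimensional action and all numbers are illustrative only.

```python
import numpy as np

def fuse_gaussians(mu_policy, sigma_policy, mu_prior, sigma_prior):
    """Fuse two independent Gaussian action distributions per dimension.

    The normalised product of two Gaussians is another Gaussian whose mean
    is an inverse-variance weighted average, so the more certain source
    (smaller sigma) pulls the composite action towards its own mean.
    """
    var_policy = np.square(sigma_policy)
    var_prior = np.square(sigma_prior)
    var_fused = (var_policy * var_prior) / (var_policy + var_prior)
    mu_fused = var_fused * (mu_policy / var_policy + mu_prior / var_prior)
    return mu_fused, np.sqrt(var_fused)

# Hypothetical 2-D action (e.g. linear and angular velocity): the policy is
# confident in the first dimension but uncertain in the second, so the
# risk-averse prior dominates the second dimension of the fused action.
mu, sigma = fuse_gaussians(
    mu_policy=np.array([0.6, 0.4]), sigma_policy=np.array([0.1, 1.0]),
    mu_prior=np.array([0.5, -0.2]), sigma_prior=np.array([0.3, 0.1]),
)
action = np.random.default_rng(0).normal(mu, sigma)  # sampled control command
```

Under such a weighting, a policy that reports high variance in states it has not encountered is automatically overruled by the prior, which is consistent with the safe-exploration behaviour described in the abstract.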
Related papers
- A comparison of RL-based and PID controllers for 6-DOF swimming robots: hybrid underwater object tracking [8.362739554991073]
We present an exploration and assessment of employing a centralized deep Q-network (DQN) controller as a substitute for PID controllers.
Our primary focus is illustrating this transition through the specific case of underwater object tracking.
Our experiments, conducted within a Unity-based simulator, validate the effectiveness of a centralized RL agent over separate PID controllers.
arXiv Detail & Related papers (2024-01-29T23:14:15Z)
- Reaching the Limit in Autonomous Racing: Optimal Control versus Reinforcement Learning [66.10854214036605]
A central question in robotics is how to design a control system for an agile mobile robot.
We show that a neural network controller trained with reinforcement learning (RL) outperformed optimal control (OC) methods in this setting.
Our findings allowed us to push an agile drone to its maximum performance, achieving a peak acceleration greater than 12 times the gravitational acceleration and a peak velocity of 108 kilometers per hour.
arXiv Detail & Related papers (2023-10-17T02:40:27Z)
- Safe Neural Control for Non-Affine Control Systems with Differentiable Control Barrier Functions [58.19198103790931]
This paper addresses the problem of safety-critical control for non-affine control systems.
It has been shown that optimizing quadratic costs subject to state and control constraints can be sub-optimally reduced to a sequence of quadratic programs (QPs) by using Control Barrier Functions (CBFs); a standard instance of this construction is sketched after the related-papers list below.
We incorporate higher-order CBFs into neural ordinary differential equation-based learning models as differentiable CBFs to guarantee safety for non-affine control systems.
arXiv Detail & Related papers (2023-09-06T05:35:48Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Skip Training for Multi-Agent Reinforcement Learning Controller for Industrial Wave Energy Converters [94.84709449845352]
Recent Wave Energy Converters (WEC) are equipped with multiple legs and generators to maximize energy generation.
Traditional controllers have shown limitations in capturing complex wave patterns, and controllers must efficiently maximize energy capture.
This paper introduces a Multi-Agent Reinforcement Learning controller (MARL), which outperforms the traditionally used spring damper controller.
arXiv Detail & Related papers (2022-09-13T00:20:31Z)
- Zero-Shot Uncertainty-Aware Deployment of Simulation Trained Policies on Real-World Robots [17.710172337571617]
Deep reinforcement learning (RL) agents tend to make errors when deployed in the real world due to mismatches between the training and execution environments.
We propose a novel uncertainty-aware deployment strategy that combines the strengths of deep RL policies and traditional handcrafted controllers.
We show promising results on two real-world continuous control tasks, where BCF outperforms both the standalone policy and controller.
arXiv Detail & Related papers (2021-12-10T02:13:01Z)
- Optimization Algorithm for Feedback and Feedforward Policies towards Robot Control Robust to Sensing Failures [1.7970523486905976]
We propose a new optimization problem for optimizing both the feedback (FB) and feedforward (FF) policies simultaneously.
In numerical simulations and a robot experiment, we verified that the proposed method can stably optimize the composed policy even with a learning law that differs from traditional RL.
arXiv Detail & Related papers (2021-04-01T10:41:42Z)
- Learning a Contact-Adaptive Controller for Robust, Efficient Legged Locomotion [95.1825179206694]
We present a framework that synthesizes robust controllers for a quadruped robot.
A high-level controller learns to choose from a set of primitives in response to changes in the environment.
A low-level controller utilizes an established control method to robustly execute the primitives.
arXiv Detail & Related papers (2020-09-21T16:49:26Z)
- Optimal PID and Antiwindup Control Design as a Reinforcement Learning Problem [3.131740922192114]
We focus on the interpretability of DRL control methods.
In particular, we view linear fixed-structure controllers as shallow neural networks embedded in the actor-critic framework.
arXiv Detail & Related papers (2020-05-10T01:05:26Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
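The "sequence of quadratic programs (QPs) by using Control Barrier Functions (CBFs)" mentioned in the Safe Neural Control entry above refers to the standard CBF safety-filter construction. The sketch below is an illustrative scalar, control-affine instance with a closed-form QP solution; the paper itself targets non-affine systems, and every function and value here is hypothetical.

```python
import numpy as np

def cbf_qp_filter(x, u_ref, f, g, h, grad_h, alpha=1.0):
    """One CBF-QP safety-filter step for a control-affine system
    x_dot = f(x) + g(x) * u with a scalar input u.

    Solves  min_u (u - u_ref)^2  s.t.  Lf_h(x) + Lg_h(x) * u >= -alpha * h(x).
    With a single affine constraint the QP reduces to the projection below.
    """
    Lf_h = float(grad_h(x) @ f(x))   # drift contribution to h_dot
    Lg_h = float(grad_h(x) @ g(x))   # control contribution to h_dot
    bound = -alpha * h(x) - Lf_h     # constraint reads: Lg_h * u >= bound
    if Lg_h > 0:
        return max(u_ref, bound / Lg_h)
    if Lg_h < 0:
        return min(u_ref, bound / Lg_h)
    return u_ref                     # constraint does not depend on u

# Hypothetical 1-D example: keep x >= 0 under dynamics x_dot = u.
f = lambda x: np.array([0.0])
g = lambda x: np.array([1.0])
h = lambda x: x[0]                   # barrier function; safe set is {x >= 0}
grad_h = lambda x: np.array([1.0])

# The nominal command u_ref = -1.0 would push the state towards the unsafe
# set, so the filter clips it to the least-restrictive safe value (-0.2 here).
u_safe = cbf_qp_filter(np.array([0.2]), u_ref=-1.0, f=f, g=g, h=h, grad_h=grad_h)
```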
This list is automatically generated from the titles and abstracts of the papers on this site.