Safety Correction from Baseline: Towards the Risk-aware Policy in
Robotics via Dual-agent Reinforcement Learning
- URL: http://arxiv.org/abs/2212.06998v1
- Date: Wed, 14 Dec 2022 03:11:25 GMT
- Authors: Linrui Zhang and Zichen Yan and Li Shen and Shoujie Li and Xueqian
Wang and Dacheng Tao
- Abstract summary: We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
- Score: 64.11013095004786
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning a risk-aware policy is essential but rather challenging in
unstructured robotic tasks. Safe reinforcement learning methods open up new
possibilities to tackle this problem. However, the conservative policy updates
make it intractable to achieve sufficient exploration and desirable performance
in complex, sample-expensive environments. In this paper, we propose a
dual-agent safe reinforcement learning strategy consisting of a baseline and a
safe agent. Such a decoupled framework enables high flexibility, data
efficiency and risk-awareness for RL-based control. Concretely, the baseline
agent is responsible for maximizing rewards under standard RL settings. Thus,
it is compatible with off-the-shelf training techniques of unconstrained
optimization, exploration and exploitation. On the other hand, the safe agent
mimics the baseline agent for policy improvement and learns to fulfill safety
constraints via off-policy RL tuning. In contrast to training from scratch,
safe policy correction requires significantly fewer interactions to obtain a
near-optimal policy. The dual policies can be optimized synchronously via a
shared replay buffer, or a pre-trained model or a non-learning-based controller
can serve as a fixed baseline agent. Experimental results
show that our approach can learn feasible skills without prior knowledge as
well as derive risk-averse counterparts from pre-trained unsafe policies. The
proposed method outperforms the state-of-the-art safe RL algorithms on
difficult robot locomotion and manipulation tasks with respect to both safety
constraint satisfaction and sample efficiency.
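The decoupling described in the abstract can be sketched in code: the baseline agent is trained to maximize reward as usual, while the safe agent's objective combines an imitation term toward the baseline with a penalty on predicted constraint cost. This is a minimal, hypothetical illustration; the function name, the squared-error imitation term, and the Lagrangian-style penalty are assumptions, not the paper's exact formulation.

```python
import numpy as np

def safe_agent_loss(safe_actions, baseline_actions, predicted_costs,
                    cost_limit=0.1, lagrange_mult=1.0):
    """Illustrative safe-agent objective (assumed form, not the paper's).

    The imitation term pulls the safe policy toward the baseline policy's
    actions; the constraint term penalizes expected cost above a limit,
    scaled by a Lagrange multiplier.
    """
    imitation = np.mean((safe_actions - baseline_actions) ** 2)
    excess_cost = max(np.mean(predicted_costs) - cost_limit, 0.0)
    return imitation + lagrange_mult * excess_cost

# Toy usage: matching the baseline with costs under the limit gives zero loss;
# costs above the limit add a penalty even when imitation is perfect.
loss_safe = safe_agent_loss(np.zeros(4), np.zeros(4), np.array([0.05, 0.05]))
loss_risky = safe_agent_loss(np.zeros(4), np.zeros(4), np.array([0.2, 0.2]))
```

In this reading, "safety correction from baseline" amounts to fine-tuning around an already-competent policy rather than learning a constrained policy from scratch, which is why far fewer interactions are claimed to suffice.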
Related papers
- Safe Reinforcement Learning in a Simulated Robotic Arm [0.0]
Reinforcement learning (RL) agents need to explore their environments in order to learn optimal policies.
In this paper, we extend the applicability of safe RL algorithms by creating a customized environment with a Panda robotic arm.
arXiv Detail & Related papers (2023-11-28T19:22:16Z) - A Multiplicative Value Function for Safe and Efficient Reinforcement
Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
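The multiplicative combination described above can be illustrated in one line: the safety critic's probability of constraint violation discounts the reward critic's constraint-free return estimate. The function below is a hypothetical sketch of that idea; the name and signature are assumptions, not the paper's API.

```python
def multiplicative_value(reward_value, violation_prob):
    """Illustrative multiplicative value: the reward critic's estimate
    is scaled by the safety critic's probability of remaining
    constraint-free (1 - violation probability)."""
    return (1.0 - violation_prob) * reward_value

# A certain violation zeroes out the value regardless of reward,
# while a safe state keeps the full reward estimate.
v = multiplicative_value(10.0, 0.25)
```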
arXiv Detail & Related papers (2023-03-07T18:29:15Z) - Evaluating Model-free Reinforcement Learning toward Safety-critical
Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z) - Safe Model-Based Reinforcement Learning with an Uncertainty-Aware
Reachability Certificate [6.581362609037603]
We build a safe reinforcement learning framework to resolve constraints required by the DRC and its corresponding shield policy.
We also devise a line search method to maintain safety and reach higher returns simultaneously while leveraging the shield policy.
arXiv Detail & Related papers (2022-10-14T06:16:53Z) - SAFER: Data-Efficient and Safe Reinforcement Learning via Skill
Acquisition [59.94644674087599]
We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
arXiv Detail & Related papers (2022-02-10T05:43:41Z) - Minimizing Safety Interference for Safe and Comfortable Automated
Driving with Distributional Reinforcement Learning [3.923354711049903]
We propose a distributional reinforcement learning framework to learn adaptive policies that can tune their level of conservativity at run-time based on the desired comfort and utility.
We show that our algorithm learns policies that can still drive reliably when the perception noise is two times higher than in the training configuration for automated merging and crossing at occluded intersections.
arXiv Detail & Related papers (2021-07-15T13:36:55Z) - Safe Reinforcement Learning Using Advantage-Based Intervention [45.79740561754542]
Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints.
We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training.
Our method comes with strong guarantees on safety during both training and deployment.
arXiv Detail & Related papers (2021-06-16T20:28:56Z) - Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z) - Learning to be Safe: Deep RL with a Safety Critic [72.00568333130391]
A natural first approach toward safe RL is to manually specify constraints on the policy's behavior.
We propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors.
arXiv Detail & Related papers (2020-10-27T20:53:20Z) - Runtime-Safety-Guided Policy Repair [13.038017178545728]
We study the problem of policy repair for learning-based control policies in safety-critical settings.
We propose to reduce or even eliminate control switching by 'repairing' the trained policy based on runtime data produced by the safety controller.
arXiv Detail & Related papers (2020-08-17T23:31:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.