Improving Safety in Deep Reinforcement Learning using Unsupervised
Action Planning
- URL: http://arxiv.org/abs/2109.14325v1
- Date: Wed, 29 Sep 2021 10:26:29 GMT
- Title: Improving Safety in Deep Reinforcement Learning using Unsupervised
Action Planning
- Authors: Hao-Lun Hsu, Qiuhua Huang, Sehoon Ha
- Abstract summary: One of the key challenges to deep reinforcement learning (deep RL) is to ensure safety at both training and testing phases.
We propose a novel technique of unsupervised action planning to improve the safety of on-policy reinforcement learning algorithms.
Our results show that the proposed safety RL algorithm can achieve higher rewards compared with multiple baselines in both discrete and continuous control problems.
- Score: 4.2955354157580325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the key challenges to deep reinforcement learning (deep RL) is to
ensure safety at both training and testing phases. In this work, we propose a
novel technique of unsupervised action planning to improve the safety of
on-policy reinforcement learning algorithms, such as trust region policy
optimization (TRPO) or proximal policy optimization (PPO). We design our
safety-aware reinforcement learning algorithm by storing the full history of
"recovery" actions that rescue the agent from dangerous situations in a
separate "safety" buffer and retrieving the best recovery action when the
agent encounters similar states. Because this functionality requires the
algorithm to query
similar states, we implement the proposed safety mechanism using an
unsupervised learning algorithm, k-means clustering. We evaluate the proposed
algorithm on six robotic control tasks that cover navigation and manipulation.
Our results show that the proposed safety RL algorithm can achieve higher
rewards compared with multiple baselines in both discrete and continuous
control problems. The supplemental video can be found at:
https://youtu.be/AFTeWSohILo.
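To make the mechanism concrete, the sketch below shows one way the "safety buffer" idea could be realized in Python with a scikit-learn k-means index over previously dangerous states. It is a minimal illustration, assuming the class and method names (SafetyBuffer, suggest_recovery) and the scalar score attached to each recovery action; the abstract does not specify these details or how the buffer is wired into TRPO/PPO.

```python
# Minimal sketch of the "safety buffer + k-means" recovery lookup described
# above. Names (SafetyBuffer, suggest_recovery, the scalar recovery score)
# are illustrative assumptions, not the authors' implementation.
from typing import Optional

import numpy as np
from sklearn.cluster import KMeans


class SafetyBuffer:
    """Stores (state, recovery action, score) triples and answers
    nearest-cluster queries for states resembling past dangerous ones."""

    def __init__(self, n_clusters: int = 32):
        self.n_clusters = n_clusters
        self.states, self.actions, self.scores = [], [], []
        self.kmeans: Optional[KMeans] = None

    def add(self, state, recovery_action, score):
        """Record an action that rescued the agent from a dangerous state."""
        self.states.append(np.asarray(state, dtype=np.float32))
        self.actions.append(np.asarray(recovery_action, dtype=np.float32))
        self.scores.append(float(score))

    def refit(self):
        """Re-cluster the stored dangerous states (the unsupervised step)."""
        if len(self.states) >= self.n_clusters:
            self.kmeans = KMeans(n_clusters=self.n_clusters, n_init=10)
            self.kmeans.fit(np.stack(self.states))

    def suggest_recovery(self, state):
        """Return the best-scoring recovery action from the cluster closest
        to the current state, or None if the buffer is not ready yet."""
        if self.kmeans is None:
            return None
        query = np.asarray(state, dtype=np.float32).reshape(1, -1)
        label = self.kmeans.predict(query)[0]
        members = np.flatnonzero(self.kmeans.labels_ == label)
        if members.size == 0:
            return None
        best = max(members, key=lambda i: self.scores[i])
        return self.actions[best]
```

In such a setup, the rollout loop would call suggest_recovery whenever the agent nears a constraint violation and fall back to the policy's own action when it returns None; how the on-policy update consumes these recovery actions is not described in the abstract.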
Related papers
- Reinforcement Learning with Ensemble Model Predictive Safety Certification [2.658598582858331]
Unsupervised exploration prevents the deployment of reinforcement learning algorithms on safety-critical tasks.
We propose a new algorithm that combines model-based deep reinforcement learning with tube-based model predictive control to correct the actions taken by a learning agent.
Our results show that we can achieve significantly fewer constraint violations than comparable reinforcement learning methods.
arXiv Detail & Related papers (2024-02-06T17:42:39Z)
- Approximate Model-Based Shielding for Safe Reinforcement Learning [83.55437924143615]
We propose a principled look-ahead shielding algorithm for verifying the performance of learned RL policies.
Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system.
We demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety labels.
arXiv Detail & Related papers (2023-07-27T15:19:45Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Barrier Certified Safety Learning Control: When Sum-of-Square Programming Meets Reinforcement Learning [0.0]
This work adopts control barrier functions over reinforcement learning and proposes a compensated algorithm to fully maintain safety.
Compared with quadratic programming-based reinforcement learning methods, our sum-of-squares programming-based reinforcement learning has shown its superiority.
arXiv Detail & Related papers (2022-06-16T04:38:50Z)
- Safely Bridging Offline and Online Reinforcement Learning [17.67983988254856]
We design an algorithm that uses a UCB reinforcement learning policy for exploration, but overrides it to ensure safety with high probability.
We experimentally validate our results on a sepsis treatment task, demonstrating that our algorithm can learn while ensuring good performance compared to the baseline policy for every patient.
arXiv Detail & Related papers (2021-10-25T15:57:16Z)
- Safe Reinforcement Learning Using Advantage-Based Intervention [45.79740561754542]
Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints.
We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training.
Our method comes with strong guarantees on safety during both training and deployment.
arXiv Detail & Related papers (2021-06-16T20:28:56Z)
- Simplifying Deep Reinforcement Learning via Self-Supervision [51.2400839966489]
Self-Supervised Reinforcement Learning (SSRL) is a simple algorithm that optimizes policies with purely supervised losses.
We show that SSRL is surprisingly competitive to contemporary algorithms with more stable performance and less running time.
arXiv Detail & Related papers (2021-06-10T06:29:59Z)
- Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training.
We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
arXiv Detail & Related papers (2020-05-09T05:57:43Z)
- Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)