Safely Bridging Offline and Online Reinforcement Learning
- URL: http://arxiv.org/abs/2110.13060v1
- Date: Mon, 25 Oct 2021 15:57:16 GMT
- Title: Safely Bridging Offline and Online Reinforcement Learning
- Authors: Wanqiao Xu, Kan Xu, Hamsa Bastani, Osbert Bastani
- Abstract summary: We design an algorithm that uses a UCB reinforcement learning policy for exploration, but overrides it to ensure safety with high probability.
We experimentally validate our results on a sepsis treatment task, demonstrating that our algorithm can learn while ensuring good performance compared to the baseline policy for every patient.
- Score: 17.67983988254856
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A key challenge to deploying reinforcement learning in practice is exploring safely. We propose a natural safety property -- uniformly outperforming a conservative policy (adaptively estimated from all data observed thus far), up to a per-episode exploration budget. We then design an algorithm that uses a UCB reinforcement learning policy for exploration, but overrides it as needed to ensure safety with high probability. We experimentally validate our results on a sepsis treatment task, demonstrating that our algorithm can learn while ensuring good performance compared to the baseline policy for every patient.
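To make the override idea concrete, below is a minimal Python sketch of one way such a safeguard could be structured; it is not the authors' algorithm. All names here (`safe_episode`, `value_lower_bound`, `baseline_value`, the Gym-style `env` interface) are hypothetical stand-ins, and the safety property is read as: each episode's return stays within a fixed exploration budget of the conservative baseline's return, with high probability.

```python
# Minimal sketch (hypothetical interfaces, not the paper's implementation) of
# overriding a UCB exploration policy with a conservative baseline so that the
# estimated per-episode shortfall never exceeds an exploration budget.

def safe_episode(env, ucb_policy, baseline_policy,
                 value_lower_bound, baseline_value, budget):
    """Run one episode, overriding exploratory actions as needed.

    value_lower_bound(state, action): high-probability lower bound on the
        return of taking `action` now and behaving conservatively afterwards.
    baseline_value(state): estimated value of the conservative baseline policy.
    budget: per-episode exploration budget (allowed shortfall vs. baseline).
    """
    state = env.reset()
    spent = 0.0          # estimated shortfall already incurred this episode
    total_reward = 0.0
    done = False
    while not done:
        action = ucb_policy(state)
        # Worst-case additional shortfall if the exploratory action is taken.
        shortfall = max(baseline_value(state) - value_lower_bound(state, action), 0.0)
        if spent + shortfall > budget:
            # Exploring here could violate the safety property, so fall back
            # to the conservative baseline action instead.
            action = baseline_policy(state)
        else:
            spent += shortfall
        state, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward
```

In this reading, the UCB policy drives exploration by default, and the override only triggers when the confidence bounds cannot certify that the remaining budget covers the risk of the exploratory action.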
Related papers
- Approximate Shielding of Atari Agents for Safe Exploration [83.55437924143615]
We propose a principled algorithm for safe exploration based on the concept of shielding.
We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations.
arXiv Detail & Related papers (2023-04-21T16:19:54Z)
- Guiding Safe Exploration with Weakest Preconditions [15.469452301122177]
In reinforcement learning for safety-critical settings, it is desirable for the agent to obey safety constraints at all points in time.
We present a novel neurosymbolic approach called SPICE to solve this safe exploration problem.
arXiv Detail & Related papers (2022-09-28T14:58:41Z)
- Barrier Certified Safety Learning Control: When Sum-of-Square Programming Meets Reinforcement Learning [0.0]
This work combines control barrier functions with reinforcement learning and proposes a compensating algorithm to maintain safety throughout.
Compared with quadratic-programming-based reinforcement learning methods, the proposed sum-of-squares-programming-based approach shows superior performance.
arXiv Detail & Related papers (2022-06-16T04:38:50Z)
- SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition [59.94644674087599]
We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
arXiv Detail & Related papers (2022-02-10T05:43:41Z)
- Improving Safety in Deep Reinforcement Learning using Unsupervised Action Planning [4.2955354157580325]
One of the key challenges to deep reinforcement learning (deep RL) is to ensure safety at both training and testing phases.
We propose a novel technique of unsupervised action planning to improve the safety of on-policy reinforcement learning algorithms.
Our results show that the proposed safety RL algorithm can achieve higher rewards compared with multiple baselines in both discrete and continuous control problems.
arXiv Detail & Related papers (2021-09-29T10:26:29Z)
- Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations [64.39401322671803]
This paper explores the possibility of safe RL algorithms with zero training-time safety violations.
We propose an algorithm, Co-trained Barrier Certificate for Safe RL (CRABS), which iteratively learns barrier certificates, dynamics models, and policies.
arXiv Detail & Related papers (2021-08-04T04:59:05Z)
- Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
arXiv Detail & Related papers (2020-10-27T17:54:25Z)
- Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training.
We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
arXiv Detail & Related papers (2020-05-09T05:57:43Z)
- Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.