DESTA: A Framework for Safe Reinforcement Learning with Markov Games of
Intervention
- URL: http://arxiv.org/abs/2110.14468v1
- Date: Wed, 27 Oct 2021 14:35:00 GMT
- Title: DESTA: A Framework for Safe Reinforcement Learning with Markov Games of
Intervention
- Authors: David Mguni, Joel Jennings, Taher Jafferjee, Aivar Sootla, Yaodong
Yang, Changmin Yu, Usman Islam, Ziyan Wang, Jun Wang
- Abstract summary: Current approaches for tackling safe learning in reinforcement learning (RL) lead to a trade-off between safe exploration and fulfilling the task.
We introduce a new two-player framework for safe RL called Distributive Exploration Safety Training Algorithm (DESTA)
- Score: 17.017957942831938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploring in an unknown system can place an agent in dangerous situations,
exposing it to potentially catastrophic hazards. Many current approaches for
tackling safe learning in reinforcement learning (RL) lead to a trade-off
between safe exploration and fulfilling the task: though these methods may
incur fewer safety violations, they often also reduce task
performance. In this paper, we take the first step in introducing a generation
of RL solvers that learn to minimise safety violations while maximising the
task reward to the extent that safe policies can tolerate. Our approach
uses a new two-player framework for safe RL called Distributive Exploration
Safety Training Algorithm (DESTA). The core of DESTA is a novel game between
two RL agents: SAFETY AGENT that is delegated the task of minimising safety
violations and TASK AGENT whose goal is to maximise the reward set by the
environment task. SAFETY AGENT can selectively take control of the system at
any given point to prevent safety violations while TASK AGENT is free to
execute its actions at all other states. This framework enables SAFETY AGENT to
learn to take actions that minimise future safety violations (during and after
training) by performing safe actions at certain states while TASK AGENT
performs actions that maximise task performance everywhere else. We
demonstrate DESTA's ability to tackle challenging tasks and compare it against
state-of-the-art RL methods on Safety Gym benchmarks, which simulate real-world
physical systems, and on OpenAI's Lunar Lander.
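
The intervention scheme described in the abstract can be illustrated with a short sketch. This is a minimal conceptual rendering and not the paper's actual algorithm: the SafetyAgent/TaskAgent interfaces, the wants_control intervention rule, and an environment step that returns a separate safety cost are all illustrative assumptions.

```python
# Conceptual sketch of a DESTA-style two-player control loop.
# All interfaces here are hypothetical placeholders, not the paper's code.

class SafetyAgent:
    """Learns to minimise safety violations and decides when to intervene."""
    def wants_control(self, state) -> bool:
        # Hypothetical criterion: take control when the state is judged risky.
        raise NotImplementedError

    def act(self, state):
        raise NotImplementedError

    def update(self, state, action, cost, next_state):
        raise NotImplementedError


class TaskAgent:
    """Learns to maximise the environment's task reward."""
    def act(self, state):
        raise NotImplementedError

    def update(self, state, action, reward, next_state):
        raise NotImplementedError


def desta_style_episode(env, safety_agent, task_agent):
    """One episode: the safety agent may selectively take control at any state;
    the task agent acts at all other states."""
    state = env.reset()
    done = False
    while not done:
        if safety_agent.wants_control(state):
            action = safety_agent.act(state)   # safe action at risky states
        else:
            action = task_agent.act(state)     # task-directed action elsewhere
        next_state, reward, cost, done = env.step(action)
        # Each agent learns against its own objective:
        safety_agent.update(state, action, cost, next_state)
        task_agent.update(state, action, reward, next_state)
        state = next_state
```

The point of the sketch is the division of labour: each agent optimises its own objective (safety cost versus task reward), while control of the system is arbitrated state by state.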
Related papers
- Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning [57.84059344739159]
"Shielding" is a popular technique to enforce safety inReinforcement Learning (RL)
We propose a new permissibility-based framework to deal with safety and shield construction.
arXiv Detail & Related papers (2024-05-29T18:00:21Z) - Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding [5.5929450570003185]
Training RL agents in unknown, black-box environments poses an even greater safety risk when prior knowledge of the domain/task is unavailable.
We introduce ADVICE (Adaptive Shielding with a Contrastive Autoencoder), a novel post-shielding technique that distinguishes safe and unsafe features of state-action pairs during training.
arXiv Detail & Related papers (2024-05-28T13:47:21Z) - Safeguarded Progress in Reinforcement Learning: Safe Bayesian
Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL)
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
arXiv Detail & Related papers (2023-12-18T16:09:43Z) - Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark [12.660770759420286]
We present an environment suite called Safety-Gymnasium, which encompasses safety-critical tasks in both single and multi-agent scenarios.
We offer a library of algorithms named Safe Policy Optimization (SafePO), comprising 16 state-of-the-art SafeRL algorithms.
arXiv Detail & Related papers (2023-10-19T08:19:28Z) - Safe Reinforcement Learning with Dead-Ends Avoidance and Recovery [13.333197887318168]
Safety is one of the main challenges in applying reinforcement learning to realistic environmental tasks.
We propose a method to construct a boundary that discriminates safe and unsafe states.
Our approach achieves better task performance with fewer safety violations than state-of-the-art algorithms.
arXiv Detail & Related papers (2023-06-24T12:02:50Z) - SAFER: Data-Efficient and Safe Reinforcement Learning via Skill
Acquisition [59.94644674087599]
We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
arXiv Detail & Related papers (2022-02-10T05:43:41Z) - MESA: Offline Meta-RL for Safe Adaptation and Fault Tolerance [73.3242641337305]
Recent work learns risk measures, which estimate the probability of violating constraints and can then be used to enable safety.
We cast safe exploration as an offline meta-RL problem, where the objective is to leverage examples of safe and unsafe behavior across a range of environments.
We then propose MEta-learning for Safe Adaptation (MESA), an approach for meta-learning a risk measure for safe RL.
arXiv Detail & Related papers (2021-12-07T08:57:35Z) - Learning Barrier Certificates: Towards Safe Reinforcement Learning with
Zero Training-time Violations [64.39401322671803]
This paper explores the possibility of safe RL algorithms with zero training-time safety violations.
We propose an algorithm, Co-trained Barrier Certificate for Safe RL (CRABS), which iteratively learns barrier certificates, dynamics models, and policies.
arXiv Detail & Related papers (2021-08-04T04:59:05Z) - Safer Reinforcement Learning through Transferable Instinct Networks [6.09170287691728]
We present an approach where an additional policy can override the main policy and offer a safer alternative action.
In our instinct-regulated RL (IR2L) approach, an "instinctual" network is trained to recognize undesirable situations.
We demonstrate IR2L in the OpenAI Safety Gym domain, in which it incurs significantly fewer safety violations.
arXiv Detail & Related papers (2021-07-14T13:22:04Z) - Learning to be Safe: Deep RL with a Safety Critic [72.00568333130391]
A natural first approach toward safe RL is to manually specify constraints on the policy's behavior.
We propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors.
arXiv Detail & Related papers (2020-10-27T20:53:20Z)