Don't do it: Safer Reinforcement Learning With Rule-based Guidance
- URL: http://arxiv.org/abs/2212.13819v1
- Date: Wed, 28 Dec 2022 13:42:56 GMT
- Title: Don't do it: Safer Reinforcement Learning With Rule-based Guidance
- Authors: Ekaterina Nikonova, Cheng Xue, Jochen Renz
- Abstract summary: During training, reinforcement learning systems interact with the world without considering the safety of their actions.
We propose a new safe epsilon-greedy algorithm that uses safety rules to override agents' actions if they are considered to be unsafe.
- Score: 2.707154152696381
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: During training, reinforcement learning systems interact with the world
without considering the safety of their actions. When deployed into the real
world, such systems can be dangerous and cause harm to their surroundings.
Often, dangerous situations can be mitigated by defining a set of rules that
the system should not violate under any conditions. For example, in robot
navigation, one safety rule would be to avoid colliding with surrounding
objects and people. In this work, we define safety rules in terms of the
relationships between the agent and objects and use them to prevent
reinforcement learning systems from performing potentially harmful actions. We
propose a new safe epsilon-greedy algorithm that uses safety rules to override
agents' actions if they are considered to be unsafe. In our experiments, we
show that a safe epsilon-greedy policy significantly increases the safety of
the agent during training, improves the learning efficiency resulting in much
faster convergence, and achieves better performance than the base model.
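The override idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function `safe_epsilon_greedy`, the rule `collides`, the one-dimensional state `(agent_x, obstacle_x)`, and the fallback used when every action is flagged are all assumptions made for illustration. The sketch filters out actions that a safety rule marks as unsafe, then applies ordinary epsilon-greedy selection to the remaining ones.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

State = Tuple[int, int]  # hypothetical state: (agent_x, obstacle_x)

# Hypothetical action set: 0 = move left, 1 = stay, 2 = move right.
MOVES = {0: -1, 1: 0, 2: 1}


def collides(state: State, action: int) -> bool:
    """Illustrative safety rule: an action is unsafe if it moves the agent onto the obstacle."""
    agent_x, obstacle_x = state
    return agent_x + MOVES[action] == obstacle_x


def safe_epsilon_greedy(
    state: State,
    q_values: Sequence[float],
    is_unsafe: Callable[[State, int], bool],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick an action epsilon-greedily, restricted to actions the rule allows."""
    rng = rng or random.Random()
    actions = list(range(len(q_values)))
    safe_actions = [a for a in actions if not is_unsafe(state, a)]
    # Assumption: if the rule flags every action, fall back to the full set.
    candidates = safe_actions or actions
    if rng.random() < epsilon:
        # Exploration step: sample only among the allowed actions.
        return rng.choice(candidates)
    # Exploitation step: best-valued allowed action.
    return max(candidates, key=lambda a: q_values[a])


state = (2, 3)             # obstacle directly to the right of the agent
q_values = [0.1, 0.2, 0.9]  # the highest-valued action (2) would collide
action = safe_epsilon_greedy(state, q_values, collides, epsilon=0.0)
print(action)  # 1: the best action among those the rule allows
```

In this toy setting, both the random exploration step and the greedy step are restricted to rule-compliant actions, so the unsafe move is never selected while at least one safe action exists.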
Related papers
- Defining and Evaluating Physical Safety for Large Language Models [62.4971588282174]
Large Language Models (LLMs) are increasingly used to control robotic systems such as drones.
Their risks of causing physical threats and harm in real-world applications remain unexplored.
We classify the physical safety risks of drones into four categories: (1) human-targeted threats, (2) object-targeted threats, (3) infrastructure attacks, and (4) regulatory violations.
arXiv Detail & Related papers (2024-11-04T17:41:25Z)
- Evaluation of Safety Constraints in Autonomous Navigation with Deep Reinforcement Learning [62.997667081978825]
We compare two learnable navigation policies: safe and unsafe.
The safe policy takes the constraints into account, while the other does not.
We show that the safe policy generates trajectories with more clearance (distance to the obstacles) and has fewer collisions during training without sacrificing overall performance.
arXiv Detail & Related papers (2023-07-27T01:04:57Z)
- Reinforcement Learning by Guided Safe Exploration [11.14908712905592]
We consider the constrained reward-free setting, where an agent (the guide) learns to explore safely without the reward signal.
This agent is trained in a controlled environment, which allows unsafe interactions and still provides the safety signal.
We also regularize a target policy towards the guide while the student is unreliable and gradually eliminate the influence of the guide.
arXiv Detail & Related papers (2023-07-26T17:26:21Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- How to Learn from Risk: Explicit Risk-Utility Reinforcement Learning for Efficient and Safe Driving Strategies [1.496194593196997]
This paper proposes SafeDQN, which makes the behavior of autonomous vehicles safe and interpretable while remaining efficient.
We show that SafeDQN finds interpretable and safe driving policies for a variety of scenarios and demonstrate how state-of-the-art saliency techniques can help to assess both risk and utility.
arXiv Detail & Related papers (2022-03-16T05:51:22Z)
- SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition [59.94644674087599]
We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
arXiv Detail & Related papers (2022-02-10T05:43:41Z)
- Learn Zero-Constraint-Violation Policy in Model-Free Constrained Reinforcement Learning [7.138691584246846]
We propose the safe set actor-critic (SSAC) algorithm, which confines the policy update using safety-oriented energy functions.
The safety index is designed to increase rapidly for potentially dangerous actions.
We claim that we can learn the energy function in a model-free manner similar to learning a value function.
arXiv Detail & Related papers (2021-11-25T07:24:30Z)
- DESTA: A Framework for Safe Reinforcement Learning with Markov Games of Intervention [17.017957942831938]
Current approaches for tackling safe learning in reinforcement learning (RL) lead to a trade-off between safe exploration and fulfilling the task.
We introduce a new two-player framework for safe RL called Distributive Exploration Safety Training Algorithm (DESTA).
arXiv Detail & Related papers (2021-10-27T14:35:00Z)
- Safer Reinforcement Learning through Transferable Instinct Networks [6.09170287691728]
We present an approach where an additional policy can override the main policy and offer a safer alternative action.
In our instinct-regulated RL (IR2L) approach, an "instinctual" network is trained to recognize undesirable situations.
We demonstrate IR2L in the OpenAI Safety Gym domain, where it incurs significantly fewer safety violations.
arXiv Detail & Related papers (2021-07-14T13:22:04Z)
- Learning to be Safe: Deep RL with a Safety Critic [72.00568333130391]
A natural first approach toward safe RL is to manually specify constraints on the policy's behavior.
We propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors.
arXiv Detail & Related papers (2020-10-27T20:53:20Z)
- Safe Reinforcement Learning via Curriculum Induction [94.67835258431202]
In safety-critical applications, autonomous agents may need to learn in an environment where mistakes can be very costly.
Existing safe reinforcement learning methods make an agent rely on priors that let it avoid dangerous situations.
This paper presents an alternative approach inspired by human teaching, where an agent learns under the supervision of an automatic instructor.
arXiv Detail & Related papers (2020-06-22T10:48:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.