Automata Learning meets Shielding
- URL: http://arxiv.org/abs/2212.01838v1
- Date: Sun, 4 Dec 2022 14:58:12 GMT
- Title: Automata Learning meets Shielding
- Authors: Martin Tappler, Stefan Pranger, Bettina Könighofer, Edi Muškardin, Roderick Bloem and Kim Larsen
- Abstract summary: Safety is still one of the major research challenges in reinforcement learning (RL).
In this paper, we address the problem of how to avoid safety violations of RL agents during exploration in probabilistic and partially unknown environments.
Our approach iteratively combines automata learning for Markov Decision Processes (MDPs) with shield synthesis.
- Score: 1.1417805445492082
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Safety is still one of the major research challenges in reinforcement
learning (RL). In this paper, we address the problem of how to avoid safety
violations of RL agents during exploration in probabilistic and partially
unknown environments. Our approach iteratively combines automata learning for
Markov Decision Processes (MDPs) with shield synthesis.
Initially, the MDP representing the environment is unknown. The agent starts
exploring the environment and collects traces. From the collected traces, we
passively learn MDPs that abstractly represent the safety-relevant aspects of
the environment. Given a learned MDP and a safety specification, we construct a
shield. For each state-action pair within a learned MDP, the shield computes
exact probabilities on how likely it is that executing the action results in
violating the specification from the current state within the next $k$ steps.
After the shield is constructed, it is used at runtime to block any agent
action that induces too large a risk. The shielded agent
continues to explore the environment and collects new data on the environment.
Iteratively, we use the collected data to learn new MDPs with higher accuracy,
resulting in turn in shields able to prevent more safety violations. We
implemented our approach and present a detailed case study of a Q-learning
agent exploring slippery Gridworlds. In our experiments, we show that as the
agent explores more and more of the environment during training, the improved
learned models lead to shields that are able to prevent many safety violations.
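The shield construction described in the abstract amounts to a bounded reachability computation on the learned MDP: for each state-action pair, estimate the probability of violating the safety specification within the next k steps, and block actions whose risk exceeds a threshold. The sketch below is a minimal illustration of that idea, not the paper's implementation; the dictionary-based MDP encoding and names such as `k_step_violation_probs` and `Shield` are assumptions made for the example.

```python
# Minimal sketch (not the paper's implementation) of a k-step safety shield
# over a learned MDP. The MDP is given as transition probabilities
#   mdp[(s, a)] = {s_next: prob, ...}
# together with a set of states that violate the safety specification.

from typing import Dict, Hashable, List, Sequence, Set, Tuple

State = Hashable
Action = Hashable
MDP = Dict[Tuple[State, Action], Dict[State, float]]


def k_step_violation_probs(mdp: MDP, unsafe: Set[State],
                           k: int) -> Dict[Tuple[State, Action], float]:
    """For every state-action pair, compute the probability of reaching an
    unsafe state within k steps, assuming the agent behaves as safely as
    possible afterwards (bounded value iteration on the learned MDP)."""
    states = {s for (s, _) in mdp} | {t for succ in mdp.values() for t in succ}
    actions_of: Dict[State, List[Action]] = {}
    for (s, a) in mdp:
        actions_of.setdefault(s, []).append(a)

    # v[s]: minimal probability of a violation within j steps starting in s
    v = {s: (1.0 if s in unsafe else 0.0) for s in states}
    q: Dict[Tuple[State, Action], float] = {}
    for _ in range(k):
        q = {(s, a): sum(p * v[t] for t, p in succ.items())
             for (s, a), succ in mdp.items()}
        v = {s: (1.0 if s in unsafe else
                 min((q[(s, a)] for a in actions_of.get(s, [])), default=0.0))
             for s in states}
    return q


class Shield:
    """Blocks actions whose k-step risk of a safety violation is too large."""

    def __init__(self, mdp: MDP, unsafe: Set[State], k: int, threshold: float):
        self.risk = k_step_violation_probs(mdp, unsafe, k)
        self.threshold = threshold

    def allowed(self, state: State, actions: Sequence[Action]) -> List[Action]:
        safe = [a for a in actions
                if self.risk.get((state, a), 0.0) <= self.threshold]
        # If every action is too risky, fall back to the least risky one
        # so the agent is never left without a choice.
        return safe or [min(actions,
                            key=lambda a: self.risk.get((state, a), 0.0))]
```

During shielded exploration, the learning agent would restrict its choice to `shield.allowed(state, actions)`; as more traces are collected and a more accurate MDP is learned, the shield is rebuilt from the new model, matching the iterative loop described above.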
Related papers
- No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery [53.08822154199948]
Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks.
This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics.
We develop a method that directly trains on scenarios with high learnability.
arXiv Detail & Related papers (2024-08-27T14:31:54Z)
- Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL).
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
arXiv Detail & Related papers (2023-12-18T16:09:43Z)
- Approximate Shielding of Atari Agents for Safe Exploration [83.55437924143615]
We propose a principled algorithm for safe exploration based on the concept of shielding.
We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations.
arXiv Detail & Related papers (2023-04-21T16:19:54Z)
- Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning [7.103977648997475]
Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases.
We propose Model-based Dynamic Shielding (MBDS) to support MARL algorithm design.
arXiv Detail & Related papers (2023-04-13T06:08:10Z)
- Safe MDP Planning by Learning Temporal Patterns of Undesirable Trajectories and Averting Negative Side Effects [27.41101006357176]
In safe MDP planning, a cost function based on the current state and action is often used to specify safety aspects.
An agent operating based on an incomplete model can often produce unintended negative side effects (NSEs).
arXiv Detail & Related papers (2023-04-06T14:03:24Z)
- Online Shielding for Reinforcement Learning [59.86192283565134]
We propose an approach for online safety shielding of RL agents.
During runtime, the shield analyses the safety of each available action.
Based on this probability and a given threshold, the shield decides whether to block an action from the agent.
arXiv Detail & Related papers (2022-12-04T16:00:29Z)
- MESA: Offline Meta-RL for Safe Adaptation and Fault Tolerance [73.3242641337305]
Recent work learns risk measures that estimate the probability of violating constraints, which can then be used to enforce safety.
We cast safe exploration as an offline meta-RL problem, where the objective is to leverage examples of safe and unsafe behavior across a range of environments.
We then propose MEta-learning for Safe Adaptation (MESA), an approach for meta-learning a risk measure for safe RL.
arXiv Detail & Related papers (2021-12-07T08:57:35Z)
- Safe Reinforcement Learning via Curriculum Induction [94.67835258431202]
In safety-critical applications, autonomous agents may need to learn in an environment where mistakes can be very costly.
Existing safe reinforcement learning methods make an agent rely on priors that let it avoid dangerous situations.
This paper presents an alternative approach inspired by human teaching, where an agent learns under the supervision of an automatic instructor.
arXiv Detail & Related papers (2020-06-22T10:48:17Z)