Shielded Reinforcement Learning for Hybrid Systems
- URL: http://arxiv.org/abs/2308.14424v1
- Date: Mon, 28 Aug 2023 09:04:52 GMT
- Title: Shielded Reinforcement Learning for Hybrid Systems
- Authors: Asger Horn Brorholt, Peter Gjøl Jensen, Kim Guldstrand Larsen, Florian Lorber, and Christian Schilling
- Abstract summary: Reinforcement learning has been leveraged to construct near-optimal controllers, but their behavior is not guaranteed to be safe.
One way of imposing safety to a learned controller is to use a shield, which is correct by design.
We propose the construction of a shield using the so-called barbaric method, where an approximate finite representation of an underlying partition-based two-player safety game is extracted.
- Score: 1.0485739694839669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Safe and optimal controller synthesis for switched-controlled hybrid systems,
which combine differential equations and discrete changes of the system's
state, is notoriously hard. Reinforcement learning has been leveraged to
construct near-optimal controllers, but their behavior is not guaranteed to be
safe, even when it is encouraged by reward engineering. One way of imposing
safety on a learned controller is to use a shield, which is correct by design.
However, obtaining a shield for non-linear and hybrid environments is itself
intractable. In this paper, we propose the construction
of a shield using the so-called barbaric method, where an approximate finite
representation of an underlying partition-based two-player safety game is
extracted via systematically picked samples of the true transition function.
While hard safety guarantees are out of reach, we experimentally demonstrate
strong statistical safety guarantees with a prototype implementation and UPPAAL
STRATEGO. Furthermore, we study the impact of the synthesized shield when
applied as either a pre-shield (applied before learning a controller) or a
post-shield (only applied after learning a controller). We experimentally
demonstrate the superiority of the pre-shielding approach. We apply our
technique to a range of case studies, including two industrial examples, and
further study post-optimization of the post-shielding approach.
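The barbaric construction sketched in the abstract can be illustrated with a minimal, hypothetical example: partition the state space into grid cells, approximate each cell's successors under every action by sampling the true transition function, solve the resulting finite two-player safety game by fixed-point iteration, and use the winning set as a shield. Everything below (the 1-D dynamics, grid parameters, and all names) is an illustrative assumption, not the paper's implementation (which uses UPPAAL STRATEGO):

```python
import random

# Toy 1-D switched system (hypothetical): two control modes, one
# Euler step of the continuous dynamics per decision.
ACTIONS = (0, 1)  # e.g. mode "down" / mode "up"
DT = 0.1

def step(x, a):
    # The "true" transition function, sampled barbarically below.
    return x + DT * (1.0 if a == 1 else -1.0) + DT * 0.1 * x

X_MIN, X_MAX, N_CELLS = 0.0, 10.0, 20
WIDTH = (X_MAX - X_MIN) / N_CELLS

def cell_of(x):
    return min(N_CELLS - 1, max(0, int((x - X_MIN) / WIDTH)))

def sample_cell(c, n=10):
    lo = X_MIN + c * WIDTH
    return [lo + random.random() * WIDTH for _ in range(n)]

# Barbaric method: approximate, per (cell, action), the set of
# successor cells by sampling the true transition function.
random.seed(0)
succ = {(c, a): {cell_of(step(x, a)) for x in sample_cell(c)}
        for c in range(N_CELLS) for a in ACTIONS}

# Safety game fixed point: a cell stays safe if SOME action keeps
# ALL sampled successors safe (the environment adversarially picks
# which successor is realized).
UNSAFE = {0, N_CELLS - 1}  # illustrative bad cells at the boundary
safe = set(range(N_CELLS)) - UNSAFE
changed = True
while changed:
    changed = False
    for c in list(safe):
        if not any(succ[(c, a)] <= safe for a in ACTIONS):
            safe.discard(c)
            changed = True

def shield(x, proposed):
    """Post-shield: pass the learner's action through if it keeps the
    system in the safe set, otherwise override with a safe action."""
    allowed = [a for a in ACTIONS if succ[(cell_of(x), a)] <= safe]
    if proposed in allowed:
        return proposed
    return allowed[0] if allowed else proposed  # no safe action: give up
```

A pre-shield would use the same `safe` set and `shield` function during learning, restricting exploration to allowed actions; a post-shield wraps the finished controller at deployment time, as in the abstract's comparison.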
Related papers
- Leveraging Approximate Model-based Shielding for Probabilistic Safety Guarantees in Continuous Environments [63.053364805943026]
We extend the approximate model-based shielding framework to the continuous setting.
In particular we use Safety Gym as our test-bed, allowing for a more direct comparison of AMBS with popular constrained RL algorithms.
arXiv Detail & Related papers (2024-02-01T17:55:08Z)
- Approximate Model-Based Shielding for Safe Reinforcement Learning [83.55437924143615]
We propose a principled look-ahead shielding algorithm for verifying the performance of learned RL policies.
Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system.
We demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety-labels.
arXiv Detail & Related papers (2023-07-27T15:19:45Z)
- Approximate Shielding of Atari Agents for Safe Exploration [83.55437924143615]
We propose a principled algorithm for safe exploration based on the concept of shielding.
We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations.
arXiv Detail & Related papers (2023-04-21T16:19:54Z)
- Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning [7.103977648997475]
Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but offer no safety guarantees during the learning and deployment phases.
We propose Model-based Dynamic Shielding (MBDS) to support MARL algorithm design.
arXiv Detail & Related papers (2023-04-13T06:08:10Z)
- ISAACS: Iterative Soft Adversarial Actor-Critic for Safety [0.9217021281095907]
This work introduces a novel approach enabling scalable synthesis of robust safety-preserving controllers for robotic systems.
A safety-seeking fallback policy is co-trained with an adversarial "disturbance" agent that aims to invoke the worst-case realization of model error.
While the learned control policy does not intrinsically guarantee safety, it is used to construct a real-time safety filter.
arXiv Detail & Related papers (2022-12-06T18:53:34Z)
- Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
- Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees [7.6347172725540995]
Safety is a critical component of autonomous systems and remains a challenge for learning-based policies to be utilized in the real world.
We propose Sim-to-Lab-to-Real to bridge the reality gap with a probabilistically guaranteed safety-aware policy distribution.
arXiv Detail & Related papers (2022-01-20T18:41:01Z)
- Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z)
- Learning Hybrid Control Barrier Functions from Data [66.37785052099423]
Motivated by the lack of systematic tools to obtain safe control laws for hybrid systems, we propose an optimization-based framework for learning certifiably safe control laws from data.
In particular, we assume a setting in which the system dynamics are known and in which data exhibiting safe system behavior is available.
arXiv Detail & Related papers (2020-11-08T23:55:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.