Online Safety Assurance for Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2010.03625v1
- Date: Wed, 7 Oct 2020 19:54:01 GMT
- Title: Online Safety Assurance for Deep Reinforcement Learning
- Authors: Noga H. Rotman, Michael Schapira and Aviv Tamar
- Abstract summary: We argue that safely deploying learning-driven systems requires being able to determine, in real time, whether system behavior is coherent.
We present three approaches to quantifying decision uncertainty that differ in terms of the signal used to infer uncertainty.
Our preliminary findings suggest that transitioning to a default policy when decision uncertainty is detected is key to enjoying the performance benefits afforded by leveraging ML without compromising safety.
- Score: 24.23670300606769
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, deep learning has been successfully applied to a variety of
networking problems. A fundamental challenge is that when the operational
environment for a learning-augmented system differs from its training
environment, such systems often make poorly informed decisions, leading to poor
performance. We argue that safely deploying learning-driven systems requires
being able to determine, in real time, whether system behavior is coherent, for
the purpose of defaulting to a reasonable heuristic when this is not so. We
term this the online safety assurance problem (OSAP). We present three
approaches to quantifying decision uncertainty that differ in terms of the
signal used to infer uncertainty. We illustrate the usefulness of online safety
assurance in the context of the proposed deep reinforcement learning (RL)
approach to video streaming. While deep RL for video streaming bests other
approaches when the operational and training environments match, it is
dominated by simple heuristics when the two differ. Our preliminary findings
suggest that transitioning to a default policy when decision uncertainty is
detected is key to enjoying the performance benefits afforded by leveraging ML
without compromising on safety.
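The abstract's core idea is an uncertainty-gated fallback: follow the learned policy while its decisions look coherent, and switch to a simple default heuristic when decision uncertainty is detected. The sketch below illustrates this for adaptive bitrate (ABR) selection in video streaming. It is a minimal illustration, not the paper's implementation: it assumes ensemble disagreement as the uncertainty signal (the paper discusses three possible signals) and a buffer-based heuristic as the default policy; all names, interfaces, and thresholds are hypothetical.

```python
import numpy as np

# Sketch of uncertainty-gated fallback for adaptive bitrate (ABR) selection.
# Assumptions (illustrative, not from the paper's code): an ensemble of learned
# policies whose disagreement serves as the uncertainty signal, and a simple
# buffer-based heuristic as the default policy.

def ensemble_disagreement(action_probs):
    """Mean pairwise total-variation distance between ensemble members' action distributions."""
    n = len(action_probs)
    dists = [0.5 * np.abs(action_probs[i] - action_probs[j]).sum()
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists)) if dists else 0.0

def buffer_based_heuristic(state, bitrates):
    """Default policy: pick a bitrate proportional to current buffer occupancy."""
    buffer_level, max_buffer = state["buffer_s"], state["max_buffer_s"]
    idx = int((buffer_level / max_buffer) * (len(bitrates) - 1))
    return bitrates[idx]

def select_bitrate(state, ensemble, bitrates, threshold=0.2):
    """Follow the learned ensemble while it is confident; otherwise fall back to the heuristic."""
    # Each policy is assumed to expose a (hypothetical) action_probabilities(state)
    # method returning a probability vector over the available bitrates.
    probs = [policy.action_probabilities(state) for policy in ensemble]
    if ensemble_disagreement(probs) > threshold:
        # Uncertainty detected: default to the safe heuristic.
        return buffer_based_heuristic(state, bitrates)
    # Confident: act greedily with respect to the averaged ensemble distribution.
    return bitrates[int(np.argmax(np.mean(probs, axis=0)))]
```

In this sketch the threshold trades off performance and safety: a lower value falls back to the heuristic more eagerly, a higher value trusts the learned policy longer.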
Related papers
- Towards Safe Load Balancing based on Control Barrier Functions and Deep Reinforcement Learning [0.691367883100748]
We propose a safe learning-based load balancing algorithm for Software-Defined Wide Area Networks (SD-WAN), empowered by Deep Reinforcement Learning (DRL) combined with a Control Barrier Function (CBF).
We show that our approach delivers near-optimal Quality-of-Service (QoS) in terms of end-to-end delay while respecting safety requirements related to link capacity constraints.
arXiv Detail & Related papers (2024-01-10T19:43:12Z)
- Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
- Robust Policy Learning over Multiple Uncertainty Sets [91.67120465453179]
Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments.
We develop an algorithm that enjoys the benefits of both system identification and robust RL.
arXiv Detail & Related papers (2022-02-14T20:06:28Z)
- SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition [59.94644674087599]
We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
arXiv Detail & Related papers (2022-02-10T05:43:41Z)
- Safer Reinforcement Learning through Transferable Instinct Networks [6.09170287691728]
We present an approach where an additional policy can override the main policy and offer a safer alternative action.
In our instinct-regulated RL (IR2L) approach, an "instinctual" network is trained to recognize undesirable situations.
We demonstrate IR2L in the OpenAI Safety Gym domain, where it incurs significantly fewer safety violations.
arXiv Detail & Related papers (2021-07-14T13:22:04Z)
- Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z)
- Learning to be Safe: Deep RL with a Safety Critic [72.00568333130391]
A natural first approach toward safe RL is to manually specify constraints on the policy's behavior.
We propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors.
arXiv Detail & Related papers (2020-10-27T20:53:20Z)
- Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance.
We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems.
We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)
- Constrained Markov Decision Processes via Backward Value Functions [43.649330976089004]
We model the problem of learning with constraints as a Constrained Markov Decision Process.
A key contribution of our approach is to translate cumulative cost constraints into state-based constraints.
We provide theoretical guarantees under which the agent converges while ensuring safety over the course of training.
arXiv Detail & Related papers (2020-08-26T20:56:16Z)
- Falsification-Based Robust Adversarial Reinforcement Learning [13.467693018395863]
Falsification-based RARL (FRARL) is the first generic framework for integrating temporal logic falsification into adversarial learning to improve policy robustness.
Our experimental results demonstrate that policies trained with a falsification-based adversary generalize better and show less violation of the safety specification in test scenarios.
arXiv Detail & Related papers (2020-07-01T18:32:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.