Active Learning with Safety Constraints
- URL: http://arxiv.org/abs/2206.11183v1
- Date: Wed, 22 Jun 2022 15:45:38 GMT
- Title: Active Learning with Safety Constraints
- Authors: Romain Camilleri, Andrew Wagenmaker, Jamie Morgenstern, Lalit Jain, Kevin Jamieson
- Abstract summary: We investigate the complexity of learning the best safe decision in interactive environments.
We propose an adaptive experimental design-based algorithm, which we show efficiently trades off between the difficulty of showing an arm is unsafe vs suboptimal.
- Score: 25.258564629480063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Active learning methods have shown great promise in reducing the number of
samples necessary for learning. As automated learning systems are adopted into
real-time, real-world decision-making pipelines, it is increasingly important
that such algorithms are designed with safety in mind. In this work we
investigate the complexity of learning the best safe decision in interactive
environments. We reduce this problem to a constrained linear bandits problem,
where our goal is to find the best arm satisfying certain (unknown) safety
constraints. We propose an adaptive experimental design-based algorithm, which
we show efficiently trades off between the difficulty of showing an arm is
unsafe vs suboptimal. To our knowledge, our results are the first on best-arm
identification in linear bandits with safety constraints. In practice, we
demonstrate that this approach performs well on synthetic and real-world
datasets.
Related papers
- Approximate Shielding of Atari Agents for Safe Exploration [83.55437924143615]
We propose a principled algorithm for safe exploration based on the concept of shielding.
We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations.
arXiv Detail & Related papers (2023-04-21T16:19:54Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- Interactively Learning Preference Constraints in Linear Bandits [100.78514640066565]
We study sequential decision-making with known rewards and unknown constraints.
As an application, we consider learning constraints to represent human preferences in a driving simulation.
arXiv Detail & Related papers (2022-06-10T17:52:58Z)
- Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning [24.56889192688925]
Reach-avoid optimal control problems are central to safety and liveness assurance for autonomous robotic systems.
Recent successes in reinforcement learning methods to approximately solve optimal control problems with performance objectives make their application to certification problems attractive.
Recent work has shown promise in extending the reinforcement learning machinery to handle safety-type problems, whose objective is not a sum, but a minimum (or maximum) over time.
arXiv Detail & Related papers (2021-12-23T00:44:38Z)
- Best Arm Identification with Safety Constraints [3.7783523378336112]
The best arm identification problem in the multi-armed bandit setting is an excellent model of many real-world decision-making problems.
We study the question of best-arm identification in safety-critical settings, where the goal of the agent is to find the best safe option out of many.
We propose an algorithm in this setting which is guaranteed to learn safely.
arXiv Detail & Related papers (2021-11-23T20:53:12Z)
- Efficient falsification approach for autonomous vehicle validation using a parameter optimisation technique based on reinforcement learning [6.198523595657983]
The widescale deployment of Autonomous Vehicles (AV) appears to be imminent despite many safety challenges that are yet to be resolved.
The uncertainties in the behaviour of the traffic participants and the dynamic world cause reactions in advanced autonomous systems.
This paper presents an efficient falsification method to evaluate the System Under Test.
arXiv Detail & Related papers (2020-11-16T02:56:13Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
- Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training.
We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
arXiv Detail & Related papers (2020-05-09T05:57:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.