GoSafeOpt: Scalable Safe Exploration for Global Optimization of
Dynamical Systems
- URL: http://arxiv.org/abs/2201.09562v5
- Date: Mon, 12 Jun 2023 12:20:59 GMT
- Title: GoSafeOpt: Scalable Safe Exploration for Global Optimization of
Dynamical Systems
- Authors: Bhavya Sukhija, Matteo Turchetta, David Lindner, Andreas Krause,
Sebastian Trimpe, Dominik Baumann
- Abstract summary: This work proposes GoSafeOpt as the first algorithm that can safely discover globally optimal policies for high-dimensional systems.
We demonstrate the superiority of GoSafeOpt over competing model-free safe learning methods on a robot arm.
- Score: 75.22958991597069
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Learning optimal control policies directly on physical systems is challenging
since even a single failure can lead to costly hardware damage. Most existing
model-free learning methods that guarantee safety, i.e., no failures, during
exploration are limited to local optima. A notable exception is the GoSafe
algorithm, which, unfortunately, cannot handle high-dimensional systems and
hence cannot be applied to most real-world dynamical systems. This work
proposes GoSafeOpt as the first algorithm that can safely discover globally
optimal policies for high-dimensional systems while giving safety and
optimality guarantees. We demonstrate the superiority of GoSafeOpt over
competing model-free safe learning methods on a robot arm task that would be
prohibitive for GoSafe.
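For intuition, the safe-exploration machinery that GoSafeOpt and its predecessors build on maintains a Gaussian-process model of the safety constraint and only evaluates parameters whose pessimistic confidence bound clears a safety threshold. The sketch below is a minimal toy with a simplified acquisition (UCB restricted to the pessimistic safe set, in place of SafeOpt's expander and maximizer sets) and a made-up objective and constraint; it is not the authors' implementation:

```python
# Minimal sketch of SafeOpt-style safe exploration, the mechanism GoSafeOpt
# builds on (toy simplification, not the paper's algorithm or code).
import numpy as np

def rbf(X1, X2, ls=0.2, var=1.0):
    # Squared-exponential kernel on 1-D inputs.
    d = X1[:, None] - X2[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-3):
    # Standard GP regression posterior mean and standard deviation.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    sol = np.linalg.solve(K, Ks)
    mu = sol.T @ y
    var = np.diag(rbf(Xs, Xs) - Ks.T @ sol)
    return mu, np.sqrt(np.maximum(var, 0.0))

# Toy (made-up) reward f and safety constraint g; the learner only sees
# evaluations at the parameters it selects. Safety means g(a) >= h.
f = lambda a: -(a - 0.6) ** 2
g = lambda a: 1.0 - np.abs(a - 0.3)
h, beta = 0.2, 2.0                      # safety threshold, confidence multiplier

A = np.linspace(0.0, 1.0, 200)          # discretized policy-parameter space
X, yf, yg = [0.3], [f(0.3)], [g(0.3)]   # seed with one known-safe parameter

for _ in range(20):
    mu_g, sd_g = gp_posterior(np.array(X), np.array(yg), A)
    safe = mu_g - beta * sd_g >= h      # pessimistic (high-probability) safe set
    mu_f, sd_f = gp_posterior(np.array(X), np.array(yf), A)
    ucb = np.where(safe, mu_f + beta * sd_f, -np.inf)
    a = float(A[np.argmax(ucb)])        # optimistic pick, restricted to safe set
    X.append(a); yf.append(f(a)); yg.append(g(a))

print("best safe parameter found:", X[int(np.argmax(yf))])
```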
Related papers
- Safe Bayesian Optimization for the Control of High-Dimensional Embodied Systems [8.69908615905782]
Current safe exploration algorithms are inefficient and may even become infeasible in large, high-dimensional input spaces.
Existing high-dimensional constrained optimization methods neglect safety in the search process.
arXiv Detail & Related papers (2024-12-29T04:42:50Z)
- ABNet: Attention BarrierNet for Safe and Scalable Robot Learning [58.4951884593569]
Barrier-based methods are among the dominant approaches to safe robot learning.
We propose Attention BarrierNet (ABNet) that is scalable to build larger foundational safe models in an incremental manner.
We demonstrate the strength of ABNet in 2D robot obstacle avoidance, safe robot manipulation, and vision-based end-to-end autonomous driving.
arXiv Detail & Related papers (2024-06-18T19:37:44Z) - ISAACS: Iterative Soft Adversarial Actor-Critic for Safety [0.9217021281095907]
This work introduces a novel approach enabling scalable synthesis of robust safety-preserving controllers for robotic systems.
A safety-seeking fallback policy is co-trained with an adversarial "disturbance" agent that aims to invoke the worst-case realization of model error.
While the learned control policy does not intrinsically guarantee safety, it is used to construct a real-time safety filter.
arXiv Detail & Related papers (2022-12-06T18:53:34Z)
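The ISAACS entry above describes wrapping a learned task policy with a value-based safety filter. A generic sketch of that pattern follows; the interfaces (make_safety_filter, safety_value, margin) are illustrative names, not the paper's API, and the hand-coded critic stands in for the adversarially trained value function:

```python
# Generic sketch of a value-based safety filter in the spirit of ISAACS
# (hypothetical interfaces, not the paper's implementation): run the task
# policy by default, and hand control to the safety-seeking fallback policy
# whenever a learned safety critic flags the task action as unrecoverable.
from typing import Callable

import numpy as np

State = np.ndarray
Action = np.ndarray

def make_safety_filter(
    task_policy: Callable[[State], Action],
    fallback_policy: Callable[[State], Action],
    safety_value: Callable[[State, Action], float],  # critic: > 0 means recoverable
    margin: float = 0.1,
) -> Callable[[State], Action]:
    """Wrap task_policy so the fallback takes over near the safety boundary."""
    def filtered(s: State) -> Action:
        u = task_policy(s)
        if safety_value(s, u) > margin:   # predicted to stay recoverable
            return u
        return fallback_policy(s)         # least-restrictive intervention
    return filtered

# Toy usage: stay inside |x| <= 1 for x_dot = u; the "critic" here is a
# hand-coded stand-in for a learned safety value function.
policy = make_safety_filter(
    task_policy=lambda s: np.array([1.0]),      # always push right
    fallback_policy=lambda s: -np.sign(s),      # push back toward 0
    safety_value=lambda s, u: 1.0 - abs((s + 0.1 * u)[0]),
)
x = np.array([0.0])
for _ in range(100):
    x = x + 0.1 * policy(x)
print("state stays inside the safe interval:", x)
```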
- Meta-Learning Priors for Safe Bayesian Optimization [72.8349503901712]
We build on a meta-learning algorithm, F-PACOH, capable of providing reliable uncertainty quantification in settings of data scarcity.
As our core contribution, we develop a novel framework for choosing safety-compliant priors in a data-driven manner.
On benchmark functions and a high-precision motion system, we demonstrate that our meta-learned priors accelerate the convergence of safe BO approaches.
arXiv Detail & Related papers (2022-10-03T08:38:38Z)
- Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
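The entry above concerns CBF-based safety-critical controllers. For reference, below is a minimal sketch of the standard nominal CBF safety filter that such reformulations start from; the closed-form scalar-input projection is a textbook simplification, not the paper's uncertainty-aware controller:

```python
# Minimal sketch of a standard CBF safety filter for a control-affine system
# x_dot = f(x) + g(x) u (the paper's reformulation adds model-uncertainty
# terms on top of this nominal form; those are omitted here).
# The filter solves  min_u (u - u_nom)^2  s.t.  Lf_h + Lg_h * u >= -alpha * h,
# which for scalar u has a closed-form projection.

def cbf_filter(u_nom, h, Lf_h, Lg_h, alpha=1.0):
    """Return the closest input to u_nom satisfying the CBF condition
    h_dot >= -alpha * h, assuming scalar control and Lg_h != 0."""
    slack = Lf_h + Lg_h * u_nom + alpha * h
    if slack >= 0.0:
        return u_nom                    # nominal input is already safe
    return u_nom - slack / Lg_h         # minimal correction onto the constraint

# Toy example: keep x <= 1 for the single integrator x_dot = u, using the
# barrier h(x) = 1 - x (so Lf_h = 0, Lg_h = -1).
x, dt = 0.0, 0.01
for _ in range(500):
    u = cbf_filter(u_nom=2.0, h=1.0 - x, Lf_h=0.0, Lg_h=-1.0)
    x += dt * u
print("final state (stays below 1):", round(x, 3))
```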
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach to high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach at minimizing safety violations in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
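The LBSGD entry above descends a log-barrier surrogate of the constrained problem with a carefully chosen step size. A minimal sketch of that idea on a toy one-dimensional problem, with a simple feasibility safeguard standing in for the paper's smoothness-based step-size rule:

```python
# Minimal sketch of log-barrier safe first-order optimization in the spirit
# of LBSGD (simplified; the paper derives a specific adaptive step size from
# smoothness constants, which this toy safeguard only approximates).
import numpy as np

def lb_grad(x, grad_f, gs, grad_gs, eta=0.1):
    """Gradient of the log-barrier surrogate f(x) - eta * sum_i log(-g_i(x))."""
    g = grad_f(x).astype(float)
    for gi, dgi in zip(gs, grad_gs):
        g += eta * dgi(x) / (-gi(x))    # barrier pushes away from g_i(x) = 0
    return g

# Toy problem: minimize f(x) = (x - 2)^2 subject to g(x) = x - 1 <= 0.
grad_f = lambda x: 2.0 * (x - 2.0)
gs = [lambda x: x - 1.0]
grad_gs = [lambda x: np.ones_like(x)]

x = np.array([0.0])
for _ in range(200):
    d = lb_grad(x, grad_f, gs, grad_gs)
    # Safeguarded step size: never move more than half the distance to the
    # nearest constraint boundary, so every iterate stays strictly feasible.
    dist = min(-gi(x).item() for gi in gs)
    step = min(0.05, 0.5 * dist / (np.linalg.norm(d) + 1e-12))
    x = x - step * d
print("iterate stays feasible and approaches the boundary:", x)
```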
- Guided Safe Shooting: model based reinforcement learning with safety constraints [3.8490154494129327]
We introduce Guided Safe Shooting (GuSS), a model-based RL approach that can learn to control systems with minimal violations of the safety constraints.
We propose three different safe planners, one based on a simple random shooting strategy and two based on MAP-Elites, a more advanced divergent-search algorithm.
arXiv Detail & Related papers (2022-06-20T12:46:35Z)
- GoSafe: Globally Optimal Safe Robot Learning [11.77348161331335]
SafeOpt is an efficient Bayesian optimization algorithm that can learn policies while guaranteeing safety with high probability.
We extend this method by exploring outside the initial safe area while still guaranteeing safety with high probability.
We derive conditions for guaranteed convergence to the global optimum and validate GoSafe in hardware experiments.
arXiv Detail & Related papers (2021-05-27T16:27:47Z)
- Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training.
We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
arXiv Detail & Related papers (2020-05-09T05:57:43Z)
- Safe reinforcement learning for probabilistic reachability and safety specifications: A Lyapunov-based approach [2.741266294612776]
We propose a model-free safety specification method that learns the maximal probability of safe operation.
Our approach constructs a Lyapunov function with respect to a safe policy to restrain each policy improvement stage.
It yields a sequence of safe policies that determine the range of safe operation, called the safe set.
arXiv Detail & Related papers (2020-02-24T09:20:03Z)
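The entry above learns the maximal probability of safe operation and the induced safe set. On a small Markov chain under a fixed policy, that probability can be computed exactly by dynamic programming; the toy sketch below illustrates the quantity being estimated, not the paper's Lyapunov-based construction:

```python
# Toy sketch of probabilistic safety: the probability of remaining inside a
# safe region for T steps, computed by dynamic programming on a small Markov
# chain under a fixed policy. The "safe set" is then the set of states where
# this probability exceeds a threshold. (Illustration only; the paper bounds
# this quantity model-free via a Lyapunov construction.)
import numpy as np

P = np.array([                        # transition matrix under a fixed policy
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
])
safe = np.array([True, True, False])  # state 2 is a failure state

# p[s] = probability of staying inside the safe region for T steps from s,
# via the recursion p_{t+1}(s) = 1{s safe} * sum_s' P(s, s') p_t(s').
p = safe.astype(float)
for _ in range(50):                   # horizon T = 50
    p = np.where(safe, P @ p, 0.0)

threshold = 0.9
print("P(safe for 50 steps):", np.round(p, 3))
print("estimated safe set:", np.where(p >= threshold)[0])
```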