Adaptive Real Time Exploration and Optimization for Safety-Critical
Systems
- URL: http://arxiv.org/abs/2211.05495v1
- Date: Thu, 10 Nov 2022 11:37:22 GMT
- Title: Adaptive Real Time Exploration and Optimization for Safety-Critical
Systems
- Authors: Buse Sibel Korkmaz (1), Mehmet Mercangöz (1), Marta Zagórowska (2)
((1) Imperial College London, (2) ETH Zürich)
- Abstract summary: We propose the ARTEO algorithm, where we cast multi-armed bandits as a mathematical programming problem subject to safety constraints.
We learn the environmental characteristics through changes in optimization inputs and through exploration.
Compared to existing safe-learning approaches, our algorithm does not require an exclusive exploration phase and pursues the optimization goals even at explored points.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We consider the problem of decision-making under uncertainty in an
environment with safety constraints. Many business and industrial applications
rely on real-time optimization with changing inputs to improve key performance
indicators. In the case of unknown environmental characteristics, real-time
optimization becomes challenging, particularly for the satisfaction of safety
constraints. We propose the ARTEO algorithm, where we cast multi-armed bandits
as a mathematical programming problem subject to safety constraints and learn
the environmental characteristics through changes in optimization inputs and
through exploration. We quantify the uncertainty in unknown characteristics by
using Gaussian processes and incorporate it into the utility function as a
contribution which drives exploration. We adaptively control the size of this
contribution using a heuristic in accordance with the requirements of the
environment. We guarantee the safety of our algorithm with a high probability
through confidence bounds constructed under the regularity assumptions of
Gaussian processes. Compared to existing safe-learning approaches, our
algorithm does not require an exclusive exploration phase and pursues the
optimization goals even at explored points, which makes it suitable for
safety-critical systems. We demonstrate the safety and efficiency of our
approach with two experiments: an industrial process and an online bid
optimization benchmark problem.
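The mechanism described in the abstract (a Gaussian process posterior that supplies both an exploration bonus in the utility function and a pessimistic confidence bound on the safety constraint) can be sketched roughly as follows. This is a minimal illustration, not the paper's actual implementation: the RBF kernel, the unknown functions `f` and `g`, the safety threshold `0.5`, the confidence width `beta`, and the exploration weight `z` are all illustrative assumptions.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    # squared-exponential kernel between two sets of points
    d = A[:, None, :] - B[None, :, :]
    return np.exp(-0.5 * np.sum(d ** 2, axis=-1) / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-4):
    # GP posterior mean and standard deviation at query points Xs
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = np.diag(rbf(Xs, Xs) - Ks.T @ Kinv @ Ks)
    return mu, np.sqrt(np.maximum(var, 0.0))

# Hypothetical unknown reward f and safety function g, constraint g(x) <= 0.5
f = lambda x: np.sin(3 * x[:, 0])
g = lambda x: x[:, 0] ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(5, 1))      # past optimization inputs
yf, yg = f(X), g(X)                       # observed reward / safety values

candidates = np.linspace(-1, 1, 201).reshape(-1, 1)
mu_f, sd_f = gp_posterior(X, yf, candidates)
mu_g, sd_g = gp_posterior(X, yg, candidates)

beta = 2.0  # confidence-bound width from the GP regularity assumptions
z = 0.5     # exploration weight (adapted by a heuristic in the paper)

# Pessimistic safety check: upper confidence bound must respect the limit
safe = mu_g + beta * sd_g <= 0.5
# Utility = optimization goal plus an uncertainty-driven exploration term
utility = mu_f + z * sd_f
utility[~safe] = -np.inf
x_next = candidates[np.argmax(utility)]   # next input to apply
```

Note that, unlike methods with a dedicated exploration phase, the candidate is always chosen by maximizing the utility (reward plus exploration bonus) over the estimated safe set; in the paper the weight `z` is adjusted online rather than fixed as here.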
Related papers
- Safe Time-Varying Optimization based on Gaussian Processes with Spatio-Temporal Kernel [4.586346034304039]
TVSafeOpt is an algorithm for time-varying optimization problems with unknown reward and safety functions.
It safely tracks a time-varying safe region without the need for explicit change detection.
We show that TVSafeOpt compares favorably against SafeOpt on synthetic data, both regarding safety and optimality.
arXiv Detail & Related papers (2024-09-26T16:09:19Z)
- Information-Theoretic Safe Bayesian Optimization [59.758009422067005]
We consider a sequential decision making task, where the goal is to optimize an unknown function without evaluating parameters that violate an unknown (safety) constraint.
Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case.
We propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate.
arXiv Detail & Related papers (2024-02-23T14:31:10Z)
- SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization [1.3597551064547502]
This study introduces Safety Critic Policy Optimization, a novel safe reinforcement learning algorithm.
It defines the safety critic, a mechanism that nullifies rewards obtained by violating safety constraints.
Our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards.
arXiv Detail & Related papers (2023-11-01T22:12:50Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Meta-Learning Priors for Safe Bayesian Optimization [72.8349503901712]
We build on a meta-learning algorithm, F-PACOH, capable of providing reliable uncertainty quantification in settings of data scarcity.
As a core contribution, we develop a novel framework for choosing safety-compliant priors in a data-driven manner.
On benchmark functions and a high-precision motion system, we demonstrate that our meta-learned priors accelerate the convergence of safe BO approaches.
arXiv Detail & Related papers (2022-10-03T08:38:38Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- Are Evolutionary Algorithms Safe Optimizers? [3.3044468943230427]
This paper aims to reignite interest in safe optimization problems (SafeOPs) in the evolutionary computation (EC) community.
We provide a formal definition of SafeOPs, investigate the impact of key SafeOP parameters on the performance of selected safe optimization algorithms, and benchmark EC against state-of-the-art safe optimization algorithms.
arXiv Detail & Related papers (2022-03-24T17:11:36Z)
- Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraints violations.
More interestingly, we provide an algorithm, namely GCB_safe(psi,phi), guaranteeing both sublinear pseudo-regret and safety with high probability at the cost of accepting tolerances psi and phi.
arXiv Detail & Related papers (2022-01-18T17:24:20Z)
- Safe Policy Optimization with Local Generalized Linear Function Approximations [17.84511819022308]
Existing safe exploration methods guarantee safety under regularity assumptions.
We propose a novel algorithm, SPO-LF, that optimizes an agent's policy while learning the relation between locally available sensor features and the environmental reward/safety.
We experimentally show that our algorithm is 1) more efficient in terms of sample complexity and computational cost and 2) more applicable to large-scale problems than previous safe RL methods with theoretical guarantees.
arXiv Detail & Related papers (2021-11-09T00:47:50Z)
- Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training.
We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
arXiv Detail & Related papers (2020-05-09T05:57:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.