Related papers: Safe Online Convex Optimization with Unknown Linear Safety Constraints

URL: http://arxiv.org/abs/2111.07430v1
Date: Sun, 14 Nov 2021 19:49:19 GMT
Title: Safe Online Convex Optimization with Unknown Linear Safety Constraints
Authors: Sapana Chaudhary and Dileep Kalathil
Abstract summary: We study the problem of safe online convex optimization, where the action at each time step must satisfy a set of linear safety constraints. The parameters that specify the linear safety constraints are unknown to the algorithm. We show that, under the assumption of the availability of a safe baseline action, the SO-PGD algorithm achieves a regret $O(T^{2/3})$.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the problem of safe online convex optimization, where the action at each time step must satisfy a set of linear safety constraints. The goal is to select a sequence of actions to minimize the regret without violating the safety constraints at any time step (with high probability). The parameters that specify the linear safety constraints are unknown to the algorithm. The algorithm has access only to noisy observations of the constraints for the chosen actions. We propose an algorithm, called the Safe Online Projected Gradient Descent (SO-PGD) algorithm, to address this problem. We show that, under the assumption of the availability of a safe baseline action, the SO-PGD algorithm achieves a regret $O(T^{2/3})$. While many algorithms for online convex optimization (OCO) problems with safety constraints are available in the literature, they allow constraint violations during learning/optimization, and the focus has been on characterizing the cumulative constraint violations. To the best of our knowledge, ours is the first work that provides an algorithm with provable guarantees on the regret, without violating the linear safety constraints (with high probability) at any time step.
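
Illustrative sketch (not from the paper): the abstract builds on projected gradient descent, where each gradient step is followed by a projection back onto the set of safe actions. The Python below shows only that basic projection step for a single linear constraint a.x <= b that is assumed to be known exactly. This is a simplifying assumption: SO-PGD itself has to estimate the unknown constraint parameters from noisy observations, which is not modelled here. The names project_halfspace and online_pgd, the step size eta / sqrt(t), and the quadratic example loss are all hypothetical choices made for illustration.

```python
def project_halfspace(x, a, b):
    """Euclidean projection of x onto the halfspace {z : a.z <= b}."""
    violation = sum(ai * xi for ai, xi in zip(a, x)) - b
    if violation <= 0:
        return list(x)  # already safe, nothing to do
    scale = violation / sum(ai * ai for ai in a)
    return [xi - scale * ai for ai, xi in zip(a, x)]


def online_pgd(grads, x0, a, b, eta=0.5):
    """Play one action per round; every played action satisfies a.x <= b.

    grads: one gradient callable per round (gradient of the loss f_t).
    The constraint (a, b) is assumed known here, unlike in the paper.
    """
    x = project_halfspace(x0, a, b)
    actions = []
    for t, grad in enumerate(grads, start=1):
        actions.append(x)                 # action played in round t
        g = grad(x)                       # gradient revealed after playing
        step = eta / t ** 0.5             # assumed decaying step size
        x = project_halfspace(
            [xi - step * gi for xi, gi in zip(x, g)], a, b
        )
    return actions


# Example: losses pull the action toward (2, 0), but x[0] <= 1 must hold.
target = [2.0, 0.0]
loss_grad = lambda x: [2 * (xi - ci) for xi, ci in zip(x, target)]
actions = online_pgd([loss_grad] * 20, [0.0, 0.0], [1.0, 0.0], 1.0)
```

In this toy run the iterates move toward the unconstrained minimizer and are held at the boundary x[0] = 1 by the projection, so no played action violates the constraint.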

Related papers

Safety in safe Bayesian optimization and its ramifications for control [6.450289319821615]
In control engineering, parameters of a pre-designed controller are often tuned online in feedback with a plant. Machine learning methods, in particular Bayesian optimization (BO), have been deployed for this important problem. We identify two significant obstacles to practical safety. First, SafeOpt-type algorithms rely on quantitative uncertainty bounds, and most implementations replace these by theoretically unsupported heuristics. We propose Lipschitz-only Safe Bayesian Optimization (LoSBO), a safe BO algorithm that relies only on a known Lipschitz bound for its safety.
arXiv Detail & Related papers (2025-01-23T14:24:11Z)
Safe Reinforcement Learning for Constrained Markov Decision Processes with Stochastic Stopping Time [0.6554326244334868]
We present an online reinforcement learning algorithm for constrained Markov decision processes with a safety constraint. We show that the learned policy is safe with high confidence. We also demonstrate that efficient exploration can be achieved by defining a subset of the state space called the proxy set.
arXiv Detail & Related papers (2024-03-23T20:22:30Z)
Truly No-Regret Learning in Constrained MDPs [61.78619476991494]
We propose a model-based primal-dual algorithm to learn in an unknown CMDP. We prove that our algorithm achieves sublinear regret without error cancellations.
arXiv Detail & Related papers (2024-02-24T09:47:46Z)
SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization [1.3597551064547502]
This study introduces a novel safe reinforcement learning algorithm, Safety Critic Policy Optimization. In this study, we define the safety critic, a mechanism that nullifies rewards obtained through violating safety constraints. Our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards.
arXiv Detail & Related papers (2023-11-01T22:12:50Z)
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage [100.8180383245813]
We propose value-based algorithms for offline reinforcement learning (RL). We show an analogous result for vanilla Q-functions under a soft margin condition. Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
arXiv Detail & Related papers (2023-02-05T14:22:41Z)
Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for high dimensional non-linear optimization problems in which maintaining safety during learning is crucial. Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size. We demonstrate the effectiveness of our approach on minimizing violation in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems. We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraint violations. More interestingly, we provide an algorithm, namely GCB_safe(psi,phi), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances psi and phi.
arXiv Detail & Related papers (2022-01-18T17:24:20Z)
Safe Adaptive Learning-based Control for Constrained Linear Quadratic Regulators with Regret Guarantees [11.627320138064684]
We study the adaptive control of an unknown linear system with a quadratic cost function subject to safety constraints on both the states and actions. Our algorithm is implemented on a single trajectory and does not require system restarts.
arXiv Detail & Related papers (2021-10-31T05:52:42Z)
Excursion Search for Constrained Bayesian Optimization under a Limited Budget of Failures [62.41541049302712]
We propose a novel decision maker grounded in control theory that controls the amount of risk we allow in the search as a function of a given budget of failures. Our algorithm uses the failures budget more efficiently in a variety of optimization experiments, and generally achieves lower regret, than state-of-the-art methods.
arXiv Detail & Related papers (2020-05-15T09:54:09Z)
Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training. We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
arXiv Detail & Related papers (2020-05-09T05:57:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.