Safe Online Convex Optimization with Unknown Linear Safety Constraints
- URL: http://arxiv.org/abs/2111.07430v1
- Date: Sun, 14 Nov 2021 19:49:19 GMT
- Title: Safe Online Convex Optimization with Unknown Linear Safety Constraints
- Authors: Sapana Chaudhary and Dileep Kalathil
- Abstract summary: We study the problem of safe online convex optimization, where the action at each time step must satisfy a set of linear safety constraints.
The parameters that specify the linear safety constraints are unknown to the algorithm.
We show that, under the assumption of the availability of a safe baseline action, the SO-PGD algorithm achieves a regret $O(T2/3)$.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of safe online convex optimization, where the action at
each time step must satisfy a set of linear safety constraints. The goal is to
select a sequence of actions to minimize the regret without violating the
safety constraints at any time step (with high probability). The parameters
that specify the linear safety constraints are unknown to the algorithm. The
algorithm has access to only the noisy observations of constraints for the
chosen actions. We propose an algorithm, called the {Safe Online Projected
Gradient Descent} (SO-PGD) algorithm, to address this problem. We show that,
under the assumption of the availability of a safe baseline action, the SO-PGD
algorithm achieves a regret $O(T^{2/3})$. While there are many algorithms for
online convex optimization (OCO) problems with safety constraints available in
the literature, they allow constraint violations during learning/optimization,
and the focus has been on characterizing the cumulative constraint violations.
To the best of our knowledge, ours is the first work that provides an algorithm
with provable guarantees on the regret, without violating the linear safety
constraints (with high probability) at any time step.
Related papers
- Safe Reinforcement Learning for Constrained Markov Decision Processes with Stochastic Stopping Time [0.6554326244334868]
We present an online reinforcement learning algorithm for constrained Markov decision processes with a safety constraint.
We show that the learned policy is safe with high confidence.
We also demonstrate that efficient exploration can be achieved by defining a subset of the state-space called proxy set.
arXiv Detail & Related papers (2024-03-23T20:22:30Z) - Truly No-Regret Learning in Constrained MDPs [61.78619476991494]
We propose a model-based primal-dual algorithm to learn in an unknown CMDP.
We prove that our algorithm achieves sublinear regret without error cancellations.
arXiv Detail & Related papers (2024-02-24T09:47:46Z) - SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization [1.3597551064547502]
This study introduces a novel safe reinforcement learning algorithm, Safety Critic Policy Optimization.
In this study, we define the safety critic, a mechanism that nullifies rewards obtained through violating safety constraints.
Our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards.
arXiv Detail & Related papers (2023-11-01T22:12:50Z) - Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage [100.8180383245813]
We propose value-based algorithms for offline reinforcement learning (RL)
We show an analogous result for vanilla Q-functions under a soft margin condition.
Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
arXiv Detail & Related papers (2023-02-05T14:22:41Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for seeking high dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violation in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Safe Online Bid Optimization with Return-On-Investment and Budget
Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraints violations.
More interestingly, we provide an algorithm, namely GCB_safe(psi,phi), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances psi and phi.
arXiv Detail & Related papers (2022-01-18T17:24:20Z) - Safe Adaptive Learning-based Control for Constrained Linear Quadratic
Regulators with Regret Guarantees [11.627320138064684]
We study the adaptive control of an unknown linear system with a quadratic cost function subject to safety constraints on both the states and actions.
Our algorithm is implemented on a single trajectory and does not require system restarts.
arXiv Detail & Related papers (2021-10-31T05:52:42Z) - Excursion Search for Constrained Bayesian Optimization under a Limited
Budget of Failures [62.41541049302712]
We propose a novel decision maker grounded in control theory that controls the amount of risk we allow in the search as a function of a given budget of failures.
Our algorithm uses the failures budget more efficiently in a variety of optimization experiments, and generally achieves lower regret, than state-of-the-art methods.
arXiv Detail & Related papers (2020-05-15T09:54:09Z) - Chance-Constrained Trajectory Optimization for Safe Exploration and
Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training.
We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
arXiv Detail & Related papers (2020-05-09T05:57:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.