Online Convex Optimization with Stochastic Constraints: Zero Constraint
Violation and Bandit Feedback
- URL: http://arxiv.org/abs/2301.11267v2
- Date: Thu, 13 Jul 2023 23:40:26 GMT
- Title: Online Convex Optimization with Stochastic Constraints: Zero Constraint
Violation and Bandit Feedback
- Authors: Yeongjong Kim, Dabeen Lee
- Abstract summary: We propose a variant of the drift-plus-penalty algorithm that guarantees $O(\sqrt{T})$ expected regret and zero constraint violation.
Our algorithm is oblivious to the length of the time horizon $T$, in contrast to the vanilla drift-plus-penalty method.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies online convex optimization with stochastic constraints. We
propose a variant of the drift-plus-penalty algorithm that guarantees
$O(\sqrt{T})$ expected regret and zero constraint violation after a fixed
number of iterations, improving upon the vanilla drift-plus-penalty method,
which incurs $O(\sqrt{T})$ constraint violation. Our algorithm is oblivious to the length of
the time horizon $T$, in contrast to the vanilla drift-plus-penalty method.
This is based on our novel drift lemma that provides time-varying bounds on the
virtual queue drift and, as a result, leads to time-varying bounds on the
expected virtual queue length. Moreover, we extend our framework to
stochastic-constrained online convex optimization under two-point bandit
feedback. We show that by adapting our algorithmic framework to the bandit
feedback setting, we may still achieve $O(\sqrt{T})$ expected regret and zero
constraint violation, improving upon the previous work for the case of
identical constraint functions. Numerical results demonstrate our theoretical
results.
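The core loop of the method can be sketched as follows. This is a minimal illustration, assuming a single scalar constraint, an $\ell_2$-ball feasible set, and placeholder time-varying parameters $V_t = \sqrt{t}$ and $\alpha_t = t$; the paper's exact parameter schedules and tightening term differ.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto an l2 ball (stand-in for the feasible set X)."""
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

def drift_plus_penalty(grad_f, grad_g, g_val, x0, T, eps=0.0):
    """Hedged sketch of a drift-plus-penalty loop for OCO with a stochastic constraint.

    grad_f(t, x): gradient of the round-t loss at x.
    grad_g(t, x), g_val(t, x): gradient/value of the round-t constraint sample.
    eps: optional tightening term; eps > 0 is what drives violation to zero.
    """
    x, Q = np.asarray(x0, dtype=float), 0.0  # Q is the virtual queue
    iterates = []
    for t in range(1, T + 1):
        V_t, alpha_t = np.sqrt(t), float(t)  # illustrative time-varying parameters
        # Penalized proximal step: balance loss descent against queue drift.
        d = V_t * grad_f(t, x) + Q * grad_g(t, x)
        x = project_ball(x - d / (2.0 * alpha_t))
        # Virtual queue tracks cumulative (tightened) constraint violation.
        Q = max(Q + g_val(t, x) + eps, 0.0)
        iterates.append((x.copy(), Q))
    return iterates
```

Note that the loop never needs to know $T$ in advance: each round's parameters depend only on $t$, which mirrors the horizon-oblivious property claimed above, and the time-varying bounds on the virtual queue $Q$ are what the paper's drift lemma supplies.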
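For the two-point bandit extension, exact gradients are unavailable and are replaced by a smoothed two-point estimator; a standard version is sketched below (the function name `two_point_gradient` and the `rng` argument are illustrative, not the paper's notation). Feeding such estimates for both the loss and the constraint into the loop above captures the spirit of the extension; the paper's exact smoothing radius and feasible-set shrinkage are omitted.

```python
import numpy as np

def two_point_gradient(f, x, delta, rng):
    """Two-point (spherical smoothing) gradient estimator.

    Queries f at x + delta*u and x - delta*u for a random unit direction u;
    the result is an unbiased gradient estimate of the delta-smoothed f.
    """
    dim = x.shape[0]
    u = rng.standard_normal(dim)
    u /= np.linalg.norm(u)  # uniform direction on the unit sphere
    return (dim / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

# Example usage (illustrative):
rng = np.random.default_rng(0)
g_hat = two_point_gradient(lambda z: float(z @ z), np.ones(5), 0.01, rng)
```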
Related papers
- Optimistic Safety for Online Convex Optimization with Unknown Linear Constraints [31.526232903811533]
We introduce an algorithm that we term Optimistically Safe OCO (OSOCO) and show it enjoys $\tilde{O}(\sqrt{T})$ regret and no constraint violation.
In the case of static linear constraints, this improves on the previous best known $\tilde{O}(T^{2/3})$ regret under the same assumptions.
In the case of time-varying constraints, our work supplements existing results that show $O(\sqrt{T})$ regret and $O(\sqrt{T})$ cumulative violation.
arXiv Detail & Related papers (2024-03-09T04:01:39Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Projection-Free Online Convex Optimization with Stochastic Constraints [0.0]
We develop projection-free algorithms for online convex optimization with constraints.
We deduce sublinear regret and constraint violation bounds for various settings.
We prove that the constraint violation can be reduced to have the same growth as the regret.
arXiv Detail & Related papers (2023-05-02T11:27:34Z) - Improved Dynamic Regret for Online Frank-Wolfe [54.690867216880356]
We investigate the dynamic regret of online Frank-Wolfe (OFW), which is an efficient projection-free algorithm for online convex optimization.
In this paper, we derive improved dynamic regret bounds for OFW by extending the fast convergence rates of FW from offline optimization to online optimization.
arXiv Detail & Related papers (2023-02-11T07:19:51Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Rectified Pessimistic-Optimistic Learning for Stochastic Continuum-armed
Bandit with Constraints [4.879346089164413]
We optimize a black-box reward function $f(x)$ subject to a black-box constraint function $g(x) \leq 0$ over a continuous space.
We propose a Rectified Pessimistic-Optimistic Learning framework (RPOL), a penalty-based method incorporating optimistic and pessimistic GP bandit learning for reward and constraint functions.
arXiv Detail & Related papers (2022-11-27T04:28:16Z) - Regret and Cumulative Constraint Violation Analysis for Online Convex
Optimization with Long Term Constraints [24.97580261894342]
This paper considers online convex optimization with long term constraints, where constraints can be violated in intermediate rounds, but need to be satisfied in the long run.
A novel algorithm is first proposed and it achieves an $\mathcal{O}(T^{\max\{c,1-c\}})$ bound for static regret and an $\mathcal{O}(T^{(1-c)/2})$ bound for cumulative constraint violation.
arXiv Detail & Related papers (2021-06-09T15:18:06Z) - Online Strongly Convex Optimization with Unknown Delays [30.931538196386672]
We investigate the problem of online convex optimization with unknown delays.
We first extend the delayed variant of OGD for strongly convex functions.
We establish a better regret bound of $O(d\log T)$, where $d$ is the maximum delay.
arXiv Detail & Related papers (2021-03-21T10:16:15Z) - Efficient Optimistic Exploration in Linear-Quadratic Regulators via
Lagrangian Relaxation [107.06364966905821]
We study the exploration-exploitation dilemma in the linear quadratic regulator (LQR) setting.
Inspired by the extended value iteration algorithm used in optimistic algorithms for finite MDPs, we propose to relax the optimistic optimization of OFU-LQ.
We show that an $\epsilon$-optimistic controller can be computed efficiently by solving at most $O\big(\log(1/\epsilon)\big)$ Riccati equations.
arXiv Detail & Related papers (2020-07-13T16:30:47Z) - Dynamic Regret of Convex and Smooth Functions [93.71361250701075]
We investigate online convex optimization in non-stationary environments.
We choose the dynamic regret as the performance measure.
We show that it is possible to further enhance the dynamic regret by exploiting the smoothness condition.
arXiv Detail & Related papers (2020-07-07T14:10:57Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)