Optimal Algorithms for Online Convex Optimization with Adversarial Constraints
- URL: http://arxiv.org/abs/2310.18955v3
- Date: Sun, 27 Oct 2024 16:37:56 GMT
- Title: Optimal Algorithms for Online Convex Optimization with Adversarial Constraints
- Authors: Abhishek Sinha, Rahul Vaze
- Abstract summary: In COCO, a convex cost function and a convex constraint function are revealed to the learner after it chooses the action for that round.
We show that an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV without any restrictive assumptions.
- Score: 16.99491218081617
- Abstract: A well-studied generalization of the standard online convex optimization (OCO) framework is constrained online convex optimization (COCO). In COCO, on every round, a convex cost function and a convex constraint function are revealed to the learner after it chooses the action for that round. The objective is to design an online learning policy that simultaneously achieves a small regret while ensuring a small cumulative constraint violation (CCV) against an adaptive adversary interacting over a horizon of length $T$. A long-standing open question in COCO is whether an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV without any restrictive assumptions. For the first time, we answer this in the affirmative and show that a simple first-order policy can simultaneously achieve these bounds. Furthermore, in the case of strongly convex cost and convex constraint functions, the regret guarantee can be improved to $O(\log T)$ while keeping the CCV bound the same as above. We establish these results by effectively combining adaptive OCO policies as a blackbox with Lyapunov optimization - a classic tool from control theory. Surprisingly, the analysis is short and elegant.
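As a rough illustration of the recipe in the abstract, the sketch below feeds a Lyapunov-weighted surrogate cost to an adaptive first-order OCO policy. This is a minimal sketch, assuming online gradient descent with an AdaGrad-style step size as the blackbox policy and a Euclidean-ball feasible set; the weight `V`, the queue update, and the surrogate form are illustrative guesses at the general pattern, not the paper's exact construction.

```python
import numpy as np

def project_ball(x, radius=1.0):
    # Euclidean projection onto an L2 ball of the given radius
    # (a stand-in for projection onto the true feasible set X).
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def coco_policy(cost_grads, cons_vals, cons_grads, T, dim, V=1.0):
    # cost_grads[t](x): gradient of the round-t cost f_t at x.
    # cons_vals[t](x), cons_grads[t](x): value and gradient of the
    # round-t constraint g_t at x (revealed only after x is played).
    x = np.zeros(dim)        # current action
    Q = 0.0                  # Lyapunov "queue": cumulative clipped violation
    grad_sq_sum = 1e-8       # accumulator for an AdaGrad-style step size
    actions = []
    for t in range(T):
        actions.append(x.copy())          # play x_t; f_t, g_t are then revealed
        f_grad = cost_grads[t](x)
        g_val = cons_vals[t](x)
        g_grad = cons_grads[t](x)
        # Gradient of the surrogate V*f_t(x) + Q*max(g_t(x), 0).
        surr_grad = V * f_grad + (Q * g_grad if g_val > 0 else 0.0)
        # Blackbox adaptive OCO step (online gradient descent here).
        grad_sq_sum += float(np.dot(surr_grad, surr_grad))
        x = project_ball(x - surr_grad / np.sqrt(grad_sq_sum))
        # Lyapunov drift: the queue absorbs this round's violation.
        Q += max(g_val, 0.0)
    return actions, Q        # Q equals the final CCV of the played actions
```

The queue `Q` acts as the Lyapunov drift term: the larger the accumulated violation, the more the surrogate gradient tilts subsequent iterates back toward feasibility.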
Related papers
- Tight Bounds for Online Convex Optimization with Adversarial Constraints [16.99491218081617]
In COCO, a convex cost function and a convex constraint function are revealed to the learner after the action for that round is chosen.
We show that an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV without any restrictive assumptions.
arXiv Detail & Related papers (2024-05-15T12:37:03Z)
- Optimistic Safety for Online Convex Optimization with Unknown Linear Constraints [31.526232903811533]
We introduce an algorithm that we term Optimistically Safe OCO (OSOCO) and show it enjoys $\tilde{O}(\sqrt{T})$ regret and no constraint violation.
In the case of static linear constraints, this improves on the previous best known $\tilde{O}(T^{2/3})$ regret under the same assumptions.
In the case of time-varying constraints, our work supplements existing results that show $O(\sqrt{T})$ regret and $O(\sqrt{T})$ cumulative violation.
arXiv Detail & Related papers (2024-03-09T04:01:39Z)
- Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions, respectively.
arXiv Detail & Related papers (2023-07-17T09:55:35Z)
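The entry above states bounds in terms of $V_T$ without defining it; in the gradient-variation literature this typically denotes the cumulative variation of the loss gradients (the paper may use a slightly different norm or supremum):

```latex
% Assumed (standard) gradient-variation measure:
V_T = \sum_{t=2}^{T} \sup_{x \in \mathcal{X}}
      \left\| \nabla f_t(x) - \nabla f_{t-1}(x) \right\|_2^{2}
```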
- Improved Dynamic Regret for Online Frank-Wolfe [54.690867216880356]
We investigate the dynamic regret of online Frank-Wolfe (OFW), an efficient projection-free algorithm for online convex optimization.
In this paper, we derive improved dynamic regret bounds for OFW by extending the fast convergence rates of FW from offline optimization to online optimization.
arXiv Detail & Related papers (2023-02-11T07:19:51Z)
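For reference, the textbook online Frank-Wolfe step that the entry above builds on replaces projection with a single linear optimization (LO) oracle call. A minimal sketch over the probability simplex follows; the domain and step-size schedule are illustrative, and this is the basic method rather than the paper's improved variant.

```python
import numpy as np

def lo_oracle_simplex(grad):
    # LO oracle over the probability simplex: the minimizer of a linear
    # objective over the simplex is a vertex (one-hot at the min coordinate).
    v = np.zeros_like(grad)
    v[np.argmin(grad)] = 1.0
    return v

def online_frank_wolfe(loss_grads, T, dim):
    # loss_grads[t](x): gradient of the round-(t+1) loss at x.
    x = np.full(dim, 1.0 / dim)      # start at the simplex center
    actions = []
    for t in range(1, T + 1):
        actions.append(x.copy())     # play x_t, then f_t is revealed
        grad = loss_grads[t - 1](x)
        v = lo_oracle_simplex(grad)  # one LO-oracle call replaces projection
        sigma = 1.0 / np.sqrt(t)     # illustrative step-size schedule
        x = (1 - sigma) * x + sigma * v   # convex-combination update stays feasible
    return actions
```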
- Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage [100.8180383245813]
We propose value-based algorithms for offline reinforcement learning (RL).
We show an analogous result for vanilla Q-functions under a soft margin condition.
Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
arXiv Detail & Related papers (2023-02-05T14:22:41Z)
- Rectified Pessimistic-Optimistic Learning for Stochastic Continuum-armed Bandit with Constraints [4.879346089164413]
We optimize a black-box reward function $f(x)$ subject to a black-box constraint function $g(x) \leq 0$ over a continuous space.
We propose a Rectified Pessimistic-Optimistic Learning framework (RPOL), a penalty-based method incorporating optimistic and pessimistic GP bandit learning for reward and constraint functions.
arXiv Detail & Related papers (2022-11-27T04:28:16Z)
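The rectified pessimistic-optimistic idea in the entry above can be sketched as an acquisition rule: be optimistic (UCB) about the unknown reward, pessimistic about the unknown constraint, and pay a rectified penalty for predicted violation. The GP posterior means and standard deviations, the confidence width `beta`, and the penalty weight `rho` below are assumed inputs for illustration, not RPOL's exact parameterization.

```python
import numpy as np

def rpol_style_acquisition(x_grid, mu_f, sigma_f, mu_g, sigma_g,
                           beta=2.0, rho=10.0):
    # x_grid: candidate points; mu_*/sigma_*: GP posterior mean/std at x_grid
    # for the reward f and the constraint g (assumed fitted elsewhere).
    ucb_f = mu_f + beta * sigma_f       # optimistic reward estimate
    pess_g = mu_g + beta * sigma_g      # pessimistic constraint estimate
    penalty = np.maximum(pess_g, 0.0)   # rectified: no bonus for slack
    scores = ucb_f - rho * penalty      # penalty-based trade-off
    return x_grid[np.argmax(scores)]    # next point to query
```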
- On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation [80.86358123230757]
We present an algorithm called Bootstrapped and Constrained Pessimistic Value Iteration (BCP-VI).
Under a partial data coverage assumption, BCP-VI yields a fast rate of $\tilde{\mathcal{O}}(\frac{1}{K})$ for offline RL when there is a positive gap in the optimal Q-value functions.
These are the first $\tilde{\mathcal{O}}(\frac{1}{K})$ bound and absolute zero sub-optimality bound, respectively, for offline RL with linear function approximation from adaptive data.
arXiv Detail & Related papers (2022-11-23T18:50:44Z)
- Exploiting the Curvature of Feasible Sets for Faster Projection-Free Online Learning [8.461907111368628]
We develop new efficient projection-free algorithms for Online Convex Optimization (OCO).
We develop an OCO algorithm that makes two calls to a linear optimization (LO) oracle per round and achieves the near-optimal $\widetilde{O}(\sqrt{T})$ regret.
We also present an algorithm for general convex sets that makes an expected $\widetilde{O}(d)$ number of calls to an LO oracle per round.
arXiv Detail & Related papers (2022-05-23T17:13:46Z)
- Towards Painless Policy Optimization for Constrained MDPs [46.12526917024248]
We study policy optimization in an infinite-horizon, $\gamma$-discounted constrained Markov decision process (CMDP).
Our objective is to return a policy that achieves large expected reward with a small constraint violation.
We propose a generic primal-dual framework that allows us to bound the reward sub-optimality and constraint violation for arbitrary algorithms.
arXiv Detail & Related papers (2022-04-11T15:08:09Z)
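The generic primal-dual pattern the entry above refers to alternates a best-response policy step against the current Lagrangian with a projected dual ascent step on the multiplier. A minimal sketch follows, with `best_response`, `constraint_value`, and the final averaging treated as assumed placeholders rather than the paper's specific instantiation.

```python
def primal_dual_cmdp(best_response, constraint_value, budget, eta, num_iters):
    # best_response(lam): policy maximizing reward - lam * constraint cost
    #                     in the unconstrained MDP (assumed solver).
    # constraint_value(pi): expected discounted constraint cost of pi.
    lam = 0.0
    policies = []
    for _ in range(num_iters):
        pi = best_response(lam)                    # primal step
        violation = constraint_value(pi) - budget  # signed constraint violation
        lam = max(0.0, lam + eta * violation)      # projected dual ascent
        policies.append(pi)
    # A mixture/average of the iterates is what such frameworks typically return.
    return policies
```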
- Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization [70.4342220499858]
We introduce novel online algorithms that can exploit smoothness and replace the dependence on $T$ in dynamic regret with problem-dependent quantities.
Our results are adaptive to the intrinsic difficulty of the problem, since the bounds are tighter than existing results for easy problems and safeguard the same rate in the worst case.
arXiv Detail & Related papers (2021-12-29T02:42:59Z)
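For the entry above, recall the standard definition of dynamic regret against a time-varying comparator sequence, whose worst-case dependence on $T$ the problem-dependent bounds refine (the path-length measure follows the usual convention):

```latex
% Dynamic regret and the path length of the comparator sequence:
\mathrm{D\text{-}Reg}_T = \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(u_t),
\qquad
P_T = \sum_{t=2}^{T} \| u_t - u_{t-1} \|_2
```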
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.