Tight Bounds for Online Convex Optimization with Adversarial Constraints
- URL: http://arxiv.org/abs/2405.09296v1
- Date: Wed, 15 May 2024 12:37:03 GMT
- Title: Tight Bounds for Online Convex Optimization with Adversarial Constraints
- Authors: Abhishek Sinha, Rahul Vaze
- Abstract summary: In COCO, a convex cost function and a convex constraint function are revealed to the learner after the action for that round is chosen.
We show that an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV without any restrictive assumptions.
- Score: 16.99491218081617
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A well-studied generalization of the standard online convex optimization (OCO) is constrained online convex optimization (COCO). In COCO, on every round, a convex cost function and a convex constraint function are revealed to the learner after the action for that round is chosen. The objective is to design an online policy that simultaneously achieves a small regret while ensuring small cumulative constraint violation (CCV) against an adaptive adversary. A long-standing open question in COCO is whether an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $O(\sqrt{T})$ CCV without any restrictive assumptions. For the first time, we answer this in the affirmative and show that an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV. We establish this result by effectively combining the adaptive regret bound of the AdaGrad algorithm with Lyapunov optimization - a classic tool from control theory. Surprisingly, the analysis is short and elegant.
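For intuition, here is a minimal Python sketch of the drift-plus-penalty template the abstract alludes to: a virtual queue accumulates clipped constraint violations, and each round takes an AdaGrad-style step on the penalized surrogate cost. The function names, the scalar AdaGrad accumulator, and the ball-constrained toy problem are illustrative assumptions; the paper's actual policy (in particular its potential-function weighting of the queue) is more refined.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball {x : ||x||_2 <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def coco_drift_plus_penalty(grad_f, grad_g, g, x0, T, radius=1.0, V=1.0):
    """Drift-plus-penalty sketch for COCO: a virtual queue Q accumulates
    clipped constraint violations, and each round takes an AdaGrad-style
    step on the penalized surrogate V * f_t + Q * g_t. The oracles
    grad_f(t, x), grad_g(t, x), g(t, x) are revealed after x_t is played."""
    x, Q = np.asarray(x0, dtype=float), 0.0
    grad_sq_sum = 1e-8                            # scalar AdaGrad accumulator
    actions = []
    for t in range(T):
        actions.append(x.copy())                  # play x_t
        Q += max(g(t, x), 0.0)                    # Lyapunov queue update (drift)
        s = V * grad_f(t, x) + Q * grad_g(t, x)   # surrogate gradient
        grad_sq_sum += float(np.dot(s, s))
        eta = radius / np.sqrt(grad_sq_sum)       # adaptive (AdaGrad) step size
        x = project_ball(x - eta * s, radius)
    return actions, Q

# Toy run: track drifting targets subject to the constraint x[0] <= 0.5.
rng = np.random.default_rng(0)
a = rng.uniform(-1.0, 1.0, size=(100, 2))
acts, ccv = coco_drift_plus_penalty(
    grad_f=lambda t, x: 2.0 * (x - a[t]),         # f_t(x) = ||x - a_t||^2
    grad_g=lambda t, x: np.array([1.0, 0.0]),     # g_t(x) = x[0] - 0.5
    g=lambda t, x: x[0] - 0.5,
    x0=np.zeros(2), T=100)
print(f"final action {acts[-1]}, CCV {ccv:.3f}")
```

In this template the queue $Q_t$ plays the role of a Lyapunov state: keeping its drift small is what bounds the CCV, while the adaptive step controls regret on the surrogate.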
Related papers
- $O(\sqrt{T})$ Static Regret and Instance Dependent Constraint Violation for Constrained Online Convex Optimization [16.99491218081617]
The objective is to simultaneously minimize the static regret and cumulative constraint violation (CCV).
An algorithm is proposed that guarantees a static regret of $O(\sqrt{T})$ and a CCV of $\min\{\mathcal{V}, O(\sqrt{T}\log T)\}$, where $\mathcal{V}$ depends on the distance between the consecutively revealed constraint sets.
arXiv Detail & Related papers (2025-02-07T15:47:04Z)
- Projection-free Algorithms for Online Convex Optimization with Adversarial Constraints [10.047668792033033]
We study a generalization of the Online Convex Optimization (OCO) framework with time-varying adversarial constraints.
In this problem, after selecting a feasible action from the convex decision set $X$, a convex constraint function is revealed alongside the cost function in each round.
We propose a *projection-free* online policy which makes a single call to a Linear Program (LP) solver per round.
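The summary above does not spell out the policy, but a single LP call per round is the signature of Frank-Wolfe-style methods, which replace Euclidean projection with linear optimization over the feasible polytope. The following hypothetical sketch, using `scipy.optimize.linprog` as the LP oracle, shows the shape of such an update; the polytope representation and step size are illustrative assumptions, not the paper's stated policy.

```python
import numpy as np
from scipy.optimize import linprog

def lp_oracle(c, A_ub, b_ub):
    """One LP call: minimize <c, v> over the polytope {v : A_ub @ v <= b_ub}."""
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
    return res.x

def projection_free_step(x, grad, A_ub, b_ub, gamma):
    """Frank-Wolfe-style update: move toward the LP minimizer of the
    linearized loss; the convex combination stays feasible, so no
    projection is ever needed."""
    v = lp_oracle(grad, A_ub, b_ub)          # single LP call per round
    return x + gamma * (v - x)

# Toy usage: the unit box [0, 1]^2 written as A_ub @ v <= b_ub.
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.array([1.0, 1.0, 0.0, 0.0])
x = projection_free_step(np.array([0.5, 0.5]),
                         grad=np.array([1.0, -1.0]),
                         A_ub=A, b_ub=b, gamma=0.1)
print(x)  # [0.45, 0.55]: nudged toward the vertex (0, 1)
```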
arXiv Detail & Related papers (2025-01-28T13:04:32Z)
- Revisiting Projection-Free Online Learning with Time-Varying Constraints [35.573654458435854]
We investigate constrained online convex optimization, in which decisions must belong to a fixed and typically complicated domain.
Several projection-free methods have been proposed with an $\mathcal{O}(T^{3/4}\sqrt{\log T})$ regret bound and an $\mathcal{O}(T^{3/4}\sqrt{\log T})$ cumulative constraint violation (CCV) bound for general convex losses.
In this paper, we improve this result and further establish novel regret and CCV bounds when loss functions are strongly convex.
arXiv Detail & Related papers (2025-01-27T13:38:51Z)
- An Optimistic Algorithm for Online Convex Optimization with Adversarial Constraints [55.2480439325792]
We study Online Convex Optimization (OCO) with adversarial constraints.
We focus on a setting where the algorithm has access to predictions of the loss and constraint functions.
Our results show that we can improve the current best bounds of $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ cumulative constraint violations.
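The mechanism behind such improvements is optimism: acting on a prediction of the upcoming gradient and correcting once the true gradient is revealed. The sketch below shows the standard optimistic gradient step in a generic form; it is not the paper's specific algorithm, and `m_t`, `observe`, and `project` are assumed interfaces.

```python
import numpy as np

def optimistic_round(x_hat, m_t, observe, eta, project):
    """One round of optimistic online gradient descent: act using the
    predicted gradient m_t, then correct the base iterate x_hat with
    the true gradient once it is revealed."""
    x_t = project(x_hat - eta * m_t)       # optimistic action
    g_t = observe(x_t)                     # true gradient, revealed after acting
    x_hat = project(x_hat - eta * g_t)     # correction step
    return x_t, x_hat

# Toy usage on the interval [-1, 1] with a perfect prediction.
project = lambda z: np.clip(z, -1.0, 1.0)
x_t, x_hat = optimistic_round(x_hat=np.array([0.8]),
                              m_t=np.array([1.0]),
                              observe=lambda x: np.array([1.0]),
                              eta=0.5, project=project)
print(x_t, x_hat)  # both move to [0.3]
```

When the predictions are accurate ($m_t \approx \nabla f_t$), the regret of such schemes scales with the prediction error rather than with $T$, which is how prediction access can improve on the $O(\sqrt{T})$ and $\tilde{O}(\sqrt{T})$ baselines.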
arXiv Detail & Related papers (2024-12-11T03:06:42Z)
- Optimal Algorithms for Online Convex Optimization with Adversarial Constraints [16.99491218081617]
In COCO, a convex cost function and a convex constraint function are revealed to the learner after it chooses the action for that round.
We show that an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV without any restrictive assumptions.
arXiv Detail & Related papers (2023-10-29T09:55:41Z)
- Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions.
arXiv Detail & Related papers (2023-07-17T09:55:35Z)
- Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage [100.8180383245813]
We propose value-based algorithms for offline reinforcement learning (RL).
We show an analogous result for vanilla Q-functions under a soft margin condition.
Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
arXiv Detail & Related papers (2023-02-05T14:22:41Z)
- Towards Painless Policy Optimization for Constrained MDPs [46.12526917024248]
We study policy optimization in an infinite-horizon, $\gamma$-discounted constrained Markov decision process (CMDP).
Our objective is to return a policy that achieves large expected reward with a small constraint violation.
We propose a generic primal-dual framework that allows us to bound the reward sub-optimality and constraint violation for arbitrary algorithms.
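As a rough illustration of the primal-dual idea above, the sketch below alternates a primal best response to the Lagrangian with projected dual ascent on the constraint violation. The `lagrangian_oracle` interface, the step size, and the toy two-policy "CMDP" are assumptions for illustration, not the paper's framework.

```python
def primal_dual_cmdp(lagrangian_oracle, budget, lam0=0.0, lr=0.05, rounds=500):
    """Generic primal-dual loop: lagrangian_oracle(lam) returns a policy
    (near-)optimal for the Lagrangian reward r - lam * c, plus its expected
    reward and expected cost; lam then does projected dual ascent on the
    violation (expected cost minus budget)."""
    lam, avg_r, avg_c = lam0, 0.0, 0.0
    for t in range(1, rounds + 1):
        policy, v_r, v_c = lagrangian_oracle(lam)
        lam = max(0.0, lam + lr * (v_c - budget))   # dual ascent, kept at lam >= 0
        avg_r += (v_r - avg_r) / t                  # running values of the
        avg_c += (v_c - avg_c) / t                  # averaged (mixture) policy
    return lam, avg_r, avg_c

# Toy "CMDP" with two deterministic policies: (name, reward, cost).
def oracle(lam):
    candidates = [("risky", 1.0, 0.8), ("safe", 0.6, 0.2)]
    return max(candidates, key=lambda p: p[1] - lam * p[2])

lam, r, c = primal_dual_cmdp(oracle, budget=0.5)
print(f"lambda={lam:.2f}, avg reward={r:.3f}, avg cost={c:.3f}")
```

The averaged iterates drift toward a mixture of the two policies whose expected cost meets the budget, which is exactly the large-reward, small-violation trade-off such frameworks are designed to bound.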
arXiv Detail & Related papers (2022-04-11T15:08:09Z)
- Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization [70.4342220499858]
We introduce novel online algorithms that can exploit smoothness and replace the dependence on $T$ in dynamic regret with problem-dependent quantities.
Our results are adaptive to the intrinsic difficulty of the problem, since the bounds are tighter than existing results for easy problems and safeguard the same rate in the worst case.
arXiv Detail & Related papers (2021-12-29T02:42:59Z)
- Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning [59.02541753781001]
This paper initiates the theoretical study of policy finetuning, that is, online RL where the learner has additional access to a "reference policy".
We first design a sharp offline reduction algorithm that finds an $\varepsilon$-near-optimal policy within $\widetilde{O}(H^3 S C^\star/\varepsilon^2)$ episodes.
We then establish an $\Omega(H^3 S \min\{C^\star, A\}/\varepsilon^2)$ sample complexity lower bound for any policy finetuning algorithm, including those that can adaptively explore the...
arXiv Detail & Related papers (2021-06-09T08:28:55Z)