Constrained Online Convex Optimization with Polyak Feasibility Steps
- URL: http://arxiv.org/abs/2502.13112v1
- Date: Tue, 18 Feb 2025 18:26:20 GMT
- Title: Constrained Online Convex Optimization with Polyak Feasibility Steps
- Authors: Spencer Hutchinson, Mahnoosh Alizadeh
- Abstract summary: We study online convex optimization with a fixed constraint function $g : \mathbb{R}^d \rightarrow \mathbb{R}$.
We show a stronger guarantee of anytime constraint satisfaction $g(x_t) \leq 0 \ \forall t \in [T]$, and matching $O(\sqrt{T})$ regret guarantees.
- Score: 3.928749790761187
- License:
- Abstract: In this work, we study online convex optimization with a fixed constraint function $g : \mathbb{R}^d \rightarrow \mathbb{R}$. Prior work on this problem has shown $O(\sqrt{T})$ regret and cumulative constraint satisfaction $\sum_{t=1}^{T} g(x_t) \leq 0$, while only accessing the constraint value and subgradient at the played actions $g(x_t), \partial g(x_t)$. Using the same constraint information, we show a stronger guarantee of anytime constraint satisfaction $g(x_t) \leq 0 \ \forall t \in [T]$, and matching $O(\sqrt{T})$ regret guarantees. These contributions are thanks to our approach of using Polyak feasibility steps to ensure constraint satisfaction, without sacrificing regret. Specifically, after each step of online gradient descent, our algorithm applies a subgradient descent step on the constraint function where the step-size is chosen according to the celebrated Polyak step-size. We further validate this approach with numerical experiments.
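A minimal sketch of this update, assuming oracle access to a loss subgradient `loss_grad(t, x)` and to the constraint value and subgradient `g(x)`, `g_subgrad(x)` (these names and the fixed step size `eta` are illustrative, not from the paper):

```python
import numpy as np

def ogd_with_polyak_feasibility(loss_grad, g, g_subgrad, x0, T, eta):
    """Sketch: online gradient descent where each round is followed by a
    Polyak feasibility step, i.e. a subgradient step on the constraint g
    with the Polyak step size max(g(x), 0) / ||s||^2."""
    x = np.asarray(x0, dtype=float).copy()
    plays = []
    for t in range(T):
        plays.append(x.copy())
        # Online gradient descent step on the round-t loss.
        x = x - eta * loss_grad(t, x)
        # Polyak feasibility step toward the level set {g <= 0}.
        s = g_subgrad(x)
        viol = max(g(x), 0.0)
        sq = float(np.dot(s, s))
        if viol > 0.0 and sq > 0.0:
            x = x - (viol / sq) * s
    return plays
```

For instance, for the unit-ball constraint one could pass `g = lambda x: np.linalg.norm(x) - 1.0` and `g_subgrad = lambda x: x / np.linalg.norm(x)` (defined away from the origin); the paper's analysis dictates how `eta` should scale, typically on the order of $1/\sqrt{T}$.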
Related papers
- $O(\sqrt{T})$ Static Regret and Instance Dependent Constraint Violation for Constrained Online Convex Optimization [16.99491218081617]
The objective is to simultaneously minimize the static regret and cumulative constraint violation (CCV).
An algorithm is proposed that guarantees a static regret of $O(\sqrt{T})$ and a CCV of $\min\{\mathcal{V}, O(\sqrt{T}\log T)\}$, where $\mathcal{V}$ depends on the distance between the consecutively revealed constraint sets.
arXiv Detail & Related papers (2025-02-07T15:47:04Z)
- An Optimistic Algorithm for Online Convex Optimization with Adversarial Constraints [55.2480439325792]
We study Online Convex Optimization (OCO) with adversarial constraints.
We focus on a setting where the algorithm has access to predictions of the loss and constraint functions.
Our results show that we can improve the current best bounds of $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ cumulative constraint violations.
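For orientation, a generic optimistic gradient template that exploits such predictions is sketched below; this is a standard optimistic-OGD pattern with assumed oracles (`predict_grad`, `observe_grad`, `project`), not the paper's constrained algorithm:

```python
import numpy as np

def optimistic_ogd(predict_grad, observe_grad, project, x0, T, eta):
    """Generic optimistic online gradient descent: play against a
    prediction (hint) of the next gradient, then correct the secondary
    iterate with the observed gradient."""
    y = np.asarray(x0, dtype=float).copy()
    plays = []
    for t in range(T):
        x = project(y - eta * predict_grad(t))      # act on the hint
        plays.append(x)
        y = project(y - eta * observe_grad(t, x))   # correct with truth
    return plays
```

When the hints are accurate, the played points track the observed gradients more closely, which is roughly how prediction quality enters optimistic regret bounds.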
arXiv Detail & Related papers (2024-12-11T03:06:42Z)
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
- Improved Dynamic Regret for Online Frank-Wolfe [54.690867216880356]
We investigate the dynamic regret of online Frank-Wolfe (OFW), which is an efficient projection-free algorithm for online convex optimization.
In this paper, we derive improved dynamic regret bounds for OFW by extending the fast convergence rates of FW from offline optimization to online optimization.
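As a reference point, a bare-bones online Frank-Wolfe update is sketched below; the linear minimization oracle `lmo` and the decaying step schedule are illustrative assumptions, and the paper's improved dynamic-regret variants refine this template:

```python
import numpy as np

def online_frank_wolfe(grad, lmo, x0, T):
    """Projection-free sketch: each round calls a linear minimization
    oracle over the feasible set K instead of a (possibly expensive)
    projection, then moves by a convex combination."""
    x = np.asarray(x0, dtype=float).copy()
    plays = []
    for t in range(T):
        plays.append(x.copy())
        d = grad(t, x)
        v = lmo(d)                      # argmin_{v in K} <d, v>
        sigma = 1.0 / np.sqrt(t + 1)    # one common decaying schedule
        x = x + sigma * (v - x)         # stays in K by convexity
    return plays
```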
arXiv Detail & Related papers (2023-02-11T07:19:51Z)
- Rectified Pessimistic-Optimistic Learning for Stochastic Continuum-armed Bandit with Constraints [4.879346089164413]
We optimize a black-box reward function $f(x)$ subject to a black-box constraint function $g(x) \leq 0$ over a continuous space.
We propose a Rectified Pessimistic-Optimistic Learning framework (RPOL), a penalty-based method incorporating optimistic and pessimistic GP bandit learning for reward and constraint functions.
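A toy rendering of the optimistic-reward / pessimistic-constraint idea (not the exact RPOL rule) is a penalized acquisition over candidate points, where `ucb_f` and `ucb_g` stand for assumed GP-based upper confidence bounds on the reward and constraint:

```python
import numpy as np

def penalized_acquisition(ucb_f, ucb_g, candidates, penalty=10.0):
    """Toy sketch: be optimistic about the reward (upper bound on f) and
    pessimistic about feasibility (upper bound on g), penalizing the
    estimated violation of g(x) <= 0. The penalty weight is illustrative."""
    scores = [ucb_f(x) - penalty * max(ucb_g(x), 0.0) for x in candidates]
    return candidates[int(np.argmax(scores))]
```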
arXiv Detail & Related papers (2022-11-27T04:28:16Z)
- Regret and Cumulative Constraint Violation Analysis for Online Convex Optimization with Long Term Constraints [24.97580261894342]
This paper considers online convex optimization with long term constraints, where constraints can be violated in intermediate rounds, but need to be satisfied in the long run.
A novel algorithm is first proposed and it achieves an $\mathcal{O}(T^{\max\{c,1-c\}})$ bound for static regret and an $\mathcal{O}(T^{(1-c)/2})$ bound for cumulative constraint violation.
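For clarity, one common definition of cumulative constraint violation sums the per-round violations, and the long-term-constraint setting only requires this sum to grow sublinearly in $T$; a direct computation (assuming played points `xs` and a constraint oracle `g`) is:

```python
def cumulative_constraint_violation(g, xs):
    """One common CCV metric: the sum of per-round violations of
    g(x) <= 0. Individual rounds may violate the constraint as long as
    this sum is o(T)."""
    return sum(max(g(x), 0.0) for x in xs)
```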
arXiv Detail & Related papers (2021-06-09T15:18:06Z)
- Safe Learning under Uncertain Objectives and Constraints [66.05180398174286]
We consider non-convex optimization problems under unknown yet safety-critical constraints.
Such problems naturally arise in a variety of domains including robotics, manufacturing, and medical procedures.
A crucial component of our analysis is to introduce and apply a technique called shrinkage in the context of safe optimization.
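One simple way to realize a shrinkage idea is to demand a safety margin on the constraint; the margin `eps` below is an illustrative assumption (the paper's construction is more involved):

```python
def in_shrunken_set(g, x, eps):
    """Toy shrinkage check: require g(x) <= -eps rather than g(x) <= 0,
    so a point judged feasible under estimation error up to eps remains
    truly feasible."""
    return g(x) <= -eps
```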
arXiv Detail & Related papers (2020-06-23T20:51:00Z)
- Gradient-Variation Bound for Online Convex Optimization with Constraints [25.002868073267464]
We study online convex optimization with constraints consisting of multiple functional constraints and a relatively simple constraint set, such as a Euclidean ball.
First-order methods achieve an $\mathcal{O}(\sqrt{T})$ regret and an $\mathcal{O}(1)$ constraint violation, but do not take into account the structural information of the problem.
In this paper, we provide an instance-dependent bound for online convex optimization with complex constraints obtained by a novel online primal-dual mirror-prox algorithm.
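For orientation, a basic online primal-dual subgradient template for a single functional constraint $g(x) \leq 0$ is sketched below; the paper's mirror-prox algorithm is more refined, and the oracles and step sizes here are assumptions:

```python
import numpy as np

def online_primal_dual(loss_grad, g, g_grad, project, x0, T, eta, rho):
    """Basic primal-dual sketch: a multiplier lam prices the constraint,
    the primal step descends the Lagrangian f_t(x) + lam * g(x), and the
    dual step ascends on the observed violation."""
    x = np.asarray(x0, dtype=float).copy()
    lam = 0.0
    plays = []
    for t in range(T):
        plays.append(x.copy())
        x = project(x - eta * (loss_grad(t, x) + lam * g_grad(x)))
        lam = max(lam + rho * g(x), 0.0)   # multiplier stays nonnegative
    return plays
```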
arXiv Detail & Related papers (2020-06-22T17:38:14Z)
- On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of stochastic gradient descent (SGD).
We show that SGD avoids strict saddle points/manifolds with probability $1$ for a broad range of step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies whose expected cumulative reward is maximized while the expected cost of each policy stays below a certain threshold.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)