Related papers: Online Learning for Approximately-Convex Functions with Long-term Adversarial Constraints

Online Learning for Approximately-Convex Functions with Long-term Adversarial Constraints

URL: http://arxiv.org/abs/2508.16992v1
Date: Sat, 23 Aug 2025 11:21:24 GMT
Title: Online Learning for Approximately-Convex Functions with Long-term Adversarial Constraints
Authors: Dhruv Sarkar, Samrat Mukhopadhyay, Abhishek Sinha,
Abstract summary: We study an online learning problem with long-term budget constraints in the adversarial setting.<n>In this problem, at each round $t$, a learner selects an action from a convex set, after which an adversary reveals a cost function.<n>Our approach yields an efficient solution for the $textttAdversas$ problem with improved guarantees.
Score: 8.314956969778692
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We study an online learning problem with long-term budget constraints in the adversarial setting. In this problem, at each round $t$, the learner selects an action from a convex decision set, after which the adversary reveals a cost function $f_t$ and a resource consumption function $g_t$. The cost and consumption functions are assumed to be $\alpha$-approximately convex - a broad class that generalizes convexity and encompasses many common non-convex optimization problems, including DR-submodular maximization, Online Vertex Cover, and Regularized Phase Retrieval. The goal is to design an online algorithm that minimizes cumulative cost over a horizon of length $T$ while approximately satisfying a long-term budget constraint of $B_T$. We propose an efficient first-order online algorithm that guarantees $O(\sqrt{T})$ $\alpha$-regret against the optimal fixed feasible benchmark while consuming at most $O(B_T \log T)+ \tilde{O}(\sqrt{T})$ resources in both full-information and bandit feedback settings. In the bandit feedback setting, our approach yields an efficient solution for the $\texttt{Adversarial Bandits with Knapsacks}$ problem with improved guarantees. We also prove matching lower bounds, demonstrating the tightness of our results. Finally, we characterize the class of $\alpha$-approximately convex functions and show that our results apply to a broad family of problems.

Related papers

Beyond $\tilde{O}(\sqrt{T})$ Constraint Violation for Online Convex Optimization with Adversarial Constraints [16.99491218081617]
We revisit the Online Convex Optimization problem with adversarial constraints.<n>We propose an online policy that achieves $tildeO(sqrtdT+ Tbeta)$ regret and $tildeO(dT1-beta)$ CCV.
arXiv Detail & Related papers (2025-05-10T17:23:10Z)
Projection-free Algorithms for Online Convex Optimization with Adversarial Constraints [10.047668792033033]
We study a generalization of the Online Convex Optimization (OCO) framework with time-varying adversarial constraints.<n>In this problem, after selecting a feasible action from the convex decision set $X,$ a convex constraint function is revealed alongside the cost function in each round.<n>We propose a *projection-free* online policy which makes a single call to a Linear Program (LP) solver per round.
arXiv Detail & Related papers (2025-01-28T13:04:32Z)
An Optimistic Algorithm for Online Convex Optimization with Adversarial Constraints [55.2480439325792]
We study Online Convex Optimization (OCO) with adversarial constraints.<n>We focus on a setting where the algorithm has access to predictions of the loss and constraint functions.<n>Our results show that we can improve the current best bounds of $ O(sqrtT) $ regret and $ tildeO(sqrtT) $ cumulative constraint violations.
arXiv Detail & Related papers (2024-12-11T03:06:42Z)
Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Shortest Path (SSP) problem with a linear mixture transition kernel. An agent repeatedly interacts with a environment and seeks to reach certain goal state while minimizing the cumulative cost. Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
Optimal Algorithms for Online Convex Optimization with Adversarial Constraints [16.99491218081617]
In COCO, a convex cost function and a convex constraint function are revealed to the learner after it chooses the action for that round. We show that an online policy can simultaneously achieve $O(sqrtT)$ regret and $tildeO(sqrtT)$ CCV without any restrictive assumptions.
arXiv Detail & Related papers (2023-10-29T09:55:41Z)
Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity. We obtain $mathcalO(log V_T)$, $mathcalO(d log V_T)$ and $hatmathcalO(sqrtV_T)$ regret bounds for strongly convex, exp-concave and convex loss functions.
arXiv Detail & Related papers (2023-07-17T09:55:35Z)
Efficient Online-Bandit Strategies for Minimax Learning Problems [21.300877551771197]
Several learning problems involve solving min-max problems, e.g., empirical distributional robust learning or minimization with non-standard aggregated losses. More specifically, these problems are convex-linear problems where the learning is carried out over the model parameters $winmathcalW$ and over the empirical distribution $pinmathcalK$ of the training set. To design efficient methods, we let an online learning algorithm play against a (combinatorial) bandit algorithm.
arXiv Detail & Related papers (2021-05-28T16:01:42Z)
Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation [92.3161051419884]
We study the reinforcement learning for finite-horizon episodic Markov decision processes with adversarial reward and full information feedback. We show that it can achieve $tildeO(dHsqrtT)$ regret, where $H$ is the length of the episode. We also prove a matching lower bound of $tildeOmega(dHsqrtT)$ up to logarithmic factors.
arXiv Detail & Related papers (2021-02-17T18:54:08Z)
Lazy OCO: Online Convex Optimization on a Switching Budget [34.936641201844054]
We study a variant of online convex optimization where the player is permitted to switch decisions at most $S$ times in expectation throughout $T$ rounds. Similar problems have been addressed in prior work for the discrete decision set setting, and more recently in the continuous setting but only with an adaptive adversary.
arXiv Detail & Related papers (2021-02-07T14:47:19Z)
Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
We show that the C$2$UCB algorithm has the optimal regret bound $tildeO(dsqrtkT + dk)$ for the partition matroid constraints. For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z)
Maximizing Determinants under Matroid Constraints [69.25768526213689]
We study the problem of finding a basis $S$ of $M$ such that $det(sum_i in Sv_i v_i v_itop)$ is maximized. This problem appears in a diverse set of areas such as experimental design, fair allocation of goods, network design, and machine learning.
arXiv Detail & Related papers (2020-04-16T19:16:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.