Lazy OCO: Online Convex Optimization on a Switching Budget
- URL: http://arxiv.org/abs/2102.03803v7
- Date: Sun, 17 Sep 2023 07:59:11 GMT
- Title: Lazy OCO: Online Convex Optimization on a Switching Budget
- Authors: Uri Sherman, Tomer Koren
- Abstract summary: We study a variant of online convex optimization where the player is permitted to switch decisions at most $S$ times in expectation throughout $T$ rounds.
Similar problems have been addressed in prior work for the discrete decision set setting, and more recently in the continuous setting but only with an adaptive adversary.
- Score: 34.936641201844054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a variant of online convex optimization where the player is
permitted to switch decisions at most $S$ times in expectation throughout $T$
rounds. Similar problems have been addressed in prior work for the discrete
decision set setting, and more recently in the continuous setting but only with
an adaptive adversary. In this work, we aim to fill the gap and present
computationally efficient algorithms in the more prevalent oblivious setting,
establishing a regret bound of $O(T/S)$ for general convex losses and
$\widetilde O(T/S^2)$ for strongly convex losses. In addition, for stochastic
i.i.d. losses, we present a simple algorithm that performs $\log T$ switches
with only a multiplicative $\log T$ factor overhead in its regret in both the
general and strongly convex settings. Finally, we complement our algorithms
with lower bounds that match our upper bounds in some of the cases we consider.
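A standard way to obtain $O(T/S)$-type bounds under a switching budget is blocking: commit to a single decision for roughly $T/S$ consecutive rounds and update only at block boundaries, so that at most about $S$ switches occur. Below is a minimal Python sketch of this blocked online gradient descent idea, assuming a Euclidean-ball decision set and a hypothetical per-round gradient-oracle interface (`grad_fns`); it illustrates the batching technique behind such bounds, not necessarily the authors' exact algorithm.
```python
import numpy as np

def blocked_ogd(grad_fns, S, dim, radius=1.0, eta=0.1):
    """Blocked online gradient descent: commit to one point per block of
    ~T/S rounds and update (i.e., switch) only at block boundaries, so at
    most ~S switches occur over the horizon."""
    T = len(grad_fns)
    block_len = max(1, T // S)            # each block spans about T/S rounds
    x = np.zeros(dim)                     # committed decision for the block
    grad_sum = np.zeros(dim)
    plays = []
    for t in range(T):
        plays.append(x.copy())            # play the committed point
        grad_sum += grad_fns[t](x)        # accumulate feedback in the block
        if (t + 1) % block_len == 0:      # block boundary: spend one switch
            x = x - eta * grad_sum / block_len
            nrm = np.linalg.norm(x)
            if nrm > radius:              # project back onto the ball
                x *= radius / nrm
            grad_sum[:] = 0.0
    return plays

# Usage: quadratic losses f_t(x) = ||x - z_t||^2, with gradient 2(x - z_t).
rng = np.random.default_rng(0)
targets = [rng.normal(size=5) for _ in range(1000)]
plays = blocked_ogd([lambda x, z=z: 2 * (x - z) for z in targets], S=20, dim=5)
```
Averaging the block's gradients before the single update is what keeps the regret of each block comparable to playing the rounds individually, at the price of a factor that grows as the blocks get longer.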
Related papers
- Online Convex Optimization with a Separation Oracle [10.225358400539719]
We introduce a new projection-free algorithm for Online Convex Optimization (OCO) with a state-of-the-art regret guarantee.
Our algorithm achieves a regret bound of $\widetilde{O}(\sqrt{dT} + \kappa d)$, while requiring only $\widetilde{O}(1)$ calls to a separation oracle per round.
arXiv Detail & Related papers (2024-10-03T13:35:08Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Second Order Methods for Bandit Optimization and Control [34.51425758864638]
We show that our algorithm achieves optimal (in terms of the horizon) regret bounds for a large class of convex functions that we call $\kappa$-convex.
We also investigate the adaptation of our second-order bandit algorithm to online convex optimization with memory.
arXiv Detail & Related papers (2024-02-14T04:03:38Z) - Projection-free Online Exp-concave Optimization [21.30065439295409]
We present an LOO-based ONS-style algorithm which, using $O(T)$ calls to a LOO overall, guarantees a worst-case regret bound of $\widetilde{O}(n^{2/3}T^{2/3})$.
Our algorithm is most interesting in an important and plausible low-dimensional data scenario.
arXiv Detail & Related papers (2023-02-09T18:58:05Z) - Iterative Hard Thresholding with Adaptive Regularization: Sparser
Solutions Without Sacrificing Runtime [17.60502131429094]
We propose a simple modification to the iterative hard thresholding algorithm, which recovers asymptotically sparser solutions as a function of the condition number.
Our proposed algorithm, regularized IHT, returns a solution with sparsity $O(s\kappa)$.
Our algorithm significantly improves in runtime over ARHT, which also finds a solution of sparsity $O(s\kappa)$.
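For context, vanilla iterative hard thresholding (IHT) alternates a gradient step with hard thresholding onto the set of $s$-sparse vectors; the paper's adaptive regularization modifies this recipe and is not reproduced here. A minimal sketch of the vanilla baseline for least squares:
```python
import numpy as np

def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x, zero out the rest."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]
    out[keep] = x[keep]
    return out

def iht(A, b, s, iters=200):
    """Vanilla IHT for min ||Ax - b||^2 subject to ||x||_0 <= s.
    The paper's regularized IHT modifies this thresholding recipe;
    that modification is not reproduced here."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # conservative step: 1/||A||_2^2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = hard_threshold(x - step * A.T @ (A @ x - b), s)
    return x
```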
arXiv Detail & Related papers (2022-04-11T19:33:15Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret
for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$ as long as…
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player to play against smooth algorithms deployed by the adversary.
Our algorithm is guaranteed to make monotonic progress (thus having no limit cycles) and to find an appropriate solution in a polynomial number of gradient ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z) - Online Convex Optimization with Continuous Switching Constraint [78.25064451417082]
We introduce the problem of online convex optimization with continuous switching constraint.
We show that, for strongly convex functions, the regret bound can be improved to $O(\log T)$ for $S = \Omega(\log T)$, and to $O(\min\{T/\exp(S) + S,\; T\})$ for $S = O(\log T)$.
arXiv Detail & Related papers (2021-03-21T11:43:35Z) - Revisiting Smoothed Online Learning [70.09792747315323]
We investigate the problem of smoothed online learning, in which the online learner suffers both a hitting cost and a switching cost.
To bound the competitive ratio, we assume the hitting cost is known to the learner in each round, and investigate the greedy algorithm which simply minimizes the weighted sum of the hitting cost and the switching cost.
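In symbols, the greedy rule plays $x_t \in \arg\min_x f_t(x) + \lambda\, c(x, x_{t-1})$ each round. A minimal Python sketch under an assumed illustrative setup, quadratic hitting costs $f_t(x) = \tfrac{1}{2}\|x - z_t\|^2$ with a squared-Euclidean switching cost, where the per-round argmin has a closed form:
```python
import numpy as np

def greedy_smoothed(targets, lam=1.0):
    """Greedy smoothed online learning: each round, play the minimizer of
    hitting cost + lam * switching cost.  With f_t(x) = 0.5*||x - z_t||^2
    and switching cost 0.5*||x - x_prev||^2 (an assumed illustrative setup),
    the per-round argmin is (z_t + lam * x_prev) / (1 + lam)."""
    x = np.zeros_like(targets[0])
    plays = []
    for z in targets:
        x = (z + lam * x) / (1.0 + lam)   # closed-form greedy step
        plays.append(x.copy())
    return plays
```
Larger `lam` makes the play stickier (smaller switching cost at the price of higher hitting cost), which is exactly the trade-off the competitive-ratio analysis quantifies.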
arXiv Detail & Related papers (2021-02-13T14:15:55Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - Projection-free Online Learning over Strongly Convex Sets [24.517908972536432]
We study the special case of online learning over strongly convex sets, for which we first prove that OFW can enjoy a better regret bound of $O(T^{2/3})$ for general convex losses.
We show that it achieves a regret bound of $O(T^{2/3})$ over general convex sets and a better regret bound of $O(\sqrt{T})$ over strongly convex sets.
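Online Frank–Wolfe (OFW) replaces projections with a single linear-optimization step per round. A minimal Python sketch over the Euclidean ball, where the linear oracle has the closed form $v = -r\,g/\|g\|$; the $t^{-1/2}$ step sizes are an illustrative choice, not the tuned schedule from the paper:
```python
import numpy as np

def online_frank_wolfe(grad_fns, dim, radius=1.0):
    """Online Frank-Wolfe sketch: one linear-optimization step per round,
    no projections.  Over the Euclidean ball of radius r the linear oracle
    is v = -r * g / ||g||; sigma_t = t^(-1/2) is an illustrative schedule."""
    x = np.zeros(dim)
    grad_sum = np.zeros(dim)
    plays = []
    for t, grad in enumerate(grad_fns, start=1):
        plays.append(x.copy())
        grad_sum += grad(x)
        g = grad_sum / t                               # averaged gradient
        v = -radius * g / (np.linalg.norm(g) + 1e-12)  # linear oracle step
        sigma = 1.0 / np.sqrt(t)
        x = (1 - sigma) * x + sigma * v                # stays in the ball
    return plays
```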
arXiv Detail & Related papers (2020-10-16T05:42:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.