Online Convex Optimization with Continuous Switching Constraint
- URL: http://arxiv.org/abs/2103.11370v1
- Date: Sun, 21 Mar 2021 11:43:35 GMT
- Title: Online Convex Optimization with Continuous Switching Constraint
- Authors: Guanghui Wang, Yuanyu Wan, Tianbao Yang, Lijun Zhang
- Abstract summary: We introduce the problem of online convex optimization with continuous switching constraint.
We show that, for strongly convex functions, the regret bound can be improved to $O(\log T)$ for $S=\Omega(\log T)$, and $O(\min\{T/\exp(S)+S,T\})$ for $S=O(\log T)$.
- Score: 78.25064451417082
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many sequential decision making applications, changing the decision
incurs an additional cost, such as the wear-and-tear cost associated with
changing server status. To control the switching cost, we introduce the problem
of online convex optimization with continuous switching constraint, where the
goal is to achieve a small regret given a budget on the \emph{overall}
switching cost. We first investigate the hardness of the problem, and provide a
lower bound of order $\Omega(\sqrt{T})$ when the switching cost budget
$S=\Omega(\sqrt{T})$, and $\Omega(\min\{\frac{T}{S},T\})$ when $S=O(\sqrt{T})$,
where $T$ is the time horizon. The essential idea is to carefully design an
adaptive adversary that, using an orthogonality technique, adjusts the loss
function according to the cumulative switching cost the player has incurred so
far. We then develop a simple gradient-based algorithm which enjoys the
minimax optimal regret bound. Finally, we show that, for strongly convex
functions, the regret bound can be improved to $O(\log T)$ for $S=\Omega(\log
T)$, and $O(\min\{T/\exp(S)+S,T\})$ for $S=O(\log T)$.
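The abstract describes only "a simple gradient-based algorithm" that attains the minimax optimal regret under the switching budget; its exact form is not given above. The snippet below is a minimal sketch of one standard device for respecting such a budget: blocked online gradient descent, which holds the decision fixed over blocks of roughly $T/S$ rounds and takes a single projected gradient step at each block boundary, so only about $S$ switches (each of bounded size) occur. The function name `blocked_ogd`, the Euclidean-ball feasible set, and the step size `eta` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def blocked_ogd(grad_fn, T, S, dim, eta, radius=1.0):
    """Sketch: blocked online gradient descent under a switching budget S.

    The horizon T is cut into roughly S blocks of length ceil(T/S); the
    decision is kept fixed inside each block and updated once at the block
    boundary, so the number of switches (and hence the continuous switching
    cost, since every iterate stays inside a ball of radius `radius`) is O(S).
    """
    block_len = max(1, int(np.ceil(T / max(S, 1))))
    x = np.zeros(dim)                 # current decision, fixed within a block
    grad_sum = np.zeros(dim)          # gradients accumulated over the block
    iterates = []
    for t in range(T):
        iterates.append(x.copy())
        grad_sum += grad_fn(t, x)     # gradient of the round-t loss at x
        if (t + 1) % block_len == 0:  # block boundary: one projected step
            x = x - eta * grad_sum
            nrm = np.linalg.norm(x)
            if nrm > radius:          # project back onto the Euclidean ball
                x = x * (radius / nrm)
            grad_sum = np.zeros(dim)
    return iterates
```

For example, with squared losses $f_t(x) = \|x - \theta_t\|^2$ one would pass `grad_fn = lambda t, x: 2 * (x - thetas[t])` and a suitably tuned step size `eta`.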
Related papers
- Online Convex Optimization with a Separation Oracle [10.225358400539719]
We introduce a new projection-free algorithm for Online Convex Optimization (OCO) with a state-of-the-art regret guarantee.
Our algorithm achieves a regret bound of $\widetilde{O}(\sqrt{dT} + \kappa d)$, while requiring only $\widetilde{O}(1)$ calls to a separation oracle per round.
arXiv Detail & Related papers (2024-10-03T13:35:08Z) - Improved Dynamic Regret for Online Frank-Wolfe [54.690867216880356]
We investigate the dynamic regret of online Frank-Wolfe (OFW), which is an efficient projection-free algorithm for online convex optimization.
In this paper, we derive improved dynamic regret bounds for OFW by extending the fast convergence rates of FW from offline optimization to online optimization.
arXiv Detail & Related papers (2023-02-11T07:19:51Z) - Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost [31.04961854943877]
We propose a new algorithm that achieves a regret of $\widetilde{O}(\sqrt{H^4S^2AT})$ while requiring a switching cost of $O(HSA \log\log T)$.
As a byproduct, our new algorithmic techniques allow us to derive a \emph{reward-free} exploration algorithm with an optimal switching cost of $O(HSA)$.
arXiv Detail & Related papers (2022-02-13T19:01:06Z) - Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z) - Bandits with many optimal arms [68.17472536610859]
We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean-gap between optimal and sub-optimal arms.
We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting.
arXiv Detail & Related papers (2021-03-23T11:02:31Z) - An Algorithm for Stochastic and Adversarial Bandits with Switching Costs [10.549307055348596]
We propose an algorithm for adversarial multi-armed bandits with switching costs, where the algorithm pays a price $\lambda$ every time it switches the arm being played.
Our algorithm is based on an adaptation of the Tsallis-INF algorithm of Zimmert and Seldin (2021) and requires no prior knowledge of the regime or time horizon.
arXiv Detail & Related papers (2021-02-19T11:03:51Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - Revisiting Smoothed Online Learning [70.09792747315323]
We investigate the problem of smoothed online learning, in which the online learner suffers both a hitting cost and a switching cost.
To bound the competitive ratio, we assume the hitting cost is known to the learner in each round, and investigate the greedy algorithm which simply minimizes the weighted sum of the hitting cost and the switching cost (a toy closed-form instance of this greedy step is sketched after this list).
arXiv Detail & Related papers (2021-02-13T14:15:55Z) - Lazy OCO: Online Convex Optimization on a Switching Budget [34.936641201844054]
We study a variant of online convex optimization where the player is permitted to switch decisions at most $S$ times in expectation throughout $T$ rounds.
Similar problems have been addressed in prior work for the discrete decision set setting, and more recently in the continuous setting but only with an adaptive adversary.
arXiv Detail & Related papers (2021-02-07T14:47:19Z)
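As a small illustration of the greedy rule mentioned in the "Revisiting Smoothed Online Learning" entry above, here is a sketch that assumes quadratic hitting and switching costs so the per-round minimization has a closed form; the weight `lam` and the toy target sequence are illustrative assumptions, not that paper's general setting.

```python
import numpy as np

def greedy_smoothed_step(theta_t, x_prev, lam):
    """One greedy round: minimize hitting cost + switching cost.

    Assumes hitting cost ||x - theta_t||^2 and switching cost
    lam * ||x - x_prev||^2; setting the gradient
    2*(x - theta_t) + 2*lam*(x - x_prev) to zero gives the closed form below.
    """
    return (theta_t + lam * x_prev) / (1.0 + lam)

# Toy usage: track a drifting target while penalizing movement.
rng = np.random.default_rng(0)
x = np.zeros(2)
for t in range(5):
    theta_t = rng.normal(size=2)   # minimizer of this round's hitting cost
    x = greedy_smoothed_step(theta_t, x, lam=1.0)
```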
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.