Optimistic and Adaptive Lagrangian Hedging
- URL: http://arxiv.org/abs/2101.09603v2
- Date: Wed, 3 Feb 2021 02:59:24 GMT
- Title: Optimistic and Adaptive Lagrangian Hedging
- Authors: Ryan D'Orazio and Ruitong Huang
- Abstract summary: In online learning, an algorithm plays against an environment with losses possibly picked by an adversary at each round.
We introduce optimism and adaptive step sizes to Lagrangian hedging, a class of online algorithms that includes regret-matching and hedge.
- Score: 11.698607733213226
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In online learning, an algorithm plays against an environment with losses
possibly picked by an adversary at each round. The generality of this framework
includes problems that are not adversarial, for example offline optimization,
or saddle-point problems (i.e., min-max optimization). However, online
algorithms are typically not designed to leverage additional structure present
in non-adversarial problems. Recently, slight modifications to well-known
online algorithms such as optimism and adaptive step sizes have been used in
several domains to accelerate online learning -- recovering optimal rates in
offline smooth optimization, and accelerating convergence to saddle points or
social welfare in smooth games. In this work we introduce optimism and adaptive
step sizes to Lagrangian hedging, a class of online algorithms that includes
regret-matching and hedge (i.e., multiplicative weights). Our results include:
a general regret bound; a path-length regret bound for a fixed smooth
loss, applicable to an optimistic variant of regret-matching and
regret-matching+; optimistic regret bounds for $\Phi$ regret, a framework that
includes external, internal, and swap regret; and optimistic bounds for a
family of algorithms that includes regret-matching+ as a special case.
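As a rough illustration of the updates named above, here is a minimal sketch of hedge and regret-matching+ made optimistic by adding a prediction of the upcoming loss or instantaneous regret before computing the next strategy. This is not the paper's exact Lagrangian-hedging derivation; the function names and the common "last observation as prediction" choice are illustrative assumptions.

```python
import numpy as np

def optimistic_hedge(loss_seq, eta=0.1):
    """Hedge (multiplicative weights) with optimism: play the distribution
    induced by the cumulative loss plus a prediction m of the upcoming loss.
    Here m is the most recent loss vector, a common choice."""
    n = len(loss_seq[0])
    cum = np.zeros(n)                    # cumulative observed loss
    m = np.zeros(n)                      # prediction of the next loss
    plays = []
    for loss in loss_seq:
        w = np.exp(-eta * (cum + m))     # optimistic exponential weights
        plays.append(w / w.sum())
        cum += loss
        m = np.asarray(loss, dtype=float)  # predict next loss = last loss
    return plays

def optimistic_rm_plus(regret_seq):
    """Regret-matching+ with optimism: keep a regret accumulator that is
    thresholded at zero after each update, and add the predicted next
    instantaneous regret before normalizing into a strategy."""
    n = len(regret_seq[0])
    Q = np.zeros(n)                      # thresholded regret accumulator
    m = np.zeros(n)                      # prediction of the next regret
    plays = []
    for r in regret_seq:
        s = np.maximum(Q + m, 0.0)
        plays.append(s / s.sum() if s.sum() > 0 else np.full(n, 1.0 / n))
        Q = np.maximum(Q + r, 0.0)       # RM+ zero-thresholding
        m = np.asarray(r, dtype=float)   # predict next regret = last regret
    return plays
```

With the zero prediction ($m \equiv 0$ throughout), both routines reduce to standard hedge and regret-matching+.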
Related papers
- Online Convex Optimisation: The Optimal Switching Regret for all Segmentations Simultaneously [8.850922234275636]
A switching regret is defined relative to any segmentation of the trial sequence, and is equal to the sum of the static regrets of each segment.
Our algorithm is very efficient, with space and per-trial time complexity logarithmic in the time horizon.
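In the usual notation, for a segmentation of rounds $1,\dots,T$ into contiguous segments $I_1,\dots,I_K$, the switching regret described above is $\sum_{k=1}^{K} \max_{x_k} \sum_{t \in I_k} (\ell_t(x_t) - \ell_t(x_k))$, i.e. each segment is compared against its own best fixed action (standard definition, stated here for context).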
arXiv Detail & Related papers (2024-05-31T14:16:52Z)
- A Generalized Approach to Online Convex Optimization [33.38582292895673]
We show that any algorithm for online linear optimization with fully adaptive adversaries is an algorithm for online convex optimization.
We show that any such algorithm that requires full-information feedback may be transformed to an algorithm with semi-bandit feedback with comparable regret bound.
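The reduction rests on the standard linearization step: for convex $f_t$, $f_t(x_t) - f_t(u) \le \langle \nabla f_t(x_t), x_t - u \rangle$, so running the linear learner on the surrogate losses $\ell_t = \nabla f_t(x_t)$ bounds the convex regret (a textbook fact, recalled here for context rather than this paper's exact argument).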
arXiv Detail & Related papers (2024-02-13T17:42:27Z)
- Efficient Methods for Non-stationary Online Learning [67.3300478545554]
We present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$.
Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods.
arXiv Detail & Related papers (2023-09-16T07:30:12Z)
- Distributed Online Non-convex Optimization with Composite Regret [31.53784277195043]
We propose a novel composite regret with a new network-based regret metric for distributed online optimization with general loss functions.
To our knowledge, this is the first regret bound for distributed online non-convex learning.
arXiv Detail & Related papers (2022-09-21T04:16:33Z)
- Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization [70.4342220499858]
We introduce novel online algorithms that can exploit smoothness and replace the dependence on $T$ in dynamic regret with problem-dependent quantities.
Our results are adaptive to the intrinsic difficulty of the problem, since the bounds are tighter than existing results for easy problems and safeguard the same rate in the worst case.
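For context, dynamic regret here is measured against a time-varying comparator sequence $u_1,\dots,u_T$, i.e. $\sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(u_t)$, and is typically bounded in terms of the path length $P_T = \sum_{t=2}^{T} \|u_t - u_{t-1}\|$ (standard definitions, not specific to this paper).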
arXiv Detail & Related papers (2021-12-29T02:42:59Z)
- Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability [59.81339109121384]
We study the $K$-armed contextual dueling bandit problem, a sequential decision-making setting in which the learner uses contextual information to make two decisions, but only observes preference-based feedback suggesting that one decision was better than the other.
We provide a new algorithm that achieves the optimal regret rate for a new notion of best response regret, which is a strictly stronger performance measure than those considered in prior works.
arXiv Detail & Related papers (2021-11-24T07:14:57Z)
- Online estimation and control with optimal pathlength regret [52.28457815067461]
A natural goal when designing online learning algorithms is to bound the regret of the algorithm in terms of the temporal variation of the input sequence.
Data-dependent "pathlength" regret bounds have recently been obtained for a wide variety of online learning problems, including OCO and bandits.
arXiv Detail & Related papers (2021-10-24T22:43:15Z)
- Online Learning with Imperfect Hints [72.4277628722419]
We develop algorithms and nearly matching lower bounds for online learning with imperfect directional hints.
Our algorithms are oblivious to the quality of the hints, and the regret bounds interpolate between the always-correlated hints case and the no-hints case.
arXiv Detail & Related papers (2020-02-11T23:06:09Z)
- Minimizing Dynamic Regret and Adaptive Regret Simultaneously [60.17824125301273]
We propose novel online algorithms that are able to minimize the dynamic regret and adaptive regret simultaneously.
Our theoretical guarantee is even stronger in the sense that a single algorithm can minimize the dynamic regret over any interval.
arXiv Detail & Related papers (2020-02-06T03:32:37Z)