Efficient Adaptive Regret Minimization
- URL: http://arxiv.org/abs/2207.00646v1
- Date: Fri, 1 Jul 2022 19:43:11 GMT
- Title: Efficient Adaptive Regret Minimization
- Authors: Zhou Lu, Elad Hazan
- Abstract summary: In online convex optimization the player aims to minimize her regret against a fixed comparator over the entire repeated game.
Existing adaptive regret algorithms suffer from a computational penalty - typically on the order of a multiplicative factor that grows logarithmically in the number of game iterations.
In this paper we show how to reduce this computational penalty to be doubly logarithmic in the number of game iterations, and with minimal degradation to the optimal attainable adaptive regret bounds.
- Score: 35.121567896321885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In online convex optimization the player aims to minimize her regret against
a fixed comparator over the entire repeated game. Algorithms that minimize
standard regret may converge to a fixed decision, which is undesirable in
changing or dynamic environments. This motivates the stronger metric of
adaptive regret, or the maximum regret over any continuous sub-interval in
time.
Existing adaptive regret algorithms suffer from a computational penalty -
typically on the order of a multiplicative factor that grows logarithmically in
the number of game iterations. In this paper we show how to reduce this
computational penalty to be doubly logarithmic in the number of game
iterations, and with minimal degradation to the optimal attainable adaptive
regret bounds.
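To make the adaptive regret metric concrete, here is a minimal brute-force sketch (not the paper's algorithm, which is about reducing per-round computation): adaptive regret is the maximum, over all contiguous intervals, of the player's cumulative loss minus that of the best fixed comparator on the interval. The comparator set here is a finite list of experts, an assumption made purely for illustration.

```python
def adaptive_regret(player_losses, comparator_losses):
    """Brute-force adaptive regret over a finite comparator set.

    player_losses: per-round losses of the online player.
    comparator_losses: per-round lists of losses, one entry per comparator.
    Returns the max over intervals [s, t] of
    (player's interval loss) - (best single comparator's interval loss).
    """
    T = len(player_losses)
    K = len(comparator_losses[0])
    worst = float("-inf")
    for s in range(T):
        player_sum = 0.0
        comp_sums = [0.0] * K
        for t in range(s, T):
            player_sum += player_losses[t]
            for k in range(K):
                comp_sums[k] += comparator_losses[t][k]
            worst = max(worst, player_sum - min(comp_sums))
    return worst
```

Note the O(T^2) scan over intervals: efficient adaptive regret algorithms avoid exactly this, and the paper's contribution is to shrink the remaining per-round overhead from logarithmic to doubly logarithmic in T.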
Related papers
- Distributed Online Bandit Nonconvex Optimization with One-Point Residual Feedback via Dynamic Regret [10.700891331004799]
This paper considers the distributed online bandit optimization problem with nonconvex loss functions over a time-varying digraph.
At each round, each player selects a decision and an adversary then assigns an arbitrary nonconvex loss function to that player.
The expected regret of our algorithms is comparable to existing algorithms that use two-point feedback.
arXiv Detail & Related papers (2024-09-24T02:37:33Z) - Online Convex Optimisation: The Optimal Switching Regret for all Segmentations Simultaneously [8.850922234275636]
A switching regret is defined relative to any segmentation of the trial sequence, and is equal to the sum of the static regrets of each segment.
Our algorithm for doing so is very efficient: having a space and per-trial time complexity that is logarithmic in the time-horizon.
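The switching regret defined above can be sketched directly: given a segmentation of the trial sequence, sum the static regret of each segment. This is an illustrative computation against a finite comparator set, not the paper's efficient algorithm.

```python
def switching_regret(player_losses, comparator_losses, boundaries):
    """Switching regret relative to a given segmentation.

    boundaries: sorted segment start indices, beginning with 0.
    Each segment contributes its static regret: the player's loss on the
    segment minus the loss of the best single comparator on that segment.
    """
    T = len(player_losses)
    K = len(comparator_losses[0])
    edges = list(boundaries) + [T]
    total = 0.0
    for s, e in zip(edges[:-1], edges[1:]):
        player_sum = sum(player_losses[s:e])
        best = min(sum(comparator_losses[t][k] for t in range(s, e))
                   for k in range(K))
        total += player_sum - best
    return total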
arXiv Detail & Related papers (2024-05-31T14:16:52Z) - Resetting the Optimizer in Deep RL: An Empirical Study [10.907980864371213]
We focus on the task of approximating the optimal value function in deep reinforcement learning.
We demonstrate that this simple modification significantly improves the performance of deep RL on the Atari benchmark.
arXiv Detail & Related papers (2023-06-30T17:53:50Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit the connection between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - No-Regret Learning in Games with Noisy Feedback: Faster Rates and
Adaptivity via Learning Rate Separation [76.61911795703062]
We study the problem of regret minimization when the learner is involved in a continuous game with other optimizing agents.
In this case, if all players follow a no-regret algorithm, it is possible to achieve significantly lower regret relative to fully adversarial environments.
We propose a fully adaptive method that smoothly interpolates between worst- and best-case regret guarantees.
arXiv Detail & Related papers (2022-06-13T10:13:51Z) - Optimistic and Adaptive Lagrangian Hedging [11.698607733213226]
In online learning an algorithm plays against an environment with losses possibly picked by an adversary at each round.
We introduce optimism and adaptive stepsizes to Lagrangian hedging, a class of online algorithms that includes regret-matching and Hedge.
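For reference, Hedge (exponential weights) itself is a few lines; the sketch below shows the plain version without the paper's optimism or adaptive stepsizes, with a fixed learning rate `eta` chosen here only for illustration.

```python
import math

def hedge(loss_matrix, eta):
    """Minimal Hedge (exponential weights) sketch.

    Maintains one weight per expert, plays the normalized weight vector,
    then multiplies each weight by exp(-eta * loss) after seeing the losses.
    Returns the sequence of distributions played.
    """
    K = len(loss_matrix[0])
    w = [1.0] * K
    played = []
    for losses in loss_matrix:
        total = sum(w)
        played.append([wi / total for wi in w])
        w = [wi * math.exp(-eta * l) for wi, l in zip(w, losses)]
    return played
```

After a round in which expert 0 incurs loss and expert 1 does not, the distribution shifts toward expert 1, which is the behavior regret-matching-style updates share.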
arXiv Detail & Related papers (2021-01-23T23:32:40Z) - Lenient Regret for Multi-Armed Bandits [72.56064196252498]
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took.
While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and the agent's action, this criterion might lead to undesirable results.
We suggest a new, more lenient, regret criterion that ignores suboptimality gaps smaller than some $\epsilon$.
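One natural reading of this criterion can be sketched as follows: sum the per-round suboptimality gaps, but forgive any round whose gap is at most eps. This is an illustrative form, not necessarily the paper's exact definition.

```python
def lenient_regret(chosen_means, best_mean, eps):
    """Illustrative lenient regret for a MAB run.

    chosen_means: mean reward of the arm pulled at each round.
    best_mean: mean reward of the best arm.
    A round contributes its gap (best_mean - chosen mean) only when the
    gap exceeds eps; smaller gaps are treated as acceptable.
    """
    total = 0.0
    for mu in chosen_means:
        gap = best_mean - mu
        if gap > eps:
            total += gap
    return total
```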
arXiv Detail & Related papers (2020-08-10T08:30:52Z) - Dynamic Regret of Convex and Smooth Functions [93.71361250701075]
We investigate online convex optimization in non-stationary environments.
We choose the dynamic regret as the performance measure.
We show that it is possible to further enhance the dynamic regret by exploiting the smoothness condition.
arXiv Detail & Related papers (2020-07-07T14:10:57Z) - Minimizing Dynamic Regret and Adaptive Regret Simultaneously [60.17824125301273]
We propose novel online algorithms that are able to minimize the dynamic regret and adaptive regret simultaneously.
Our theoretical guarantee is even stronger in the sense that one algorithm is able to minimize the dynamic regret over any interval.
arXiv Detail & Related papers (2020-02-06T03:32:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.