Contextual Continuum Bandits: Static Versus Dynamic Regret
- URL: http://arxiv.org/abs/2406.05714v2
- Date: Thu, 20 Jun 2024 16:47:04 GMT
- Title: Contextual Continuum Bandits: Static Versus Dynamic Regret
- Authors: Arya Akhavan, Karim Lounici, Massimiliano Pontil, Alexandre B. Tsybakov
- Abstract summary: We study the contextual continuum bandits problem, where the learner sequentially receives a side information vector and has to choose an action in a convex set.
We demonstrate that any algorithm achieving a sub-linear static regret can be extended to achieve a sub-linear dynamic regret.
Inspired by the interior point method and employing self-concordant barriers, we propose an algorithm achieving a sub-linear dynamic regret.
- Score: 70.71582850199871
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the contextual continuum bandits problem, where the learner sequentially receives a side information vector and has to choose an action in a convex set, minimizing a function associated with the context. The goal is to minimize all the underlying functions for the received contexts, leading to a dynamic (contextual) notion of regret, which is stronger than the standard static regret. Assuming that the objective functions are Hölder with respect to the contexts, we demonstrate that any algorithm achieving a sub-linear static regret can be extended to achieve a sub-linear dynamic regret. We further study the case of strongly convex and smooth functions when the observations are noisy. Inspired by the interior point method and employing self-concordant barriers, we propose an algorithm achieving a sub-linear dynamic regret. Lastly, we present a minimax lower bound, implying two key facts. First, no algorithm can achieve sub-linear dynamic regret over functions that are not continuous with respect to the context. Second, for strongly convex and smooth functions, the algorithm that we propose achieves, up to a logarithmic factor, the minimax optimal rate of dynamic regret as a function of the number of queries.
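For reference, the two regret notions can be written as follows; this is the standard formalization, with the notation (F, Theta, c_t) assumed here rather than quoted from the paper. The static benchmark is the single best fixed action for all contexts, while the dynamic benchmark re-optimizes for each received context.

```latex
% Static vs. dynamic regret (standard formalization; notation assumed):
% x_t is the chosen action, c_t the received context, Theta the convex
% action set, and F(., c) the objective associated with context c.
\[
R_T^{\mathrm{stat}} = \sum_{t=1}^{T} F(x_t, c_t) - \min_{x \in \Theta} \sum_{t=1}^{T} F(x, c_t),
\qquad
R_T^{\mathrm{dyn}} = \sum_{t=1}^{T} F(x_t, c_t) - \sum_{t=1}^{T} \min_{x \in \Theta} F(x, c_t).
\]
```

Since the dynamic comparator is at least as good on every round, R_T^dyn >= R_T^stat, which is the sense in which the dynamic notion is stronger.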
Related papers
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
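In this setting, the minimax simple regret takes the usual form; the notation below is assumed, not quoted from the paper.

```latex
% Minimax simple regret after T noisy queries (standard form; notation
% assumed): the infimum runs over algorithms outputting an estimate
% \hat{x}_T from T noisy evaluations, the supremum over the class F of
% strongly convex functions with Lipschitz Hessian.
\[
\inf_{\hat{x}_T} \; \sup_{f \in \mathcal{F}} \;
  \Big( \mathbb{E}\big[ f(\hat{x}_T) \big] - \min_{x} f(x) \Big).
\]
```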
arXiv Detail & Related papers (2024-06-28T02:56:22Z)
- Dynamic Regret of Online Mirror Descent for Relatively Smooth Convex Cost Functions [30.412826613009518]
We show that it is possible to bound the dynamic regret, even when neither Lipschitz continuity nor uniform smoothness is present.
We then show that with an additional condition of relatively strong convexity, the dynamic regret can be bounded by the path length and gradient variation.
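For reference, relative smoothness replaces the usual quadratic upper bound with a Bregman divergence; a standard statement follows, with the reference function h and all notation assumed here.

```latex
% f is L-smooth relative to a reference function h if, for all x, y,
% the Bregman divergence D_h of h bounds the linearization error:
\[
f(y) \le f(x) + \langle \nabla f(x),\, y - x \rangle + L\, D_h(y, x),
\qquad
D_h(y, x) = h(y) - h(x) - \langle \nabla h(x),\, y - x \rangle.
\]
```

Taking h(x) = ||x||^2 / 2 recovers ordinary L-smoothness, which is why this condition can stand in for uniform smoothness.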
arXiv Detail & Related papers (2022-02-25T17:35:07Z)
- Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization [70.4342220499858]
We introduce novel online algorithms that can exploit smoothness and replace the dependence on $T$ in dynamic regret with problem-dependent quantities.
Our results are adaptive to the intrinsic difficulty of the problem, since the bounds are tighter than existing results for easy problems and safeguard the same rate in the worst case.
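A typical problem-dependent quantity of this kind is the path length of the comparator sequence (a standard definition; notation assumed):

```latex
% Path length of the comparator sequence u_1, ..., u_T:
\[
P_T = \sum_{t=2}^{T} \| u_t - u_{t-1} \| .
\]
```

On easy, slowly drifting problems P_T is small, so bounds scaling with P_T improve on worst-case rates that depend only on T.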
arXiv Detail & Related papers (2021-12-29T02:42:59Z)
- Minimax Optimization: The Case of Convex-Submodular [50.03984152441271]
Minimax problems extend beyond the continuous domain to mixed continuous-discrete domains or even fully discrete domains.
We introduce the class of convex-submodular minimax problems, where the objective is convex with respect to the continuous variable and submodular with respect to the discrete variable.
Our proposed algorithms are iterative and combine tools from both discrete and continuous optimization.
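Concretely, the class can be written as the following minimax template (notation assumed), convex in the continuous variable x and submodular in the discrete variable S:

```latex
% Convex-submodular minimax (template; notation assumed): f(., S) is
% convex in x for each fixed set S, and f(x, .) is submodular on the
% subsets of the ground set V, i.e. for all A, B ⊆ V:
\[
\min_{x \in \mathcal{X}} \; \max_{S \subseteq V} \; f(x, S),
\qquad
f(x, A) + f(x, B) \ge f(x, A \cup B) + f(x, A \cap B).
\]
```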
arXiv Detail & Related papers (2021-11-01T21:06:35Z)
- Exact Asymptotics for Linear Quadratic Adaptive Control [6.287145010885044]
We study the simplest non-bandit reinforcement learning problem: linear quadratic adaptive control (LQAC).
We derive expressions for the regret, estimation error, and prediction error of a stepwise-updating LQAC algorithm.
In simulations on both stable and unstable systems, we find that our theory also describes the algorithm's finite-sample behavior remarkably well.
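For context, the underlying setup is the standard linear quadratic one (notation assumed): linear dynamics with unknown system matrices and a quadratic stage cost.

```latex
% Linear quadratic adaptive control (standard setup; notation assumed):
% the unknown matrices A, B must be learned while controlling the system.
\[
x_{t+1} = A x_t + B u_t + w_t,
\qquad
J_T = \sum_{t=1}^{T} \big( x_t^{\top} Q\, x_t + u_t^{\top} R\, u_t \big).
\]
```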
arXiv Detail & Related papers (2020-11-02T22:43:30Z)
- Dynamic Regret of Convex and Smooth Functions [93.71361250701075]
We investigate online convex optimization in non-stationary environments.
We choose the dynamic regret as the performance measure.
We show that it is possible to further enhance the dynamic regret by exploiting the smoothness condition.
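The smoothness condition is typically exploited through the gradient variation of the function sequence (a standard definition; notation assumed):

```latex
% Gradient variation of the online functions f_1, ..., f_T over the
% feasible set X:
\[
V_T = \sum_{t=2}^{T} \sup_{x \in \mathcal{X}}
      \| \nabla f_t(x) - \nabla f_{t-1}(x) \|^2 .
\]
```

When consecutive functions change slowly, V_T stays small, and smoothness lets the dynamic regret depend on V_T instead of the horizon alone.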
arXiv Detail & Related papers (2020-07-07T14:10:57Z)
- Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
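A minimal sketch of such a randomized scheme is below: a basic two-point gradient estimate along a random direction on the sphere, plugged into projected gradient descent. This is an illustration under simplifying assumptions (a plain two-point estimator and a Euclidean ball as the feasible set), not the paper's kernel-based higher-order construction; all names here are ours.

```python
import numpy as np

def two_point_gradient_estimate(f_noisy, x, h, rng):
    """Two-point zeroth-order gradient estimate along a random unit direction."""
    d = x.shape[0]
    zeta = rng.standard_normal(d)
    zeta /= np.linalg.norm(zeta)              # uniform direction on the sphere
    y_plus = f_noisy(x + h * zeta)            # noisy evaluation at x + h*zeta
    y_minus = f_noisy(x - h * zeta)           # noisy evaluation at x - h*zeta
    return (d / (2.0 * h)) * (y_plus - y_minus) * zeta

def project_ball(x, radius=1.0):
    """Euclidean projection onto a centered ball (stand-in for the feasible set)."""
    n = np.linalg.norm(x)
    return x if n <= radius else (radius / n) * x

# Usage: minimize a noisy strongly convex quadratic with minimizer 0.3 * ones.
rng = np.random.default_rng(0)
f_noisy = lambda x: np.sum((x - 0.3) ** 2) + 0.01 * rng.standard_normal()
x = np.zeros(5)
for t in range(1, 2001):
    g = two_point_gradient_estimate(f_noisy, x, h=t ** -0.25, rng=rng)
    x = project_ball(x - g / t)               # step size 1/t for strong convexity
```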
arXiv Detail & Related papers (2020-06-14T10:42:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.