Best-of-Three-Worlds Linear Bandit Algorithm with Variance-Adaptive
Regret Bounds
- URL: http://arxiv.org/abs/2302.12370v1
- Date: Fri, 24 Feb 2023 00:02:24 GMT
- Title: Best-of-Three-Worlds Linear Bandit Algorithm with Variance-Adaptive
Regret Bounds
- Authors: Shinji Ito, Kei Takemura
- Abstract summary: The paper proposes a linear bandit algorithm that is adaptive to environments at two different levels of hierarchy.
At the higher level, the proposed algorithm adapts to a variety of types of environments.
The proposed algorithm has data-dependent regret bounds that depend on the cumulative loss for the optimal action, the total quadratic variation, and the path-length of the loss vector sequence.
- Score: 27.92755687977196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a linear bandit algorithm that is adaptive to
environments at two different levels of hierarchy. At the higher level, the
proposed algorithm adapts to a variety of types of environments. More
precisely, it achieves best-of-three-worlds regret bounds, i.e., of
${O}(\sqrt{T \log T})$ for adversarial environments and of $O(\frac{\log
T}{\Delta_{\min}} + \sqrt{\frac{C \log T}{\Delta_{\min}}})$ for stochastic
environments with adversarial corruptions, where $T$, $\Delta_{\min}$, and $C$
denote, respectively, the time horizon, the minimum sub-optimality gap, and the
total amount of the corruption. Note that polynomial factors in the
dimensionality are omitted here. At the lower level, in each of the adversarial
and stochastic regimes, the proposed algorithm adapts to certain environmental
characteristics, thereby performing better. The proposed algorithm has
data-dependent regret bounds that depend on all of the cumulative loss for the
optimal action, the total quadratic variation, and the path-length of the loss
vector sequence. In addition, for stochastic environments, the proposed
algorithm has a variance-adaptive regret bound of $O(\frac{\sigma^2 \log
T}{\Delta_{\min}})$ as well, where $\sigma^2$ denotes the maximum variance of
the feedback loss. The proposed algorithm is based on the SCRiBLe algorithm. By
incorporating into this a new technique we call scaled-up sampling, we obtain
high-level adaptability, and by incorporating the technique of optimistic
online learning, we obtain low-level adaptability.
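The regret bounds stated above can be made concrete with a small numerical sketch. The functions below evaluate only the leading terms of the three bounds from the abstract; all constants and the polynomial factors in the dimensionality (which the abstract omits) are dropped, so the values are illustrative orders of magnitude, not the paper's exact guarantees.

```python
import math

def adversarial_bound(T: int) -> float:
    """Leading term of the O(sqrt(T log T)) adversarial-regime bound."""
    return math.sqrt(T * math.log(T))

def corrupted_stochastic_bound(T: int, delta_min: float, C: float) -> float:
    """Leading term of the O(log T / Delta_min + sqrt(C log T / Delta_min))
    bound for stochastic environments with total corruption C."""
    log_t = math.log(T)
    return log_t / delta_min + math.sqrt(C * log_t / delta_min)

def variance_adaptive_bound(T: int, delta_min: float, sigma_sq: float) -> float:
    """Leading term of the O(sigma^2 log T / Delta_min) variance-adaptive
    bound, where sigma_sq is the maximum variance of the feedback loss."""
    return sigma_sq * math.log(T) / delta_min
```

Note how the corrupted-stochastic bound degrades gracefully: with corruption level C = 0 it reduces to the logarithmic stochastic rate, and the variance-adaptive bound vanishes as the feedback variance goes to zero.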
Related papers
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
- Best-of-Both-Worlds Algorithms for Linear Contextual Bandits [11.94312915280916]
We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits.
Our algorithms deliver near-optimal regret bounds in both the stochastic and adversarial regimes.
arXiv Detail & Related papers (2023-12-24T08:27:30Z)
- Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions.
arXiv Detail & Related papers (2023-07-17T09:55:35Z)
- Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}(d \sqrt{\sum_{k=1}^{K} \sigma_k^2} + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
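As a rough numerical illustration of the variance-dependent bound above, the sketch below evaluates its leading term, regret on the order of $d \sqrt{\sum_k \sigma_k^2} + d$, where $\sigma_k^2$ is the heteroscedastic noise variance at round $k$. The constants and logarithmic factors hidden by the tilde-O are omitted, so this is only a shape of the bound, not the paper's exact statement.

```python
import math

def variance_dependent_regret(d: int, variances: list[float]) -> float:
    """Leading term of the tilde-O(d * sqrt(sum_k sigma_k^2) + d) bound
    for linear bandits with heteroscedastic noise (constants and log
    factors omitted)."""
    return d * math.sqrt(sum(variances)) + d
```

With zero noise at every round, the bound collapses to the additive $d$ term, reflecting near-constant regret under deterministic feedback.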
arXiv Detail & Related papers (2023-02-21T00:17:24Z)
- Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
- Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits [27.279654173896372]
We study the problem of dynamic regret minimization in $K$-armed Dueling Bandits under non-stationary or time-varying preferences.
This is an online learning setup where the agent chooses a pair of items at each round and observes only relative binary win-loss feedback for this pair.
arXiv Detail & Related papers (2021-11-06T16:46:55Z)
- Conservative Stochastic Optimization with Expectation Constraints [11.393603788068777]
This paper considers convex optimization problems where the objective and constraint functions involve expectations with respect to the data indices or environmental variables.
Online and efficient approaches for solving such problems have not been widely studied.
We propose a novel conservative stochastic optimization algorithm (CSOA) that achieves zero constraint violation and an $O(T^{-1/2})$ optimality gap.
arXiv Detail & Related papers (2020-08-13T08:56:24Z)
- A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z)
- Robust estimation via generalized quasi-gradients [28.292300073453877]
We show why many recently proposed robust estimation problems are efficiently solvable.
We identify the existence of "generalized quasi-gradients" as a key structural property, show that generalized quasi-gradients exist for these problems, and use them to construct efficient algorithms.
arXiv Detail & Related papers (2020-05-28T15:14:33Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the OptimisticOA algorithm for nonconcave min-max problems.
Our experiments empirically compare adaptive and non-adaptive gradient algorithms for GAN training.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.