Bandit algorithms: Letting go of logarithmic regret for statistical
robustness
- URL: http://arxiv.org/abs/2006.12038v1
- Date: Mon, 22 Jun 2020 07:18:47 GMT
- Title: Bandit algorithms: Letting go of logarithmic regret for statistical
robustness
- Authors: Kumar Ashutosh (IIT Bombay), Jayakrishnan Nair (IIT Bombay), Anmol
Kagrecha (IIT Bombay), Krishna Jagannathan (IIT Madras)
- Abstract summary: We study regret in a multi-armed bandit setting and establish a fundamental trade-off between the regret suffered under an algorithm, and its statistical robustness.
We show that bandit learning algorithms with logarithmic regret are always inconsistent and that consistent learning algorithms always suffer a super-logarithmic regret.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study regret minimization in a stochastic multi-armed bandit setting and
establish a fundamental trade-off between the regret suffered under an
algorithm, and its statistical robustness. Considering broad classes of
underlying arms' distributions, we show that bandit learning algorithms with
logarithmic regret are always inconsistent and that consistent learning
algorithms always suffer a super-logarithmic regret. This result highlights the
inevitable statistical fragility of all "logarithmic regret" bandit algorithms
available in the literature: for instance, if a UCB algorithm designed for
$\sigma$-subGaussian distributions is used in a subGaussian setting with a
mismatched variance parameter, the learning performance could be inconsistent.
Next, we show a positive result: statistically robust and consistent learning
performance is attainable if we allow the regret to be slightly worse than
logarithmic. Specifically, we propose three classes of distribution oblivious
algorithms that achieve an asymptotic regret that is arbitrarily close to
logarithmic.
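
As a hedged illustration of the trade-off described above, the Python sketch below runs a UCB-style index whose exploration term grows like $(\log t)^{1+\epsilon}$ rather than $\log t$, so that it eventually dominates any fixed but unknown subGaussian scale at the cost of a slightly super-logarithmic regret. The function name, the $(\log t)^{1+\epsilon}$ inflation, and the Gaussian test arms are all illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def ucb_oblivious(arms, horizon, eps=0.1):
    """UCB-style bandit with a (log t)^(1+eps) exploration term.

    `arms` is a list of callables, each returning one stochastic reward.
    Inflating the bonus beyond log t is a hedged illustration of trading
    slightly super-logarithmic regret for robustness to an unknown reward
    scale; it is not the exact construction from the paper.
    """
    k = len(arms)
    counts = np.zeros(k)
    sums = np.zeros(k)

    # Pull each arm once to initialise the empirical means.
    for a in range(k):
        sums[a] += arms[a]()
        counts[a] += 1.0

    for t in range(k + 1, horizon + 1):
        means = sums / counts
        # Exploration bonus with (log t)^(1+eps) in place of log t.
        bonus = np.sqrt(np.log(t) ** (1.0 + eps) / counts)
        a = int(np.argmax(means + bonus))
        sums[a] += arms[a]()
        counts[a] += 1.0

    return sums / counts, counts


# Example: two Gaussian arms whose true scale (sigma = 3) is larger than the
# sigma = 1 that a standard UCB1 tuning would implicitly assume.
if __name__ == "__main__":
    g = np.random.default_rng(0)
    arms = [lambda: g.normal(0.0, 3.0), lambda: g.normal(0.5, 3.0)]
    means, pulls = ucb_oblivious(arms, horizon=10_000)
    print(pulls)  # the better arm should receive the bulk of the pulls
```
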
Related papers
- Local Anti-Concentration Class: Logarithmic Regret for Greedy Linear Contextual Bandit [8.087699764574788]
We study the performance guarantees of exploration-free greedy algorithms for the linear contextual bandit problem.
We introduce a novel condition, named the Local Anti-Concentration (LAC) condition, which enables a greedy bandit algorithm to achieve provable efficiency.
arXiv Detail & Related papers (2024-11-19T21:53:06Z)
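
The entry above refers to exploration-free greedy play in a linear contextual bandit. As a generic sketch of that idea (not the paper's algorithm or its LAC analysis; the class name, the `lam` regularizer, and the linear-reward assumption are illustrative), greedy here means: fit a ridge-regression estimate of the reward parameter and pull the arm with the largest predicted reward, with no optimism bonus.

```python
import numpy as np

class GreedyLinearBandit:
    """Exploration-free greedy linear contextual bandit (generic sketch)."""

    def __init__(self, dim, lam=1.0):
        self.A = lam * np.eye(dim)   # regularised Gram matrix
        self.b = np.zeros(dim)       # sum of reward-weighted features

    def choose(self, arm_features):
        # Ridge estimate of the reward parameter, then a plain argmax:
        # no optimism/UCB bonus is added to the predicted rewards.
        theta_hat = np.linalg.solve(self.A, self.b)
        scores = arm_features @ theta_hat
        return int(np.argmax(scores))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x
```

Any efficiency such a scheme enjoys must come from diversity in the contexts themselves, which is the kind of property conditions like LAC are meant to capture.
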
- Semi-Bandit Learning for Monotone Stochastic Optimization [20.776114616154242]
We provide a generic online learning algorithm for a class of "monotone" problems.
Our framework applies to several fundamental problems in optimization such as the prophet inequality, Pandora's box, knapsack, matchings, and submodular optimization.
arXiv Detail & Related papers (2023-12-24T07:46:37Z)
- The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL).
We focus on learning with general function classes and general model classes.
We show that a logarithmic regret bound is realizable by algorithms with $O(\log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z)
- Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs [27.636407641546914]
We propose a novel robust statistical estimator, mean of medians, which estimates a random variable by computing the empirical mean of a sequence of empirical medians.
We show that the regret bound is near-optimal even with very heavy-tailed noise.
arXiv Detail & Related papers (2021-10-26T17:30:44Z)
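
The "mean of medians" idea in the entry above is concrete enough to sketch: split the samples into groups, take the empirical median of each group, and average the medians. The equal-sized blocking and the `group_size` parameter below are illustrative assumptions, not the paper's exact grouping schedule.

```python
import numpy as np

def mean_of_medians(samples, group_size):
    """Robust location estimate: average the empirical medians of blocks.

    Splitting into equal-sized consecutive blocks is an illustrative choice;
    the grouping/scheduling used in the paper may differ.
    """
    samples = np.asarray(samples, dtype=float)
    n_groups = len(samples) // group_size
    if n_groups == 0:
        return float(np.median(samples))
    blocks = samples[: n_groups * group_size].reshape(n_groups, group_size)
    return float(np.mean(np.median(blocks, axis=1)))


# Symmetric heavy-tailed example (Student-t, 2 degrees of freedom): infinite
# variance, so the sample mean concentrates slowly, while the mean of block
# medians stays close to the true location 0.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_t(df=2, size=10_000)
    print(np.mean(x), mean_of_medians(x, group_size=25))
```
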
- Extreme Bandits using Robust Statistics [12.6543086847761]
We consider a multi-armed bandit problem motivated by situations where only the extreme values, as opposed to expected values in the classical bandit setting, are of interest.
We propose distribution-free algorithms using robust statistics and characterize their statistical properties.
arXiv Detail & Related papers (2021-09-09T17:24:15Z)
- Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is the squared Euclidean distance to a bias vector.
We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
arXiv Detail & Related papers (2020-05-18T08:41:39Z)
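
The biased regularization in the entry above has a simple closed form: ridge regression shrunk toward a bias vector $h$ rather than toward zero. The sketch below shows only that estimation step (the function name and `lam` are assumptions; OFUL's confidence sets and arm selection are omitted).

```python
import numpy as np

def biased_ridge(X, y, h, lam=1.0):
    """Ridge regression shrunk toward a bias vector h instead of 0.

    Solves min_theta ||y - X theta||^2 + lam * ||theta - h||^2, the
    regularised least-squares step behind the OFUL variant described above
    (estimator only; exploration machinery is not shown).
    """
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    b = X.T @ y + lam * h
    return np.linalg.solve(A, b)
```
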
- Efficient improper learning for online logistic regression [68.8204255655161]
It is known that any proper algorithm which has logarithmic regret in the number of samples (denoted $n$) necessarily suffers an exponential multiplicative constant in $B$.
In this work, we design an efficient improper algorithm that avoids this exponential constant while preserving a logarithmic regret.
Our new algorithm, based on regularized empirical risk minimization with surrogate losses, satisfies a regret scaling as $O(B \log(Bn))$ with a per-round time complexity of order $O(d^2)$.
arXiv Detail & Related papers (2020-03-18T09:16:14Z)
- Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds [48.312484940846]
We revisit the problem of online learning with sleeping experts/bandits.
In each time step, only a subset of the actions are available for the algorithm to choose from.
We give an algorithm that provides a no-approximate-regret guarantee for the general sleeping expert/bandit problems.
arXiv Detail & Related papers (2020-03-07T02:13:21Z)
- Model Selection in Contextual Stochastic Bandit Problems [51.94632035240787]
We develop a meta-algorithm that selects between base algorithms.
We show through a lower bound that even when one of the base algorithms has $O(\sqrt{T})$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2020-03-03T18:46:34Z)
- Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
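
As a generic Thompson-sampling-style sketch for a mean-variance objective (not the paper's algorithms; the Beta priors, the Bernoulli arms, and the index $\mu - \rho\,\mu(1-\mu)$ are illustrative assumptions and the risk-tolerance convention may differ), each round samples a mean for every arm from its posterior and pulls the arm maximizing the sampled mean-variance score.

```python
import numpy as np

def ts_mean_variance_bernoulli(arms, horizon, rho=1.0, rng=None):
    """Thompson-sampling-style bandit for a mean-variance objective.

    Beta(1, 1) posteriors over Bernoulli means; each round samples a mean mu
    for every arm and pulls the arm maximising mu - rho * mu * (1 - mu),
    i.e. a sampled mean-variance index. Generic sketch only, not the specific
    algorithms or conventions analysed in the paper.
    """
    rng = rng or np.random.default_rng()
    k = len(arms)
    alpha = np.ones(k)  # Beta posterior: successes + 1
    beta = np.ones(k)   # Beta posterior: failures + 1

    for _ in range(horizon):
        mu = rng.beta(alpha, beta)              # posterior samples per arm
        scores = mu - rho * mu * (1.0 - mu)     # sampled mean-variance index
        a = int(np.argmax(scores))
        r = arms[a]()                           # observe a {0, 1} reward
        alpha[a] += r
        beta[a] += 1 - r
    return alpha, beta
```
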