No Regrets for Learning the Prior in Bandits
- URL: http://arxiv.org/abs/2107.06196v1
- Date: Tue, 13 Jul 2021 15:51:32 GMT
- Title: No Regrets for Learning the Prior in Bandits
- Authors: Soumya Basu, Branislav Kveton, Manzil Zaheer, Csaba Szepesvári
- Abstract summary: ${\tt AdaTS}$ is a Thompson sampling algorithm that adapts sequentially to bandit tasks.
${\tt AdaTS}$ is a fully-Bayesian algorithm that can be implemented efficiently in several classes of bandit problems.
- Score: 30.478188057004175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose ${\tt AdaTS}$, a Thompson sampling algorithm that adapts
sequentially to bandit tasks that it interacts with. The key idea in ${\tt
AdaTS}$ is to adapt to an unknown task prior distribution by maintaining a
distribution over its parameters. When solving a bandit task, that uncertainty
is marginalized out and properly accounted for. ${\tt AdaTS}$ is a
fully-Bayesian algorithm that can be implemented efficiently in several classes
of bandit problems. We derive upper bounds on its Bayes regret that quantify
the loss due to not knowing the task prior, and show that it is small. Our
theory is supported by experiments, where ${\tt AdaTS}$ outperforms prior
algorithms and works well even in challenging real-world problems.
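The core idea of ${\tt AdaTS}$ — maintain a posterior over the unknown task prior's parameters and marginalize that uncertainty out when acting within a task — can be illustrated with a small hierarchical Gaussian example. The sketch below is not the paper's ${\tt AdaTS}$ implementation: the conjugate Gaussian model, the treatment of arms as independent after marginalizing the shared prior mean, and all names (`q_mean`, `q_var`, `run_task`) are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hierarchical Gaussian bandit (illustrative only):
#   arm means   theta_i | mu_star ~ N(mu_star, sigma0^2)   (task prior)
#   rewards     r | theta_i       ~ N(theta_i, sigma^2)
#   meta-prior  mu_star           ~ N(q_mean, q_var)
K, sigma, sigma0 = 5, 1.0, 0.5      # arms, reward noise, prior width
q_mean, q_var = 0.0, 1.0            # running meta-posterior over mu_star


def run_task(theta, horizon):
    """One bandit task: Thompson sampling with the unknown prior mean integrated out."""
    global q_mean, q_var
    # Effective per-arm prior at the task start: marginalize mu_star.
    # (Treating arms as independent here is a simplification of the full model.)
    prior_var = q_var + sigma0 ** 2
    post_mean = np.full(K, q_mean)
    post_var = np.full(K, prior_var)
    n = np.zeros(K)
    reward_sum = np.zeros(K)
    for _ in range(horizon):
        a = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))  # posterior sample
        r = rng.normal(theta[a], sigma)
        n[a] += 1
        reward_sum[a] += r
        # Conjugate Gaussian posterior update for the pulled arm.
        prec = 1.0 / prior_var + n[a] / sigma ** 2
        post_var[a] = 1.0 / prec
        post_mean[a] = post_var[a] * (q_mean / prior_var + reward_sum[a] / sigma ** 2)
    # Meta-posterior update: each pulled arm's sample mean is, marginally,
    # a Gaussian observation of mu_star with variance sigma0^2 + sigma^2 / n_i.
    for i in range(K):
        if n[i] > 0:
            obs_var = sigma0 ** 2 + sigma ** 2 / n[i]
            prec = 1.0 / q_var + 1.0 / obs_var
            new_mean = (q_mean / q_var + reward_sum[i] / n[i] / obs_var) / prec
            q_mean, q_var = new_mean, 1.0 / prec


# A sequence of tasks drawn from a prior the learner does not know.
mu_star_true = 0.7
for _ in range(20):
    run_task(rng.normal(mu_star_true, sigma0, size=K), horizon=200)
print(f"learned prior mean ~ {q_mean:.3f}  (true value {mu_star_true})")
```

As more tasks are solved, the meta-posterior over the prior mean concentrates, so later tasks start from a better-informed prior; this is the mechanism the regret bounds in the paper quantify, though the paper's algorithm and analysis are more general than this toy conjugate case.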
Related papers
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$\text{OFUL}^+$ and Restarted $\text{SAVE}^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed Rewards [40.99322897009357]
We propose two novel algorithms based on truncation and mean of medians.
Our truncation-based algorithm supports online learning, distinguishing it from existing truncation-based approaches.
Our algorithms improve the regret bounds by a logarithmic factor compared to existing algorithms when $\epsilon=1$.
arXiv Detail & Related papers (2023-10-28T13:01:10Z) - Bandit Algorithms for Prophet Inequality and Pandora's Box [13.709418181148148]
We study the Prophet Inequality and Pandora's Box problems in the Multi-Armed Bandits model.
Our results give near-optimal $\tilde{O}(\mathsf{poly}(n)\sqrt{T})$ total regret algorithms for both Prophet Inequality and Pandora's Box.
arXiv Detail & Related papers (2022-11-16T00:10:35Z) - An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem [13.69077222007053]
We study the $K$-armed dueling bandit problem, a variation of the traditional multi-armed bandit problem in which feedback is obtained in the form of pairwise comparisons.
We obtain regret of $O(K^2\log(K)) + O(K\log(T))$ in $O(\log(T))$ rounds, where $T$ is the time horizon.
In computational experiments over a variety of real-world datasets, we observe that our algorithm using $O(\log(T))$ rounds achieves almost the same performance as fully adaptive algorithms.
arXiv Detail & Related papers (2022-09-25T00:23:55Z) - Nearly Minimax Algorithms for Linear Bandits with Shared Representation [86.79657561369397]
We consider the setting where we play $M$ linear bandits with dimension $d$, each for $T$ rounds, and these $M$ bandit tasks share a common $k (\ll d)$ dimensional linear representation.
We come up with novel algorithms that achieve $\widetilde{O}\left(d\sqrt{kMT} + kM\sqrt{T}\right)$ regret bounds, which match the known minimax regret lower bound up to logarithmic factors.
arXiv Detail & Related papers (2022-03-29T15:27:13Z) - Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms [30.024167992890916]
We study a sequential decision problem where a learner faces a sequence of $K$-armed bandit tasks.
The adversary is constrained to choose the optimal arm of each task in a smaller (but unknown) subset of $M$ arms.
The task boundaries might be known (the bandit meta-learning setting) or unknown (the non-stationary bandit setting).
arXiv Detail & Related papers (2022-02-25T22:28:01Z) - Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs [27.636407641546914]
We propose a novel robust statistical estimator, mean of medians, which estimates a random variable by computing the empirical mean of a sequence of empirical medians (see the sketch after this list).
We show that the regret bound is near-optimal even with very heavy-tailed noise.
arXiv Detail & Related papers (2021-10-26T17:30:44Z) - Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - An Efficient Algorithm For Generalized Linear Bandit: Online Stochastic Gradient Descent and Thompson Sampling [83.48992319018147]
We consider the contextual bandit problem, where a player sequentially makes decisions based on past observations to maximize the cumulative reward.
A natural way to reduce the per-step time and memory complexity to constant is to apply online stochastic gradient descent (SGD).
In this work, we show that online SGD can be applied to the generalized linear bandit problem.
The proposed SGD-TS algorithm, which uses a single-step SGD update to exploit past information, achieves $\tilde{O}(\sqrt{T})$ regret with a total time complexity that scales linearly in $T$ and $d$.
arXiv Detail & Related papers (2020-06-07T01:12:39Z) - Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards [59.559028338399855]
We consider the problem of sleeping bandits with stochastic action sets and adversarial rewards.
In this paper, we provide a new computationally efficient algorithm inspired by EXP3 satisfying a regret of order $O(\sqrt{T})$.
arXiv Detail & Related papers (2020-04-14T00:41:26Z)
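The mean-of-medians estimator from "Breaking the Moments Condition Barrier" (listed above) is simple enough to sketch. The version below is a minimal illustration, not the paper's exact construction: the shuffling, the fixed block size, and the Cauchy-noise test case are assumptions made for this example, and the estimator recovers the payoff's location only when the noise has (approximately) zero median, as in the symmetric toy case shown.

```python
import numpy as np


def mean_of_medians(samples, block_size=32, rng=None):
    """Illustrative mean-of-medians estimate.

    Shuffle the samples, split them into blocks, take the empirical median of
    each block, and average the block medians. The blocking scheme and the
    block size are illustrative choices, not necessarily the paper's exact
    construction.
    """
    rng = rng if rng is not None else np.random.default_rng()
    samples = rng.permutation(np.asarray(samples, dtype=float))
    n_blocks = max(1, len(samples) // block_size)
    blocks = np.array_split(samples[: n_blocks * block_size], n_blocks)
    return float(np.mean([np.median(b) for b in blocks]))


# Toy check: payoff = true value + Cauchy noise (no finite moments at all).
# The empirical mean of Cauchy samples does not converge, while the mean of
# block medians concentrates near the true value (the noise has median zero).
rng = np.random.default_rng(1)
true_value = 0.5
payoffs = true_value + rng.standard_cauchy(size=10_000)
print("empirical mean :", float(np.mean(payoffs)))            # unstable
print("mean of medians:", mean_of_medians(payoffs, rng=rng))  # close to 0.5
```

Because block medians have light tails even when the raw payoffs do not, averaging them concentrates, which is the intuition behind the near-optimal regret bound claimed for super heavy-tailed payoffs.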