Statistical Efficiency of Thompson Sampling for Combinatorial
Semi-Bandits
- URL: http://arxiv.org/abs/2006.06613v2
- Date: Sun, 3 Jan 2021 15:20:05 GMT
- Title: Statistical Efficiency of Thompson Sampling for Combinatorial
Semi-Bandits
- Authors: Pierre Perrault, Etienne Boursier, Vianney Perchet, Michal Valko
- Abstract summary: We investigate stochastic combinatorial multi-armed bandit with semi-bandit feedback (CMAB)
We analyze variants of the Combinatorial Thompson Sampling policy (CTS)
This last result gives us an alternative to the Efficient Sampling for Combinatorial Bandit policy (ESCB)
- Score: 56.31950477139053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate stochastic combinatorial multi-armed bandit with semi-bandit
feedback (CMAB). In CMAB, the question of the existence of an efficient policy
with an optimal asymptotic regret (up to a factor poly-logarithmic in the
action size) is still open for many families of distributions, including
mutually independent outcomes, and more generally the multivariate sub-Gaussian
family. We propose to answer the above question for these two families by
analyzing variants of the Combinatorial Thompson Sampling policy (CTS). For
mutually independent outcomes in $[0,1]$, we propose a tight analysis of CTS
using Beta priors. We then look at the more general setting of multivariate
sub-Gaussian outcomes and propose a tight analysis of CTS using Gaussian
priors. This last result gives us an alternative to the Efficient Sampling for
Combinatorial Bandit policy (ESCB), which, although optimal, is not
computationally efficient.
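To make the policy concrete, here is a minimal sketch of CTS with Beta priors for the first setting (mutually independent outcomes); the Bernoulli outcomes, the top-m action set, the `oracle` function, and all constants are illustrative assumptions, not the authors' implementation.
```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative instance: d base arms, action set = all subsets of size m.
d, m, T = 6, 2, 5000
mu = rng.uniform(0.2, 0.8, size=d)   # unknown Bernoulli means of the outcomes

def oracle(theta):
    # Exact linear maximization oracle; here, top-m selection.
    return np.argsort(theta)[-m:]

alpha = np.ones(d)  # Beta posterior: prior successes
beta = np.ones(d)   # Beta posterior: prior failures

best = mu[oracle(mu)].sum()
regret = 0.0
for t in range(T):
    theta = rng.beta(alpha, beta)           # one posterior sample per base arm
    action = oracle(theta)                  # play the optimizer of the sample
    outcomes = rng.binomial(1, mu[action])  # semi-bandit feedback: per-arm outcomes
    alpha[action] += outcomes               # conjugate Beta-Bernoulli update
    beta[action] += 1 - outcomes
    regret += best - mu[action].sum()

print(f"cumulative regret after {T} rounds: {regret:.1f}")
```
For the multivariate sub-Gaussian setting, the Gaussian-prior variant analyzed in the paper replaces the Beta sample with a Gaussian posterior sample per arm; the oracle call and the combinatorial structure are unchanged.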
Related papers
- Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for combinatorial multi-armed bandit problems where no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
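The summary above only names the approach; purely as a hypothetical illustration, a bare-bones explore-then-commit loop looks as follows. Per-arm noisy evaluations and all constants are simplifying assumptions: the paper's SGB observes only the joint reward of the selected set and uses an optimized stochastic-greedy subroutine, neither of which is reproduced here.
```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 10, 3                     # base arms and set size (illustrative)
mu = rng.uniform(0, 1, d)        # unknown per-arm values

# Explore: evaluate each base arm uniformly for a fixed budget.
explore_rounds = 500
counts, sums = np.zeros(d), np.zeros(d)
for t in range(explore_rounds):
    arm = t % d
    counts[arm] += 1
    sums[arm] += rng.normal(mu[arm], 0.1)   # noisy evaluation

# Commit: keep the n empirically best base arms for the rest of the horizon.
committed = np.argsort(sums / counts)[-n:]
print("committed set:", committed, "true best:", np.argsort(mu)[-n:])
```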
- A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms [14.33758865948252]
We revisit two famous bandit algorithms, Minimum Empirical Divergence (MED) and Thompson Sampling (TS).
We prove that MED is optimal across the models considered and also provide a simple regret analysis of some TS algorithms whose optimality is already known.
arXiv Detail & Related papers (2023-03-10T16:43:48Z)
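For context, the TS side of such analyses is typically instantiated as Bernoulli Thompson Sampling with Beta posteriors. A textbook sketch follows (means and horizon are illustrative, and this is not code from the paper):
```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([0.3, 0.5, 0.7])       # illustrative Bernoulli arm means
K, T = len(mu), 10000
alpha, beta = np.ones(K), np.ones(K)

for t in range(T):
    arm = int(np.argmax(rng.beta(alpha, beta)))  # sample, then act greedily on the sample
    r = rng.binomial(1, mu[arm])
    alpha[arm] += r                              # conjugate Beta-Bernoulli update
    beta[arm] += 1 - r

print("pull counts:", alpha + beta - 2)          # most pulls concentrate on the best arm
```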
- Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms [59.8188496313214]
We study combinatorial semi-bandits (CMAB) and focus on reducing the dependency on the batch size $K$ in the regret bound.
First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we propose a BCUCB-T algorithm with variance-aware confidence intervals.
Second, for the setting of non-triggering CMAB with independent arms, we propose a SESCB algorithm which leverages the non-triggering version of the TPVM condition.
arXiv Detail & Related papers (2022-08-31T13:09:39Z)
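As a hypothetical sketch of what "variance-aware confidence intervals" means in practice, here is an empirical-Bernstein-style index; the function name, constants, and example values are illustrative, not the paper's BCUCB-T.
```python
import numpy as np

def bernstein_ucb(mean, var, n, t, b=1.0):
    """Empirical-Bernstein-style index: the bonus scales with the empirical
    variance, so low-variance arms get a much tighter interval than a
    Hoeffding-style bound that only uses the range b."""
    log_t = np.log(max(t, 2))
    return mean + np.sqrt(2 * var * log_t / n) + 3 * b * log_t / n

# A low-variance arm gets a far smaller bonus than the Hoeffding-style index.
print(bernstein_ucb(mean=0.5, var=0.01, n=100, t=1000))
print(0.5 + np.sqrt(2 * np.log(1000) / 100))   # Hoeffding-style index, for comparison
```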
- SPRT-based Efficient Best Arm Identification in Stochastic Bandits [31.359578768463752]
This paper investigates the best arm identification problem in multi-armed bandits in the fixed confidence setting.
Existing algorithms for the exponential family of bandits face computational challenges.
A framework is proposed that adopts likelihood ratio-based tests, which are known to be effective for sequential testing.
arXiv Detail & Related papers (2022-07-22T15:54:53Z)
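As an illustration of the likelihood-ratio primitive such frameworks build on, here is Wald's classical SPRT for two Bernoulli hypotheses; the thresholds, parameters, and the Bernoulli choice are illustrative assumptions rather than the paper's method.
```python
import numpy as np

def sprt_bernoulli(stream, p0, p1, alpha=0.05, beta=0.05):
    """Wald's SPRT: accumulate log-likelihood ratios until a boundary is hit."""
    upper = np.log((1 - beta) / alpha)   # crossing this accepts H1 (mean = p1)
    lower = np.log(beta / (1 - alpha))   # crossing this accepts H0 (mean = p0)
    llr, n = 0.0, 0
    for n, x in enumerate(stream, start=1):
        llr += np.log(p1 / p0) if x else np.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", n

rng = np.random.default_rng(3)
print(sprt_bernoulli(rng.binomial(1, 0.7, size=10000), p0=0.5, p1=0.7))
```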
- Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization [50.725191156128645]
Multi-armed bandit algorithms like Thompson Sampling can be used to conduct adaptive experiments.
We present simulations of 2-arm experiments exploring two algorithms that combine the benefits of uniform random assignment for statistical analysis with those of reward maximization.
arXiv Detail & Related papers (2021-12-15T22:11:58Z)
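One hypothetical way to combine the two objectives is a simple probability mixture: with probability eps, assign uniformly at random (supporting unbiased statistical analysis), otherwise run Thompson Sampling (maximizing reward). The sketch below illustrates this general idea only and is not either of the paper's two algorithms.
```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([0.4, 0.6])                 # 2-arm experiment, illustrative means
alpha, beta = np.ones(2), np.ones(2)
eps, T = 0.2, 5000                        # fraction of uniformly randomized rounds

for t in range(T):
    if rng.random() < eps:
        arm = int(rng.integers(2))        # uniform random assignment
    else:
        arm = int(np.argmax(rng.beta(alpha, beta)))  # Thompson Sampling step
    r = rng.binomial(1, mu[arm])
    alpha[arm] += r
    beta[arm] += 1 - r

print("assignments per arm:", alpha + beta - 2)
```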
- Bayesian decision-making under misspecified priors with applications to meta-learning [64.38020203019013]
Thompson sampling and other sequential decision-making algorithms are popular approaches to tackle explore/exploit trade-offs in contextual bandits.
We show that performance degrades gracefully with misspecified priors.
arXiv Detail & Related papers (2021-07-03T23:17:26Z)
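A small, hypothetical experiment in that spirit: run Bernoulli TS twice, once with a flat prior and once with a prior that wrongly favors the bad arm, and compare how often the best arm is played. The instance and prior strengths are illustrative assumptions, not the paper's setup.
```python
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([0.3, 0.7])   # arm 1 is best
T = 5000

def run_ts(alpha0, beta0):
    alpha, beta = alpha0.copy(), beta0.copy()
    pulls_best = 0
    for t in range(T):
        arm = int(np.argmax(rng.beta(alpha, beta)))
        r = rng.binomial(1, mu[arm])
        alpha[arm] += r
        beta[arm] += 1 - r
        pulls_best += arm == 1
    return pulls_best / T

print("well-specified prior:", run_ts(np.ones(2), np.ones(2)))
# Misspecified prior: strongly (and wrongly) favors arm 0 over arm 1.
print("misspecified prior: ", run_ts(np.array([20.0, 1.0]), np.array([1.0, 20.0])))
```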
- SGD with shuffling: optimal rates without component convexity and large epoch requirements [60.65928290219793]
We consider the RandomShuffle (shuffle at the beginning of each epoch) and SingleShuffle (shuffle only once) schemes.
We establish minimax optimal convergence rates of these algorithms up to poly-log factor gaps.
We further sharpen the tight convergence results for RandomShuffle by removing the drawbacks common to all prior work.
arXiv Detail & Related papers (2020-06-12T05:00:44Z)
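The two schemes differ only in where the permutation is drawn, as the sketch below shows on a noiseless least-squares problem; the step size, data, and epoch count are illustrative choices (the paper's rates rely on decaying step sizes).
```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 200, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true                                   # noiseless targets

def shuffled_sgd(single_shuffle, epochs=50, lr=0.01):
    x = np.zeros(d)
    perm = rng.permutation(n)                    # SingleShuffle: drawn once
    for _ in range(epochs):
        if not single_shuffle:
            perm = rng.permutation(n)            # RandomShuffle: redrawn every epoch
        for i in perm:
            grad = (A[i] @ x - b[i]) * A[i]      # per-component least-squares gradient
            x -= lr * grad
    return np.linalg.norm(x - x_true)

print("RandomShuffle error:", shuffled_sgd(single_shuffle=False))
print("SingleShuffle error:", shuffled_sgd(single_shuffle=True))
```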