Statistical Efficiency of Thompson Sampling for Combinatorial
Semi-Bandits
- URL: http://arxiv.org/abs/2006.06613v2
- Date: Sun, 3 Jan 2021 15:20:05 GMT
- Title: Statistical Efficiency of Thompson Sampling for Combinatorial
Semi-Bandits
- Authors: Pierre Perrault, Etienne Boursier, Vianney Perchet, Michal Valko
- Abstract summary: We investigate stochastic combinatorial multi-armed bandit with semi-bandit feedback (CMAB)
We analyze variants of the Combinatorial Thompson Sampling policy (CTS)
This last result gives us an alternative to the Efficient Sampling for Combinatorial Bandit policy (ESCB)
- Score: 56.31950477139053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate stochastic combinatorial multi-armed bandit with semi-bandit
feedback (CMAB). In CMAB, the question of the existence of an efficient policy
with an optimal asymptotic regret (up to a factor poly-logarithmic in the
action size) is still open for many families of distributions, including
mutually independent outcomes, and more generally the multivariate sub-Gaussian
family. We propose to answer the above question for these two families by
analyzing variants of the Combinatorial Thompson Sampling policy (CTS). For
mutually independent outcomes in $[0,1]$, we propose a tight analysis of CTS
using Beta priors. We then look at the more general setting of multivariate
sub-Gaussian outcomes and propose a tight analysis of CTS using Gaussian
priors. This last result gives us an alternative to the Efficient Sampling for
Combinatorial Bandit policy (ESCB), which, although optimal, is not
computationally efficient.
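To make the policy concrete, here is a minimal sketch of CTS with Beta priors for the first setting (mutually independent outcomes); the Bernoulli outcomes, the top-m action set, the `oracle` function, and all constants are illustrative assumptions, not the authors' implementation.
```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative instance: d base arms, action set = all subsets of size m.
d, m, T = 6, 2, 5000
mu = rng.uniform(0.2, 0.8, size=d)   # unknown Bernoulli means of the outcomes

def oracle(theta):
    # Exact linear maximization oracle; here, top-m selection.
    return np.argsort(theta)[-m:]

alpha = np.ones(d)  # Beta posterior: prior successes
beta = np.ones(d)   # Beta posterior: prior failures

best = mu[oracle(mu)].sum()
regret = 0.0
for t in range(T):
    theta = rng.beta(alpha, beta)           # one posterior sample per base arm
    action = oracle(theta)                  # play the optimizer of the sample
    outcomes = rng.binomial(1, mu[action])  # semi-bandit feedback: per-arm outcomes
    alpha[action] += outcomes               # conjugate Beta-Bernoulli update
    beta[action] += 1 - outcomes
    regret += best - mu[action].sum()

print(f"cumulative regret after {T} rounds: {regret:.1f}")
```
For the multivariate sub-Gaussian setting, the Gaussian-prior variant analyzed in the paper replaces the Beta sample with a Gaussian posterior sample per arm; the oracle call and the combinatorial structure are unchanged.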
Related papers
- Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for combinatorial multi-armed bandit problems where no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
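The summary above only names the approach; purely as a hypothetical illustration, a bare-bones explore-then-commit loop looks as follows. Per-arm noisy evaluations and all constants are simplifying assumptions: the paper's SGB observes only the joint reward of the selected set and uses an optimized stochastic-greedy subroutine, neither of which is reproduced here.
```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 10, 3                     # base arms and set size (illustrative)
mu = rng.uniform(0, 1, d)        # unknown per-arm values

# Explore: evaluate each base arm uniformly for a fixed budget.
explore_rounds = 500
counts, sums = np.zeros(d), np.zeros(d)
for t in range(explore_rounds):
    arm = t % d
    counts[arm] += 1
    sums[arm] += rng.normal(mu[arm], 0.1)   # noisy evaluation

# Commit: keep the n empirically best base arms for the rest of the horizon.
committed = np.argsort(sums / counts)[-n:]
print("committed set:", committed, "true best:", np.argsort(mu)[-n:])
```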
- A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms [14.33758865948252]
We revisit two famous bandit algorithms, Minimum Empirical Divergence (MED) and Thompson Sampling (TS).
We prove that MED is optimal across the models considered and also provide a simple regret analysis of some TS algorithms whose optimality is already known.
arXiv Detail & Related papers (2023-03-10T16:43:48Z)
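For context, the TS side of such analyses is typically instantiated as Bernoulli Thompson Sampling with Beta posteriors. A textbook sketch follows (means and horizon are illustrative, and this is not code from the paper):
```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([0.3, 0.5, 0.7])       # illustrative Bernoulli arm means
K, T = len(mu), 10000
alpha, beta = np.ones(K), np.ones(K)

for t in range(T):
    arm = int(np.argmax(rng.beta(alpha, beta)))  # sample, then act greedily on the sample
    r = rng.binomial(1, mu[arm])
    alpha[arm] += r                              # conjugate Beta-Bernoulli update
    beta[arm] += 1 - r

print("pull counts:", alpha + beta - 2)          # most pulls concentrate on the best arm
```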
- Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms [59.8188496313214]
We study combinatorial semi-bandits (CMAB) and focus on reducing the dependency on the batch size $K$ in the regret bound.
First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we propose a BCUCB-T algorithm with variance-aware confidence intervals.
Second, for the setting of non-triggering CMAB with independent arms, we propose a SESCB algorithm which leverages the non-triggering version of the TPVM condition.
arXiv Detail & Related papers (2022-08-31T13:09:39Z)
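As a hypothetical sketch of what "variance-aware confidence intervals" means in practice, here is an empirical-Bernstein-style index; the function name, constants, and example values are illustrative, not the paper's BCUCB-T.
```python
import numpy as np

def bernstein_ucb(mean, var, n, t, b=1.0):
    """Empirical-Bernstein-style index: the bonus scales with the empirical
    variance, so low-variance arms get a much tighter interval than a
    Hoeffding-style bound that only uses the range b."""
    log_t = np.log(max(t, 2))
    return mean + np.sqrt(2 * var * log_t / n) + 3 * b * log_t / n

# A low-variance arm gets a far smaller bonus than the Hoeffding-style index.
print(bernstein_ucb(mean=0.5, var=0.01, n=100, t=1000))
print(0.5 + np.sqrt(2 * np.log(1000) / 100))   # Hoeffding-style index, for comparison
```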
- SPRT-based Efficient Best Arm Identification in Stochastic Bandits [31.359578768463752]
This paper investigates the best arm identification problem in multi-armed bandits in the fixed confidence setting.
Existing algorithms for the exponential family of bandits face computational challenges.
A framework is proposed that adopts likelihood ratio-based tests, which are known to be effective for sequential testing.
arXiv Detail & Related papers (2022-07-22T15:54:53Z)
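As an illustration of the likelihood-ratio primitive such frameworks build on, here is Wald's classical SPRT for two Bernoulli hypotheses; the thresholds, parameters, and the Bernoulli choice are illustrative assumptions rather than the paper's method.
```python
import numpy as np

def sprt_bernoulli(stream, p0, p1, alpha=0.05, beta=0.05):
    """Wald's SPRT: accumulate log-likelihood ratios until a boundary is hit."""
    upper = np.log((1 - beta) / alpha)   # crossing this accepts H1 (mean = p1)
    lower = np.log(beta / (1 - alpha))   # crossing this accepts H0 (mean = p0)
    llr, n = 0.0, 0
    for n, x in enumerate(stream, start=1):
        llr += np.log(p1 / p0) if x else np.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", n

rng = np.random.default_rng(3)
print(sprt_bernoulli(rng.binomial(1, 0.7, size=10000), p0=0.5, p1=0.7))
```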
- Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization [50.725191156128645]
Multi-armed bandit algorithms like Thompson Sampling can be used to conduct adaptive experiments.
We present simulations of 2-arm experiments exploring two algorithms that combine the benefits of uniform random assignment for statistical analysis with those of reward maximization.
arXiv Detail & Related papers (2021-12-15T22:11:58Z)
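One hypothetical way to combine the two objectives is a simple probability mixture: with probability eps, assign uniformly at random (supporting unbiased statistical analysis), otherwise run Thompson Sampling (maximizing reward). The sketch below illustrates this general idea only and is not either of the paper's two algorithms.
```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([0.4, 0.6])                 # 2-arm experiment, illustrative means
alpha, beta = np.ones(2), np.ones(2)
eps, T = 0.2, 5000                        # fraction of uniformly randomized rounds

for t in range(T):
    if rng.random() < eps:
        arm = int(rng.integers(2))        # uniform random assignment
    else:
        arm = int(np.argmax(rng.beta(alpha, beta)))  # Thompson Sampling step
    r = rng.binomial(1, mu[arm])
    alpha[arm] += r
    beta[arm] += 1 - r

print("assignments per arm:", alpha + beta - 2)
```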
- Bayesian decision-making under misspecified priors with applications to meta-learning [64.38020203019013]
Thompson sampling and other sequential decision-making algorithms are popular approaches to tackle explore/exploit trade-offs in contextual bandits.
We show that performance degrades gracefully with misspecified priors.
arXiv Detail & Related papers (2021-07-03T23:17:26Z)
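A small, hypothetical experiment in that spirit: run Bernoulli TS twice, once with a flat prior and once with a prior that wrongly favors the bad arm, and compare how often the best arm is played. The instance and prior strengths are illustrative assumptions, not the paper's setup.
```python
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([0.3, 0.7])   # arm 1 is best
T = 5000

def run_ts(alpha0, beta0):
    alpha, beta = alpha0.copy(), beta0.copy()
    pulls_best = 0
    for t in range(T):
        arm = int(np.argmax(rng.beta(alpha, beta)))
        r = rng.binomial(1, mu[arm])
        alpha[arm] += r
        beta[arm] += 1 - r
        pulls_best += arm == 1
    return pulls_best / T

print("well-specified prior:", run_ts(np.ones(2), np.ones(2)))
# Misspecified prior: strongly (and wrongly) favors arm 0 over arm 1.
print("misspecified prior: ", run_ts(np.array([20.0, 1.0]), np.array([1.0, 20.0])))
```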
- SGD with shuffling: optimal rates without component convexity and large epoch requirements [60.65928290219793]
We consider the RandomShuffle (shuffle at the beginning of each epoch) and SingleShuffle (shuffle only once) schemes.
We establish minimax optimal convergence rates of these algorithms up to poly-log factor gaps.
We further sharpen the tight convergence results for RandomShuffle by removing the drawbacks common to all prior work.
arXiv Detail & Related papers (2020-06-12T05:00:44Z)
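The two schemes differ only in where the permutation is drawn, as the sketch below shows on a noiseless least-squares problem; the step size, data, and epoch count are illustrative choices (the paper's rates rely on decaying step sizes).
```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 200, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true                                   # noiseless targets

def shuffled_sgd(single_shuffle, epochs=50, lr=0.01):
    x = np.zeros(d)
    perm = rng.permutation(n)                    # SingleShuffle: drawn once
    for _ in range(epochs):
        if not single_shuffle:
            perm = rng.permutation(n)            # RandomShuffle: redrawn every epoch
        for i in perm:
            grad = (A[i] @ x - b[i]) * A[i]      # per-component least-squares gradient
            x -= lr * grad
    return np.linalg.norm(x - x_true)

print("RandomShuffle error:", shuffled_sgd(single_shuffle=False))
print("SingleShuffle error:", shuffled_sgd(single_shuffle=True))
```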