Ensemble Sampling
- URL: http://arxiv.org/abs/1705.07347v4
- Date: Tue, 25 Apr 2023 05:33:47 GMT
- Title: Ensemble Sampling
- Authors: Xiuyuan Lu, Benjamin Van Roy
- Abstract summary: This paper develops ensemble sampling, which aims to approximate Thompson sampling while maintaining tractability even in the face of complex models such as neural networks.
We establish a theoretical basis that supports the approach and present computational results that offer further insight.
- Score: 18.85309520133554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thompson sampling has emerged as an effective heuristic for a broad range of
online decision problems. In its basic form, the algorithm requires computing
and sampling from a posterior distribution over models, which is tractable only
for simple special cases. This paper develops ensemble sampling, which aims to
approximate Thompson sampling while maintaining tractability even in the face
of complex models such as neural networks. Ensemble sampling dramatically
expands on the range of applications for which Thompson sampling is viable. We
establish a theoretical basis that supports the approach and present
computational results that offer further insight.
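As a concrete illustration of the idea, ensemble sampling for a simple Gaussian bandit can be sketched as follows: maintain M models initialized from independent prior draws, pick one uniformly at each step, act greedily under it, and update every model with an independently perturbed copy of the observed reward. The arm count, ensemble size, and priors below are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

K, M = 5, 10                 # arms and ensemble size (illustrative choices)
sigma = 1.0                  # known reward-noise standard deviation (assumed)
true_means = rng.normal(0.0, 1.0, K)

prior = rng.normal(0.0, 1.0, (M, K))   # one independent N(0, 1) prior draw per model
counts = np.zeros((M, K))
sums = np.zeros((M, K))                # perturbed reward sums per model
theta = prior.copy()                   # each model's current mean estimate

for t in range(2000):
    m = rng.integers(M)                # sample one ensemble member uniformly...
    a = int(np.argmax(theta[m]))       # ...and act greedily with respect to it
    r = true_means[a] + sigma * rng.normal()
    for i in range(M):                 # update all members with perturbed copies of r
        counts[i, a] += 1
        sums[i, a] += r + sigma * rng.normal()
        # Conjugate posterior mean under a N(prior draw, 1) prior and N(., sigma^2) noise.
        theta[i, a] = (prior[i, a] + sums[i, a] / sigma**2) / (1.0 + counts[i, a] / sigma**2)
```

Randomizing over the ensemble plays the role of posterior sampling, and the perturbations keep the models diverse; the paper's theoretical analysis formalizes when this approximation behaves well.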
Related papers
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling [81.34900892130929]
We explore inference compute as another axis for scaling, using the simple technique of repeatedly sampling candidate solutions from a model.
Across multiple tasks and models, we observe that coverage scales with the number of samples over four orders of magnitude.
In domains like coding and formal proofs, where answers can be automatically verified, these increases in coverage directly translate into improved performance.
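Under an independence assumption, the coverage curve described above has a simple closed form: if each sample succeeds with probability p, then at least one of k i.i.d. samples succeeds with probability 1 - (1 - p)^k. This is a simplification (real model samples are not i.i.d.), but it shows why coverage climbs over orders of magnitude of samples.

```python
def coverage(p: float, k: int) -> float:
    """Probability that at least one of k i.i.d. samples succeeds."""
    return 1.0 - (1.0 - p) ** k

# Even a 1% per-sample success rate improves quickly with repeated sampling.
for k in (1, 10, 100, 1000):
    print(k, round(coverage(0.01, k), 3))
```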
arXiv Detail & Related papers (2024-07-31T17:57:25Z)
- On-Demand Sampling: Learning Optimally from Multiple Distributions [63.20009081099896]
Social and real-world considerations have given rise to multi-distribution learning paradigms.
We establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity.
Our algorithm design and analysis are enabled by our extensions of online learning techniques for solving zero-sum games.
arXiv Detail & Related papers (2022-10-22T19:07:26Z)
- Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits.
We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits.
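A minimal sketch of the Langevin idea, assuming a linear-Gaussian reward model (the data, step size, and iteration count below are illustrative): unadjusted Langevin steps on the log-posterior gradient replace the exact posterior draw that Thompson sampling would otherwise require.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear-Gaussian data: rewards y = x @ theta_true + noise.
d = 3
theta_true = np.array([1.0, -0.5, 0.3])
X = rng.normal(size=(50, d))
y = X @ theta_true + 0.1 * rng.normal(size=50)
lam, sigma2 = 1.0, 0.01       # prior precision and noise variance (assumed)

def grad_log_post(theta):
    # Gradient of log N(y | X @ theta, sigma2 I) + log N(theta | 0, I / lam).
    return X.T @ (y - X @ theta) / sigma2 - lam * theta

# Unadjusted Langevin dynamics: each Thompson-sampling round would run a few
# such inner steps to produce an approximate posterior draw.
theta, eps = np.zeros(d), 1e-4
for _ in range(500):
    theta = theta + 0.5 * eps * grad_log_post(theta) + np.sqrt(eps) * rng.normal(size=d)
```

The resulting `theta` is an approximate sample from the posterior; acting greedily with respect to it recovers the Thompson-sampling decision rule without an exact posterior.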
arXiv Detail & Related papers (2022-06-22T17:58:23Z)
- An Analysis of Ensemble Sampling [28.18592417451813]
Ensemble sampling serves as a practical approximation to Thompson sampling when maintaining an exact posterior distribution over model parameters is computationally intractable.
We establish a Bayesian regret bound that ensures desirable behavior when ensemble sampling is applied to the linear bandit problem.
arXiv Detail & Related papers (2022-03-02T18:41:22Z)
- Online Learning of Network Bottlenecks via Minimax Paths [6.316693022958221]
We study bottleneck identification in networks via extracting minimax paths.
We then devise an alternative problem formulation which approximates the original objective.
We experimentally evaluate the performance of Thompson Sampling with the approximate formulation on real-world directed and undirected networks.
arXiv Detail & Related papers (2021-09-17T11:11:50Z)
- Diffusion Approximations for Thompson Sampling [4.390757904176221]
We show that the dynamics of Thompson sampling evolve according to discrete versions of stochastic differential equations (SDEs) and ordinary differential equations (ODEs).
Our weak convergence theory is developed from first principles using the Continuous Mapping Theorem.
arXiv Detail & Related papers (2021-05-19T16:28:01Z)
- Doubly-Adaptive Thompson Sampling for Multi-Armed and Contextual Bandits [28.504921333436833]
We propose a variant of a Thompson-sampling-based algorithm that adaptively re-weights the terms of a doubly robust estimator of the true mean reward of each arm.
The proposed algorithm matches the optimal (minimax) regret rate, and we evaluate it empirically in a semi-synthetic experiment.
We extend this approach to contextual bandits, where there are more sources of bias present apart from the adaptive data collection.
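The doubly robust estimator at the core of this line of work can be sketched on simulated data; the sketch below uses a plain (unweighted) doubly robust estimate with a fixed, deliberately biased reward model standing in for the plug-in estimate, while the paper's contribution is an adaptive re-weighting of these terms.

```python
import numpy as np

rng = np.random.default_rng(2)

T = 5000
true_mean = 0.7
p = rng.uniform(0.2, 0.8, T)          # probability the arm was pulled each round
pulled = rng.random(T) < p
rewards = true_mean + 0.1 * rng.normal(size=T)   # observed only when pulled

model = 0.5                           # a deliberately biased plug-in reward model
# Doubly robust term: model prediction plus an inverse-propensity correction.
dr = model + np.where(pulled, (rewards - model) / p, 0.0)
estimate = dr.mean()                  # unbiased despite the biased model
```

The inverse-propensity correction removes the model's bias in expectation, while the model term keeps the variance lower than pure importance weighting.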
arXiv Detail & Related papers (2021-02-25T22:29:25Z)
- Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring [91.22679787578438]
We present a novel Thompson-sampling-based algorithm for partial monitoring.
We prove that the new algorithm achieves a logarithmic problem-dependent expected pseudo-regret of $\mathrm{O}(\log T)$ for a linearized variant of the problem with local observability.
arXiv Detail & Related papers (2020-06-17T05:48:33Z)
- On Thompson Sampling with Langevin Algorithms [106.78254564840844]
Thompson sampling for multi-armed bandit problems enjoys favorable performance in both theory and practice.
However, it suffers from a significant computational limitation: the need to draw samples from posterior distributions at every iteration.
We propose two Markov Chain Monte Carlo (MCMC) methods tailored to Thompson sampling to address this issue.
arXiv Detail & Related papers (2020-02-23T22:35:29Z)
- Efficiently Sampling Functions from Gaussian Process Posteriors [76.94808614373609]
We propose an easy-to-use and general-purpose approach for fast posterior sampling.
We demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost.
arXiv Detail & Related papers (2020-02-21T14:03:16Z)
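The decoupled (pathwise) view can be sketched via Matheron's rule: draw one joint prior sample over training and test inputs, then shift it by the kernel-weighted residuals at the training points, yielding a sample path from the posterior. The kernel, data, and noise level below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

def k(a, b, ls=0.5):
    # Squared-exponential kernel (an illustrative choice).
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

X = np.linspace(-1.0, 1.0, 8)          # toy training inputs
y = np.sin(3.0 * X) + 0.05 * rng.normal(size=8)
Xs = np.linspace(-1.0, 1.0, 100)       # test inputs
noise = 0.05 ** 2

# Matheron's rule: one joint prior draw over (X, Xs), conditioned pathwise.
Xall = np.concatenate([X, Xs])
L = np.linalg.cholesky(k(Xall, Xall) + 1e-6 * np.eye(Xall.size))
f = L @ rng.normal(size=Xall.size)     # one joint prior sample
fX, fs = f[:8], f[8:]
eps = np.sqrt(noise) * rng.normal(size=8)
update = k(Xs, X) @ np.linalg.solve(k(X, X) + noise * np.eye(8), y - fX - eps)
sample = fs + update                   # one posterior sample path on Xs
```

Because the prior sample and the data-dependent update are decoupled, the prior part can be approximated cheaply (e.g. with random features) while the update stays exact, which is the source of the cost savings the paper reports.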
This list is automatically generated from the titles and abstracts of the papers in this site.