Ensemble Sampling
- URL: http://arxiv.org/abs/1705.07347v4
- Date: Tue, 25 Apr 2023 05:33:47 GMT
- Title: Ensemble Sampling
- Authors: Xiuyuan Lu, Benjamin Van Roy
- Abstract summary: This paper develops ensemble sampling, which aims to approximate Thompson sampling while maintaining tractability even in the face of complex models such as neural networks.
We establish a theoretical basis that supports the approach and present computational results that offer further insight.
- Score: 18.85309520133554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thompson sampling has emerged as an effective heuristic for a broad range of
online decision problems. In its basic form, the algorithm requires computing
and sampling from a posterior distribution over models, which is tractable only
for simple special cases. This paper develops ensemble sampling, which aims to
approximate Thompson sampling while maintaining tractability even in the face
of complex models such as neural networks. Ensemble sampling dramatically
expands on the range of applications for which Thompson sampling is viable. We
establish a theoretical basis that supports the approach and present
computational results that offer further insight.
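As a concrete illustration of the idea, ensemble sampling for a simple Gaussian bandit can be sketched as follows: maintain M models initialized from independent prior draws, pick one uniformly at each step, act greedily under it, and update every model with an independently perturbed copy of the observed reward. The arm count, ensemble size, and priors below are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

K, M = 5, 10                 # arms and ensemble size (illustrative choices)
sigma = 1.0                  # known reward-noise standard deviation (assumed)
true_means = rng.normal(0.0, 1.0, K)

prior = rng.normal(0.0, 1.0, (M, K))   # one independent N(0, 1) prior draw per model
counts = np.zeros((M, K))
sums = np.zeros((M, K))                # perturbed reward sums per model
theta = prior.copy()                   # each model's current mean estimate

for t in range(2000):
    m = rng.integers(M)                # sample one ensemble member uniformly...
    a = int(np.argmax(theta[m]))       # ...and act greedily with respect to it
    r = true_means[a] + sigma * rng.normal()
    for i in range(M):                 # update all members with perturbed copies of r
        counts[i, a] += 1
        sums[i, a] += r + sigma * rng.normal()
        # Conjugate posterior mean under a N(prior draw, 1) prior and N(., sigma^2) noise.
        theta[i, a] = (prior[i, a] + sums[i, a] / sigma**2) / (1.0 + counts[i, a] / sigma**2)
```

Randomizing over the ensemble plays the role of posterior sampling, and the perturbations keep the models diverse; the paper's theoretical analysis formalizes when this approximation behaves well.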
Related papers
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling [81.34900892130929]
We explore inference compute as another axis for scaling, using the simple technique of repeatedly sampling candidate solutions from a model.
Across multiple tasks and models, we observe that coverage scales with the number of samples over four orders of magnitude.
In domains like coding and formal proofs, where answers can be automatically verified, these increases in coverage directly translate into improved performance.
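Under an independence assumption, the coverage curve described above has a simple closed form: if each sample succeeds with probability p, then at least one of k i.i.d. samples succeeds with probability 1 - (1 - p)^k. This is a simplification (real model samples are not i.i.d.), but it shows why coverage climbs over orders of magnitude of samples.

```python
def coverage(p: float, k: int) -> float:
    """Probability that at least one of k i.i.d. samples succeeds."""
    return 1.0 - (1.0 - p) ** k

# Even a 1% per-sample success rate improves quickly with repeated sampling.
for k in (1, 10, 100, 1000):
    print(k, round(coverage(0.01, k), 3))
```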
arXiv Detail & Related papers (2024-07-31T17:57:25Z)
- On-Demand Sampling: Learning Optimally from Multiple Distributions [63.20009081099896]
Social and real-world considerations have given rise to multi-distribution learning paradigms.
We establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity.
Our algorithm design and analysis are enabled by our extensions of online learning techniques for solving zero-sum games.
arXiv Detail & Related papers (2022-10-22T19:07:26Z)
- Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits.
We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits.
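A minimal sketch of the Langevin idea, assuming a linear-Gaussian reward model (the data, step size, and iteration count below are illustrative): unadjusted Langevin steps on the log-posterior gradient replace the exact posterior draw that Thompson sampling would otherwise require.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear-Gaussian data: rewards y = x @ theta_true + noise.
d = 3
theta_true = np.array([1.0, -0.5, 0.3])
X = rng.normal(size=(50, d))
y = X @ theta_true + 0.1 * rng.normal(size=50)
lam, sigma2 = 1.0, 0.01       # prior precision and noise variance (assumed)

def grad_log_post(theta):
    # Gradient of log N(y | X @ theta, sigma2 I) + log N(theta | 0, I / lam).
    return X.T @ (y - X @ theta) / sigma2 - lam * theta

# Unadjusted Langevin dynamics: each Thompson-sampling round would run a few
# such inner steps to produce an approximate posterior draw.
theta, eps = np.zeros(d), 1e-4
for _ in range(500):
    theta = theta + 0.5 * eps * grad_log_post(theta) + np.sqrt(eps) * rng.normal(size=d)
```

The resulting `theta` is an approximate sample from the posterior; acting greedily with respect to it recovers the Thompson-sampling decision rule without an exact posterior.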
arXiv Detail & Related papers (2022-06-22T17:58:23Z)
- An Analysis of Ensemble Sampling [28.18592417451813]
Ensemble sampling serves as a practical approximation to Thompson sampling when maintaining an exact posterior distribution over model parameters is computationally intractable.
We establish a Bayesian regret bound that ensures desirable behavior when ensemble sampling is applied to the linear bandit problem.
arXiv Detail & Related papers (2022-03-02T18:41:22Z)
- Online Learning of Network Bottlenecks via Minimax Paths [6.316693022958221]
We study bottleneck identification in networks via extracting minimax paths.
We then devise an alternative problem formulation which approximates the original objective.
We experimentally evaluate the performance of Thompson Sampling with the approximate formulation on real-world directed and undirected networks.
arXiv Detail & Related papers (2021-09-17T11:11:50Z)
- Diffusion Approximations for Thompson Sampling [4.390757904176221]
We show that the dynamics of Thompson sampling evolve according to discrete versions of stochastic differential equations (SDEs) and ordinary differential equations (ODEs).
Our weak convergence theory is developed from first principles using the Continuous Mapping Theorem.
arXiv Detail & Related papers (2021-05-19T16:28:01Z)
- Doubly-Adaptive Thompson Sampling for Multi-Armed and Contextual Bandits [28.504921333436833]
We propose a variant of a Thompson-sampling-based algorithm that adaptively re-weights the terms of a doubly robust estimator of the true mean reward of each arm.
The proposed algorithm matches the optimal (minimax) regret rate, and we evaluate it empirically in a semi-synthetic experiment.
We extend this approach to contextual bandits, where there are more sources of bias present apart from the adaptive data collection.
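The doubly robust estimator at the core of this line of work can be sketched on simulated data; the sketch below uses a plain (unweighted) doubly robust estimate with a fixed, deliberately biased reward model standing in for the plug-in estimate, while the paper's contribution is an adaptive re-weighting of these terms.

```python
import numpy as np

rng = np.random.default_rng(2)

T = 5000
true_mean = 0.7
p = rng.uniform(0.2, 0.8, T)          # probability the arm was pulled each round
pulled = rng.random(T) < p
rewards = true_mean + 0.1 * rng.normal(size=T)   # observed only when pulled

model = 0.5                           # a deliberately biased plug-in reward model
# Doubly robust term: model prediction plus an inverse-propensity correction.
dr = model + np.where(pulled, (rewards - model) / p, 0.0)
estimate = dr.mean()                  # unbiased despite the biased model
```

The inverse-propensity correction removes the model's bias in expectation, while the model term keeps the variance lower than pure importance weighting.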
arXiv Detail & Related papers (2021-02-25T22:29:25Z)
- Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring [91.22679787578438]
We present a novel Thompson-sampling-based algorithm for partial monitoring.
We prove that the new algorithm achieves a logarithmic problem-dependent expected pseudo-regret of $\mathrm{O}(\log T)$ for a linearized variant of the problem with local observability.
arXiv Detail & Related papers (2020-06-17T05:48:33Z)
- On Thompson Sampling with Langevin Algorithms [106.78254564840844]
Thompson sampling for multi-armed bandit problems enjoys favorable performance in both theory and practice.
However, it suffers from a significant computational limitation: the need to draw samples from posterior distributions at every iteration.
We propose two Markov Chain Monte Carlo (MCMC) methods tailored to Thompson sampling to address this issue.
arXiv Detail & Related papers (2020-02-23T22:35:29Z)
- Efficiently Sampling Functions from Gaussian Process Posteriors [76.94808614373609]
We propose an easy-to-use and general-purpose approach for fast posterior sampling.
We demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost.
arXiv Detail & Related papers (2020-02-21T14:03:16Z)
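The decoupled (pathwise) view can be sketched via Matheron's rule: draw one joint prior sample over training and test inputs, then shift it by the kernel-weighted residuals at the training points, yielding a sample path from the posterior. The kernel, data, and noise level below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

def k(a, b, ls=0.5):
    # Squared-exponential kernel (an illustrative choice).
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

X = np.linspace(-1.0, 1.0, 8)          # toy training inputs
y = np.sin(3.0 * X) + 0.05 * rng.normal(size=8)
Xs = np.linspace(-1.0, 1.0, 100)       # test inputs
noise = 0.05 ** 2

# Matheron's rule: one joint prior draw over (X, Xs), conditioned pathwise.
Xall = np.concatenate([X, Xs])
L = np.linalg.cholesky(k(Xall, Xall) + 1e-6 * np.eye(Xall.size))
f = L @ rng.normal(size=Xall.size)     # one joint prior sample
fX, fs = f[:8], f[8:]
eps = np.sqrt(noise) * rng.normal(size=8)
update = k(Xs, X) @ np.linalg.solve(k(X, X) + noise * np.eye(8), y - fX - eps)
sample = fs + update                   # one posterior sample path on Xs
```

Because the prior sample and the data-dependent update are decoupled, the prior part can be approximated cheaply (e.g. with random features) while the update stays exact, which is the source of the cost savings the paper reports.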
This list is automatically generated from the titles and abstracts of the papers in this site.