Non-Stationary Bandit Learning via Predictive Sampling
- URL: http://arxiv.org/abs/2205.01970v7
- Date: Thu, 29 Aug 2024 01:38:57 GMT
- Title: Non-Stationary Bandit Learning via Predictive Sampling
- Authors: Yueyang Liu, Xu Kuang, Benjamin Van Roy
- Abstract summary: We show that Thompson sampling can perform poorly in non-stationary environments.
We propose predictive sampling, an algorithm that deprioritizes acquiring information that quickly loses usefulness.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thompson sampling has proven effective across a wide range of stationary bandit environments. However, as we demonstrate in this paper, it can perform poorly when applied to non-stationary environments. We attribute such failures to the fact that, when exploring, the algorithm does not differentiate actions based on how quickly the information acquired loses its usefulness due to non-stationarity. Building upon this insight, we propose predictive sampling, an algorithm that deprioritizes acquiring information that quickly loses usefulness. A theoretical guarantee on the performance of predictive sampling is established through a Bayesian regret bound. We provide versions of predictive sampling for which computations tractably scale to complex bandit environments of practical interest. Through numerical simulations, we demonstrate that predictive sampling outperforms Thompson sampling in all non-stationary environments examined.
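To make the setting concrete, here is a minimal sketch of the kind of non-stationary environment the paper studies: a two-armed Gaussian bandit whose mean rewards drift as AR(1) processes with different mixing rates, played by Thompson sampling with exact Kalman-filter posteriors. The AR(1) coefficients, variances, and horizon below are illustrative assumptions, and the agent shown is the Thompson sampling baseline, not predictive sampling; the point is that information about the fast-mixing arm quickly loses usefulness, which Thompson sampling ignores when exploring.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-armed Gaussian bandit with AR(1) mean rewards (illustrative parameters).
# Arm 1 mixes fast, so information acquired about it decays quickly.
A = np.array([0.999, 0.9])   # AR(1) coefficients per arm (assumed)
SIGMA2 = 1.0                 # stationary variance of each mean reward
NOISE = 1.0                  # observation noise variance
T = 5000

theta = rng.normal(0.0, np.sqrt(SIGMA2), size=2)  # latent mean rewards
m = np.zeros(2)              # posterior means (Kalman filter)
P = np.full(2, SIGMA2)       # posterior variances

regret = 0.0
for t in range(T):
    # Thompson sampling: sample a mean-reward vector, play the argmax.
    a_t = int(np.argmax(rng.normal(m, np.sqrt(P))))
    r = theta[a_t] + rng.normal(0.0, np.sqrt(NOISE))
    regret += theta.max() - theta[a_t]

    # Kalman measurement update for the played arm.
    K = P[a_t] / (P[a_t] + NOISE)
    m[a_t] += K * (r - m[a_t])
    P[a_t] *= 1.0 - K

    # AR(1) predict step: every posterior decays toward the prior,
    # faster for the fast-mixing arm.
    m *= A
    P = A**2 * P + (1.0 - A**2) * SIGMA2

    # The environment itself drifts.
    theta = A * theta + rng.normal(0.0, np.sqrt((1.0 - A**2) * SIGMA2), size=2)

print(f"Thompson sampling regret over {T} steps: {regret:.1f}")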
Related papers
- Which Pretrain Samples to Rehearse when Finetuning Pretrained Models? [60.59376487151964]
Fine-tuning pretrained models on specific tasks is now the de facto approach for text and vision tasks.
A known pitfall of this approach is the forgetting of pretraining knowledge that happens during finetuning.
We propose a novel sampling scheme, mix-cd, that identifies and prioritizes samples that actually face forgetting.
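One way to picture such a scheme (the loss-degradation criterion below is an illustrative guess, not necessarily mix-cd's actual rule) is to rank pretraining samples by how much the finetuned model has degraded on them and rehearse the worst offenders:

```python
import numpy as np

def prioritize_rehearsal(pre_losses, ft_losses, k):
    """Rank pretraining samples by loss degradation after finetuning
    (a hypothetical stand-in for mix-cd's selection criterion) and
    return the indices of the k samples most at risk of forgetting."""
    degradation = np.asarray(ft_losses) - np.asarray(pre_losses)
    return np.argsort(-degradation)[:k]

# Toy usage: per-sample losses under the pretrained vs. finetuned model.
pre = [0.2, 0.5, 0.1, 0.9]
ft = [1.4, 0.4, 0.1, 1.0]
print(prioritize_rehearsal(pre, ft, k=2))  # -> [0 3]
```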
arXiv Detail & Related papers (2024-02-12T22:32:12Z)
- VITS: Variational Inference Thompson Sampling for contextual bandits [10.028119153832346]
We introduce and analyze a variant of the Thompson sampling (TS) algorithm for contextual bandits.
We propose a new algorithm, Variational Inference Thompson Sampling (VITS), based on Gaussian variational inference.
We show that VITS achieves a sub-linear regret bound of the same order in the dimension and number of rounds as traditional TS for linear contextual bandits.
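A minimal sketch of the variational flavor of the action-selection step, assuming a diagonal Gaussian variational posterior over the parameters of a linear contextual bandit (VITS fits mu and log_sigma by optimizing a variational objective; that fitting step is omitted here):

```python
import numpy as np

rng = np.random.default_rng(1)

def vits_style_choice(contexts, mu, log_sigma):
    """Pick an action by sampling the parameter from a diagonal Gaussian
    variational posterior N(mu, diag(exp(2*log_sigma))) and acting
    greedily. A stand-in for the VITS selection step; the posterior is
    assumed already fit by variational inference."""
    theta = mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)
    return int(np.argmax(contexts @ theta))

# Toy usage: 3 candidate actions in a 2-dimensional contextual bandit.
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
print(vits_style_choice(X, mu=np.zeros(2), log_sigma=np.zeros(2)))
```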
arXiv Detail & Related papers (2023-07-19T17:53:22Z)
- Accounting For Informative Sampling When Learning to Forecast Treatment Outcomes Over Time [66.08455276899578]
We show that informative sampling can prohibit accurate estimation of treatment outcomes if not properly accounted for.
We present a general framework for learning treatment outcomes in the presence of informative sampling using inverse intensity-weighting.
We propose a novel method, TESAR-CDE, that instantiates this framework using Neural CDEs.
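A minimal sketch of inverse intensity-weighting under simplifying assumptions (one scalar observation intensity per measurement and a hypothetical clipping constant eps; the paper's framework handles intensities that evolve over time):

```python
import numpy as np

def iiw_loss(losses, intensities, eps=1e-6):
    """Inverse-intensity-weighted training loss: observations that were
    unlikely to be sampled count more, so informatively-sampled data
    behave more like a randomly-sampled set. `intensities` are the
    (estimated) observation intensities; eps is an assumed clipping
    constant, not from the paper."""
    w = 1.0 / np.maximum(np.asarray(intensities), eps)
    return float(np.average(losses, weights=w))

# Toy usage: sicker patients are measured more often (higher intensity),
# so their per-observation losses are down-weighted.
print(iiw_loss(losses=[0.8, 0.2, 0.5], intensities=[2.0, 0.5, 1.0]))
```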
arXiv Detail & Related papers (2023-06-07T08:51:06Z)
- ZigZag: Universal Sampling-free Uncertainty Estimation Through Two-Step Inference [54.17205151960878]
We introduce a sampling-free approach that is generic and easy to deploy.
We produce reliable uncertainty estimates on par with state-of-the-art methods at a significantly lower computational cost.
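Reading the two-step inference idea from the title and abstract (the details below are our interpretation, not the paper's specification): the model is run once with a placeholder feedback input and once with its own first prediction fed back, and the discrepancy between the two outputs serves as the uncertainty score.

```python
import numpy as np

def zigzag_style_uncertainty(model, x, placeholder=0.0):
    """Two-step, sampling-free uncertainty sketch: the model takes
    (x, y_hint). Step 1 uses a placeholder hint; step 2 feeds the first
    prediction back in. The distance between the two outputs is read as
    uncertainty. (An interpretation of the scheme, not its exact form.)"""
    y1 = model(x, placeholder)
    y2 = model(x, y1)
    return abs(y2 - y1)

# Toy model: self-consistent near 0, inconsistent far from it.
model = lambda x, hint: np.tanh(x) + 0.1 * hint * x
print(zigzag_style_uncertainty(model, 0.1), zigzag_style_uncertainty(model, 3.0))
```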
arXiv Detail & Related papers (2022-11-21T13:23:09Z)
- An Analysis of Ensemble Sampling [28.18592417451813]
Ensemble sampling serves as a practical approximation to Thompson sampling when maintaining an exact posterior distribution over model parameters is computationally intractable.
We establish a Bayesian regret bound that ensures desirable behavior when ensemble sampling is applied to the linear bandit problem.
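A minimal sketch of ensemble sampling's action selection for a linear bandit: draw one member of the ensemble uniformly at random and act greedily with respect to it, approximating a Thompson-sampling draw from an intractable posterior. The ensemble maintenance (perturbed incremental refitting as rewards arrive) is omitted here.

```python
import numpy as np

rng = np.random.default_rng(2)

def ensemble_sample_action(contexts, ensemble):
    """Ensemble sampling: pick one model uniformly from the ensemble and
    play the action it rates highest. `ensemble` holds M parameter
    vectors that, in the full algorithm, are refit on perturbed data."""
    theta = ensemble[rng.integers(len(ensemble))]
    return int(np.argmax(contexts @ theta))

# Toy usage: an ensemble of 4 parameter vectors for a 2-d linear bandit.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
ensemble = rng.standard_normal((4, 2))
print(ensemble_sample_action(X, ensemble))
```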
arXiv Detail & Related papers (2022-03-02T18:41:22Z)
- AutoSampling: Search for Effective Data Sampling Schedules [118.20014773014671]
We propose an AutoSampling method to automatically learn sampling schedules for model training.
We apply our method to a variety of image classification tasks illustrating the effectiveness of the proposed method.
arXiv Detail & Related papers (2021-05-28T09:39:41Z)
- Doubly-Adaptive Thompson Sampling for Multi-Armed and Contextual Bandits [28.504921333436833]
We propose a variant of a Thompson sampling based algorithm that adaptively re-weighs the terms of a doubly robust estimator on the true mean reward of each arm.
The proposed algorithm matches the optimal (minimax) regret rate, and we evaluate it empirically in a semi-synthetic experiment.
We extend this approach to contextual bandits, where there are more sources of bias present apart from the adaptive data collection.
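A minimal sketch of the doubly robust estimator at the core of this approach, computed for a single arm over logged rounds; the adaptive re-weighting of its terms, which is the paper's contribution, is omitted:

```python
import numpy as np

def doubly_robust_mean(rewards, pulled, propensities, model_preds):
    """Doubly robust estimate of one arm's mean reward. Each round
    contributes the reward model's prediction plus an inverse-propensity
    correction on rounds where the arm was actually pulled; the estimate
    is consistent if either the model or the propensities are correct."""
    r, d, p, f = map(np.asarray, (rewards, pulled, propensities, model_preds))
    return float(np.mean(f + d / p * (r - f)))

# Toy usage: 4 rounds; the arm was pulled in rounds 0 and 2.
print(doubly_robust_mean(rewards=[1.0, 0.0, 0.4, 0.0],
                         pulled=[1, 0, 1, 0],
                         propensities=[0.5, 0.5, 0.8, 0.5],
                         model_preds=[0.6, 0.3, 0.5, 0.3]))
```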
arXiv Detail & Related papers (2021-02-25T22:29:25Z)
- Optimal Importance Sampling for Federated Learning [57.14673504239551]
Federated learning involves a mixture of centralized and decentralized processing tasks.
The sampling of both agents and data is generally uniform; however, in this work we consider non-uniform sampling.
We derive optimal importance sampling strategies for both agent and data selection and show that non-uniform sampling without replacement improves the performance of the original FedAvg algorithm.
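A minimal sketch of non-uniform agent selection with importance weights, assuming the sampling probabilities are given (choosing them optimally is the paper's contribution) and using the with-replacement weights 1/(k·p_i) as an approximation for the without-replacement draw:

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_agents_importance(probs, k):
    """Select k agents without replacement with non-uniform probabilities
    `probs`, and return weights 1/(k * p_i) for re-scaling their updates
    in the FedAvg aggregate. Exact without-replacement corrections are
    more involved; this is a sketch."""
    probs = np.asarray(probs) / np.sum(probs)
    idx = rng.choice(len(probs), size=k, replace=False, p=probs)
    return idx, 1.0 / (k * probs[idx])

# Toy usage: 5 agents, sampling budget of 2 per round.
idx, w = sample_agents_importance([0.1, 0.3, 0.2, 0.3, 0.1], k=2)
print(idx, w)
```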
arXiv Detail & Related papers (2020-10-26T14:15:33Z)
- The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator.
We prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
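For orientation, here is a plain uniform-allocation baseline for the top-k identification problem the paper analyzes; this is a textbook routine, not the paper's near-optimal algorithm, which instead adapts its sampling allocation:

```python
import numpy as np

rng = np.random.default_rng(4)

def topk_uniform(pull, n_arms, k, budget):
    """Uniform-sampling baseline for top-k identification: pull every arm
    equally often and return the k largest empirical means. Adaptive
    methods instead reallocate pulls to arms near the top-k boundary."""
    pulls = budget // n_arms
    means = np.array([np.mean([pull(a) for _ in range(pulls)])
                      for a in range(n_arms)])
    return set(np.argsort(-means)[:k])

# Toy usage: 5 Bernoulli arms, identify the top 2.
mu = [0.9, 0.8, 0.5, 0.3, 0.1]
pull = lambda a: rng.binomial(1, mu[a])
print(topk_uniform(pull, n_arms=5, k=2, budget=5000))
```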
arXiv Detail & Related papers (2017-02-16T23:42:02Z)