Non-Stationary Bandit Learning via Predictive Sampling
- URL: http://arxiv.org/abs/2205.01970v7
- Date: Thu, 29 Aug 2024 01:38:57 GMT
- Title: Non-Stationary Bandit Learning via Predictive Sampling
- Authors: Yueyang Liu, Xu Kuang, Benjamin Van Roy
- Abstract summary: We show that Thompson sampling can perform poorly in non-stationary environments.
We propose predictive sampling, an algorithm that deprioritizes acquiring information that quickly loses usefulness.
- Score: 15.88678122212934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thompson sampling has proven effective across a wide range of stationary bandit environments. However, as we demonstrate in this paper, it can perform poorly when applied to non-stationary environments. We attribute such failures to the fact that, when exploring, the algorithm does not differentiate actions based on how quickly the information acquired loses its usefulness due to non-stationarity. Building upon this insight, we propose predictive sampling, an algorithm that deprioritizes acquiring information that quickly loses usefulness. A theoretical guarantee on the performance of predictive sampling is established through a Bayesian regret bound. We provide versions of predictive sampling for which computations tractably scale to complex bandit environments of practical interest. Through numerical simulations, we demonstrate that predictive sampling outperforms Thompson sampling in all non-stationary environments examined.
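To make the baseline failure mode concrete, here is a minimal sketch (not the paper's code) of Beta-Bernoulli Thompson sampling run on a hypothetical drifting two-armed bandit. The posteriors weight all past observations equally, so they lag when the true means change; this is the behavior predictive sampling is designed to correct. The environment `drift` and all function names are illustrative assumptions.

```python
import random

def thompson_step(successes, failures, rng):
    # Beta-Bernoulli Thompson sampling: draw one posterior sample per arm,
    # then pull the arm with the largest draw.
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda a: draws[a])

def run_thompson(means_at, horizon, seed=0):
    # Vanilla Thompson sampling never discounts old observations, so its
    # posteriors can lag badly when the true means drift over time.
    rng = random.Random(seed)
    k = len(means_at(0))
    succ, fail = [0] * k, [0] * k
    total = 0
    for t in range(horizon):
        a = thompson_step(succ, fail, rng)
        r = 1 if rng.random() < means_at(t)[a] else 0
        succ[a] += r
        fail[a] += 1 - r
        total += r
    return total / horizon

# Hypothetical drifting environment: the better arm switches halfway through,
# so information gathered early quickly loses its usefulness.
drift = lambda t: [0.8, 0.2] if t < 500 else [0.2, 0.8]
avg_reward = run_thompson(drift, 1000)
```

Under the abstract's account, the posterior for the initially better arm stays concentrated near 0.8 well after the switch, which is why an algorithm that deprioritizes fast-decaying information can do better.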
Related papers
- Optimal Algorithms for Augmented Testing of Discrete Distributions [25.818433126197036]
We show that a predictor can indeed reduce the number of samples required for all three property testing tasks.
A key advantage of our algorithms is their adaptability to the precision of the prediction.
We provide lower bounds to indicate that the improvements in sample complexity achieved by our algorithms are information-theoretically optimal.
arXiv Detail & Related papers (2024-12-01T21:31:22Z) - Accounting For Informative Sampling When Learning to Forecast Treatment Outcomes Over Time [66.08455276899578]
We show that informative sampling can prohibit accurate estimation of treatment outcomes if not properly accounted for.
We present a general framework for learning treatment outcomes in the presence of informative sampling using inverse intensity-weighting.
We propose a novel method, TESAR-CDE, that instantiates this framework using Neural CDEs.
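The core reweighting idea behind inverse intensity-weighting can be sketched in a few lines; this is a hypothetical helper, not the TESAR-CDE method or its Neural-CDE machinery.

```python
def iiw_mean_loss(losses, intensities):
    # Inverse-intensity weighting: divide each observation's loss by the
    # estimated intensity (rate) with which that observation was sampled,
    # so that frequently, i.e. informatively, sampled periods do not
    # dominate the training objective.
    assert len(losses) == len(intensities)
    return sum(l / lam for l, lam in zip(losses, intensities)) / len(losses)
```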
arXiv Detail & Related papers (2023-06-07T08:51:06Z) - ZigZag: Universal Sampling-free Uncertainty Estimation Through Two-Step Inference [54.17205151960878]
We introduce a sampling-free approach that is generic and easy to deploy.
We produce reliable uncertainty estimates on par with state-of-the-art methods at a significantly lower computational cost.
arXiv Detail & Related papers (2022-11-21T13:23:09Z) - Rethinking Collaborative Metric Learning: Toward an Efficient Alternative without Negative Sampling [156.7248383178991]
The Collaborative Metric Learning (CML) paradigm has attracted wide interest in the area of recommendation systems (RS).
We find that negative sampling would lead to a biased estimation of the generalization error.
Motivated by this, we propose an efficient alternative to negative sampling for CML, named Sampling-Free Collaborative Metric Learning (SFCML).
arXiv Detail & Related papers (2022-06-23T08:50:22Z) - An Analysis of Ensemble Sampling [28.18592417451813]
Ensemble sampling serves as a practical approximation to Thompson sampling when maintaining an exact posterior distribution over model parameters is computationally intractable.
We establish a Bayesian regret bound that ensures desirable behavior when ensemble sampling is applied to the linear bandit problem.
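A rough sketch of the ensemble-sampling idea, under illustrative assumptions (a Bernoulli bandit, Gaussian reward perturbations, and made-up function names), rather than the paper's linear-bandit formulation: maintain a small ensemble of point estimates in place of an exact posterior, and act greedily with respect to a uniformly sampled member.

```python
import random

def ensemble_sampling(true_means, horizon, n_models=10, seed=0):
    rng = random.Random(seed)
    k = len(true_means)
    # Each ensemble member keeps its own perturbed running sums and counts;
    # the random initialization plays the role of a prior perturbation.
    sums = [[rng.gauss(0.5, 0.5) for _ in range(k)] for _ in range(n_models)]
    counts = [[1.0] * k for _ in range(n_models)]
    total = 0.0
    for _ in range(horizon):
        m = rng.randrange(n_models)              # sample one member uniformly
        est = [s / c for s, c in zip(sums[m], counts[m])]
        a = max(range(k), key=lambda i: est[i])  # act greedily under it
        r = 1.0 if rng.random() < true_means[a] else 0.0
        total += r
        for j in range(n_models):
            # Update every member with its own independently perturbed reward,
            # which keeps the ensemble spread out like an approximate posterior.
            sums[j][a] += r + rng.gauss(0, 0.5)
            counts[j][a] += 1.0
    return total / horizon

avg = ensemble_sampling([0.3, 0.7], 2000)
```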
arXiv Detail & Related papers (2022-03-02T18:41:22Z) - AutoSampling: Search for Effective Data Sampling Schedules [118.20014773014671]
We propose an AutoSampling method to automatically learn sampling schedules for model training.
We apply our method to a variety of image classification tasks illustrating the effectiveness of the proposed method.
arXiv Detail & Related papers (2021-05-28T09:39:41Z) - Doubly-Adaptive Thompson Sampling for Multi-Armed and Contextual Bandits [28.504921333436833]
We propose a variant of a Thompson sampling based algorithm that adaptively re-weighs the terms of a doubly robust estimator on the true mean reward of each arm.
The proposed algorithm matches the optimal (minimax) regret rate, and we evaluate it empirically in a semi-synthetic experiment.
We extend this approach to contextual bandits, where there are more sources of bias present apart from the adaptive data collection.
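The doubly robust estimator at the heart of this approach can be sketched as follows; this is a generic textbook form with hypothetical names, not the paper's adaptively re-weighted variant.

```python
def doubly_robust_mean(rewards, actions, propensities, arm, model_mean):
    # Doubly robust estimate of one arm's mean reward from adaptively
    # collected bandit data: start from a model-based guess and correct it
    # with inverse-propensity-weighted residuals on the rounds where the
    # arm was actually pulled. The estimate is consistent if either the
    # model or the propensities are correct.
    n = len(rewards)
    est = 0.0
    for r, a, p in zip(rewards, actions, propensities):
        correction = (r - model_mean) / p if a == arm else 0.0
        est += model_mean + correction
    return est / n
```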
arXiv Detail & Related papers (2021-02-25T22:29:25Z) - Optimal Importance Sampling for Federated Learning [57.14673504239551]
Federated learning involves a mixture of centralized and decentralized processing tasks.
The sampling of both agents and data is generally uniform; however, in this work we consider non-uniform sampling.
We derive optimal importance sampling strategies for both agent and data selection and show that non-uniform sampling without replacement improves the performance of the original FedAvg algorithm.
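One round of importance-sampled client selection can be sketched as below; sampling is with replacement and proportional to data size, which is a simplification of the paper's without-replacement strategies, and all names are illustrative.

```python
import random

def fedavg_round(client_updates, data_sizes, n_select, seed=0):
    # Select clients with probability proportional to their data size.
    # Because the sampling probability matches each client's weight in the
    # data-size-weighted average, a plain average over the sampled updates
    # is an unbiased estimate of the full weighted average.
    rng = random.Random(seed)
    total = sum(data_sizes)
    probs = [s / total for s in data_sizes]
    chosen = rng.choices(range(len(client_updates)), weights=probs, k=n_select)
    dim = len(client_updates[0])
    agg = [0.0] * dim
    for i in chosen:
        for d in range(dim):
            agg[d] += client_updates[i][d] / n_select
    return agg

# When every client proposes the same update, the aggregate recovers it exactly.
agg = fedavg_round([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]], [1, 2, 3], 4)
```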
arXiv Detail & Related papers (2020-10-26T14:15:33Z) - The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling, called the Simulator.
We prove the first instance-based lower bounds for the top-k problem that incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.