A Change-Detection Based Thompson Sampling Framework for Non-Stationary Bandits
- URL: http://arxiv.org/abs/2009.02791v1
- Date: Sun, 6 Sep 2020 18:12:25 GMT
- Title: A Change-Detection Based Thompson Sampling Framework for Non-Stationary Bandits
- Authors: Gourab Ghatak
- Abstract summary: We consider a non-stationary two-armed bandit framework and propose a change-detection based Thompson sampling algorithm.
The proposed strategy compares the empirical mean of the recent rewards of an arm with the estimate of the mean of the rewards from its history.
We validate the efficacy of TS-CD by testing it for edge-control of radio access technique (RAT)-selection in a wireless network.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a non-stationary two-armed bandit framework and propose a
change-detection based Thompson sampling (TS) algorithm, named TS with
change-detection (TS-CD), to keep track of the dynamic environment. The
non-stationarity is modeled using a Poisson arrival process, which changes the
mean of the rewards on each arrival. The proposed strategy compares the
empirical mean of the recent rewards of an arm with the estimate of the mean of
the rewards from its history. It detects a change when the empirical mean
deviates from the mean estimate by a value larger than a threshold. Then, we
characterize the lower bound on the duration of the time-window for which the
bandit framework must remain stationary for TS-CD to successfully detect a
change when it occurs. Consequently, our results highlight an upper bound on
the parameter for the Poisson arrival process, for which the TS-CD achieves
asymptotic regret optimality with high probability. Finally, we validate the
efficacy of TS-CD by testing it for edge-control of radio access technique
(RAT)-selection in a wireless network. Our results show that TS-CD not only
outperforms the classical max-power RAT selection strategy but also other
actively adaptive and passively adaptive bandit algorithms that are designed
for non-stationary environments.
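The detection rule described in the abstract — compare the empirical mean of an arm's recent rewards against the mean estimated from its history, and flag a change when the gap exceeds a threshold — can be sketched in code. The following is a minimal illustrative sketch for Bernoulli rewards; the window size, threshold, and posterior-reset rule are placeholder assumptions, not the values derived in the paper.

```python
import random

class TSCD:
    """Sketch of change-detection Thompson sampling (TS-CD) for
    Bernoulli rewards. Window, threshold, and reset rule are
    illustrative assumptions, not the paper's derived constants."""

    def __init__(self, n_arms=2, window=50, threshold=0.2):
        self.window = window
        self.threshold = threshold
        self.alpha = [1.0] * n_arms  # Beta(1, 1) priors
        self.beta = [1.0] * n_arms
        self.history = [[] for _ in range(n_arms)]

    def select_arm(self):
        # Thompson sampling: draw one sample per posterior, play the argmax
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        hist = self.history[arm]
        hist.append(reward)
        self.alpha[arm] += reward
        self.beta[arm] += 1.0 - reward
        if len(hist) >= 2 * self.window:
            # Compare the recent window against the older history
            recent = sum(hist[-self.window:]) / self.window
            past = sum(hist[:-self.window]) / (len(hist) - self.window)
            if abs(recent - past) > self.threshold:
                # Change detected: restart the posterior from the recent window
                self.history[arm] = hist[-self.window:]
                self.alpha[arm] = 1.0 + sum(self.history[arm])
                self.beta[arm] = 1.0 + self.window - sum(self.history[arm])
                return True
        return False
```

In a non-stationary run, `update` returning `True` marks a detected change, after which the arm's posterior is rebuilt from only the most recent window.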
Related papers
- UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation [93.38604803625294]
We present UncertaintyRAG, a novel approach for long-context Retrieval-Augmented Generation (RAG).
We use Signal-to-Noise Ratio (SNR)-based span uncertainty to estimate similarity between text chunks.
UncertaintyRAG outperforms baselines by 2.03% on LLaMA-2-7B, achieving state-of-the-art results.
arXiv Detail & Related papers (2024-10-03T17:39:38Z)
- VITS: Variational Inference Thompson Sampling for contextual bandits [10.028119153832346]
We introduce and analyze a variant of the Thompson sampling (TS) algorithm for contextual bandits.
We propose a new algorithm, Variational Inference Thompson Sampling (VITS), based on Gaussian variational inference.
We show that VITS achieves a sub-linear regret bound of the same order in the dimension and number of rounds as traditional TS for linear contextual bandits.
arXiv Detail & Related papers (2023-07-19T17:53:22Z)
- Guiding Pseudo-labels with Uncertainty Estimation for Test-Time Adaptation [27.233704767025174]
Test-Time Adaptation (TTA) is a specific case of Unsupervised Domain Adaptation (UDA) where a model is adapted to a target domain without access to source data.
We propose a novel approach for the TTA setting based on a loss reweighting strategy that brings robustness against the noise that inevitably affects the pseudo-labels.
arXiv Detail & Related papers (2023-03-07T10:04:55Z)
- Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits [55.03293214439741]
In contextual bandits, an agent sequentially makes actions from a time-dependent action set based on past experience.
We propose the first online continuous hyperparameter tuning framework for contextual bandits.
We show that it achieves sublinear regret in theory and consistently outperforms all existing methods on both synthetic and real datasets.
arXiv Detail & Related papers (2023-02-18T23:31:20Z) - Optimality of Thompson Sampling with Noninformative Priors for Pareto
Bandits [81.45853204922795]
Thompson sampling has been shown to achieve problem-dependent lower bounds in several reward models.
We consider the optimality of TS for the Pareto model that has a heavy tail and is parameterized by two unknown parameters.
We find that TS with the Jeffreys and reference priors can achieve the lower bound if one uses a truncation procedure.
arXiv Detail & Related papers (2023-02-03T04:47:14Z) - Adaptive Resources Allocation CUSUM for Binomial Count Data Monitoring
with Application to COVID-19 Hotspot Detection [11.954681424276528]
We present a statistical method to robustly and efficiently detect hotspots with limited sampling resources.
Our main idea is to combine the multi-arm bandit (MAB) and change-point detection methods.
This method is applied to hotspot detection in a real case study of county-level daily positive COVID-19 cases in Washington State (WA).
arXiv Detail & Related papers (2022-08-09T21:26:28Z) - Thompson Sampling Achieves $\tilde O(\sqrt{T})$ Regret in Linear
Quadratic Control [85.22735611954694]
We study the problem of adaptive control of stabilizable linear-quadratic regulators (LQRs) using Thompson Sampling (TS).
We propose an efficient TS algorithm for the adaptive control of LQRs, TSAC, that attains $\tilde O(\sqrt{T})$ regret, even for multidimensional systems.
arXiv Detail & Related papers (2022-06-17T02:47:53Z) - Kolmogorov-Smirnov Test-Based Actively-Adaptive Thompson Sampling for
Non-Stationary Bandits [2.879036956042183]
We consider the non-stationary multi-armed bandit (MAB) framework and propose a Kolmogorov-Smirnov (KS) test based Thompson Sampling (TS) algorithm named TS-KS.
In particular, for the two-armed bandit case, we derive bounds on the number of samples of the reward distribution to detect the change once it occurs.
Our results show that the proposed TS-KS algorithm outperforms not only the static TS algorithm but also other bandit algorithms designed for non-stationary environments.
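The core of a KS-test-based change detector is the two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between the empirical CDFs of a recent reward window and a historical one. A minimal, library-free sketch of that statistic follows; the windowing scheme and decision threshold used by TS-KS are not reproduced here.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the two empirical CDFs, evaluated at every
    observed value."""
    points = sorted(set(sample_a) | set(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    gap = 0.0
    for t in points:
        cdf_a = sum(1 for v in sample_a if v <= t) / n_a
        cdf_b = sum(1 for v in sample_b if v <= t) / n_b
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap
```

In a TS-KS-style loop, one would compare `ks_statistic(recent_window, history)` against a sample-size-dependent critical value and reset the arm's posterior on detection.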
arXiv Detail & Related papers (2021-05-30T17:28:41Z) - Comparing Probability Distributions with Conditional Transport [63.11403041984197]
We propose conditional transport (CT) as a new divergence and approximate it with the amortized CT (ACT) cost.
ACT amortizes the computation of its conditional transport plans and comes with unbiased sample gradients that are straightforward to compute.
On a wide variety of benchmark datasets for generative modeling, substituting the default statistical distance of an existing generative adversarial network with ACT consistently improves performance.
arXiv Detail & Related papers (2020-12-28T05:14:22Z) - Bandit Change-Point Detection for Real-Time Monitoring High-Dimensional
Data Under Sampling Control [13.249453757295083]
We propose to incorporate multi-armed bandit approaches into sequential change-point detection to develop an efficient bandit change-point detection algorithm.
Our proposed algorithm consists of two policies per time step: the adaptive sampling policy applies Thompson Sampling to balance exploration and exploitation for immediate reward gain, and the statistical decision policy fuses the local Shiryaev-Roberts-Pollak statistics via sum-shrinkage techniques to decide whether to raise a global alarm.
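The two-policy structure above can be illustrated with a simple Gaussian mean-shift example: each sampled stream maintains a Shiryaev-Roberts recursion, and the global decision soft-thresholds ("shrinks") the local statistics before summing. The pre-/post-change means, shrinkage constant, and alarm threshold below are assumed for illustration; the Shiryaev-Roberts-Pollak variant and tuned constants in the paper differ.

```python
import math

def sr_update(r_prev, x, mu0=0.0, mu1=1.0):
    """Shiryaev-Roberts recursion R_t = (1 + R_{t-1}) * L_t for a
    unit-variance Gaussian stream whose mean may shift from mu0 to mu1;
    L_t is the likelihood ratio of the new observation x."""
    likelihood_ratio = math.exp((mu1 - mu0) * x - (mu1 ** 2 - mu0 ** 2) / 2.0)
    return (1.0 + r_prev) * likelihood_ratio

def global_alarm(local_stats, shrink=1.0, threshold=50.0):
    """Sum-shrinkage fusion: soft-threshold each local statistic so that
    quiet streams contribute nothing, then alarm on the shrunken sum."""
    return sum(max(s - shrink, 0.0) for s in local_stats) > threshold
```

A stream observing post-change data drives its local statistic up geometrically, while pre-change streams stay bounded, so the shrunken sum isolates the few affected streams.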
arXiv Detail & Related papers (2020-09-24T18:30:55Z)
- SADet: Learning An Efficient and Accurate Pedestrian Detector [68.66857832440897]
This paper proposes a series of systematic optimization strategies for the detection pipeline of a one-stage detector.
It forms a single shot anchor-based detector (SADet) for efficient and accurate pedestrian detection.
Though structurally simple, it achieves state-of-the-art results and a real-time speed of $20$ FPS for VGA-resolution images.
arXiv Detail & Related papers (2020-07-26T12:32:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.