A Change-Detection Based Thompson Sampling Framework for Non-Stationary Bandits
- URL: http://arxiv.org/abs/2009.02791v1
- Date: Sun, 6 Sep 2020 18:12:25 GMT
- Title: A Change-Detection Based Thompson Sampling Framework for Non-Stationary Bandits
- Authors: Gourab Ghatak
- Abstract summary: We consider a non-stationary two-armed bandit framework and propose a change-detection based Thompson sampling algorithm.
The proposed strategy compares the empirical mean of the recent rewards of an arm with the estimate of the mean of the rewards from its history.
We validate the efficacy of TS-CD by testing it for edge-control of radio access technique (RAT)-selection in a wireless network.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a non-stationary two-armed bandit framework and propose a
change-detection based Thompson sampling (TS) algorithm, named TS with
change-detection (TS-CD), to keep track of the dynamic environment. The
non-stationarity is modeled using a Poisson arrival process, which changes the
mean of the rewards on each arrival. The proposed strategy compares the
empirical mean of the recent rewards of an arm with the estimate of the mean of
the rewards from its history. It detects a change when the empirical mean
deviates from the mean estimate by a value larger than a threshold. Then, we
characterize the lower bound on the duration of the time-window for which the
bandit framework must remain stationary for TS-CD to successfully detect a
change when it occurs. Consequently, our results highlight an upper bound on
the parameter for the Poisson arrival process, for which the TS-CD achieves
asymptotic regret optimality with high probability. Finally, we validate the
efficacy of TS-CD by testing it for edge-control of radio access technique
(RAT)-selection in a wireless network. Our results show that TS-CD not only
outperforms the classical max-power RAT selection strategy but also other
actively adaptive and passively adaptive bandit algorithms that are designed
for non-stationary environments.
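The detection rule described in the abstract — compare the empirical mean of an arm's recent rewards against the mean estimated from its history, and flag a change when the gap exceeds a threshold — can be sketched in code. The following is a minimal illustrative sketch for Bernoulli rewards; the window size, threshold, and posterior-reset rule are placeholder assumptions, not the values derived in the paper.

```python
import random

class TSCD:
    """Sketch of change-detection Thompson sampling (TS-CD) for
    Bernoulli rewards. Window, threshold, and reset rule are
    illustrative assumptions, not the paper's derived constants."""

    def __init__(self, n_arms=2, window=50, threshold=0.2):
        self.window = window
        self.threshold = threshold
        self.alpha = [1.0] * n_arms  # Beta(1, 1) priors
        self.beta = [1.0] * n_arms
        self.history = [[] for _ in range(n_arms)]

    def select_arm(self):
        # Thompson sampling: draw one sample per posterior, play the argmax
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        hist = self.history[arm]
        hist.append(reward)
        self.alpha[arm] += reward
        self.beta[arm] += 1.0 - reward
        if len(hist) >= 2 * self.window:
            # Compare the recent window against the older history
            recent = sum(hist[-self.window:]) / self.window
            past = sum(hist[:-self.window]) / (len(hist) - self.window)
            if abs(recent - past) > self.threshold:
                # Change detected: restart the posterior from the recent window
                self.history[arm] = hist[-self.window:]
                self.alpha[arm] = 1.0 + sum(self.history[arm])
                self.beta[arm] = 1.0 + self.window - sum(self.history[arm])
                return True
        return False
```

In a non-stationary run, `update` returning `True` marks a detected change, after which the arm's posterior is rebuilt from only the most recent window.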
Related papers
- UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation [93.38604803625294]
We present UncertaintyRAG, a novel approach for long-context Retrieval-Augmented Generation (RAG).
We use Signal-to-Noise Ratio (SNR)-based span uncertainty to estimate similarity between text chunks.
UncertaintyRAG outperforms baselines by 2.03% on LLaMA-2-7B, achieving state-of-the-art results.
arXiv Detail & Related papers (2024-10-03T17:39:38Z)
- VITS: Variational Inference Thompson Sampling for contextual bandits [10.028119153832346]
We introduce and analyze a variant of the Thompson sampling (TS) algorithm for contextual bandits.
We propose a new algorithm, Variational Inference Thompson Sampling (VITS), based on Gaussian variational inference.
We show that VITS achieves a sub-linear regret bound of the same order in the dimension and number of rounds as traditional TS for linear contextual bandits.
arXiv Detail & Related papers (2023-07-19T17:53:22Z)
- Guiding Pseudo-labels with Uncertainty Estimation for Test-Time Adaptation [27.233704767025174]
Test-Time Adaptation (TTA) is a specific case of Unsupervised Domain Adaptation (UDA) where a model is adapted to a target domain without access to source data.
We propose a novel approach for the TTA setting based on a loss reweighting strategy that brings robustness against the noise that inevitably affects the pseudo-labels.
arXiv Detail & Related papers (2023-03-07T10:04:55Z)
- Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits [55.03293214439741]
In contextual bandits, an agent sequentially makes actions from a time-dependent action set based on past experience.
We propose the first online continuous hyperparameter tuning framework for contextual bandits.
We show that it achieves sublinear regret in theory and consistently outperforms all existing methods on both synthetic and real datasets.
arXiv Detail & Related papers (2023-02-18T23:31:20Z) - Optimality of Thompson Sampling with Noninformative Priors for Pareto
Bandits [81.45853204922795]
Thompson sampling has been shown to achieve problem-dependent lower bounds in several reward models.
We consider the optimality of TS for the Pareto model that has a heavy tail and is parameterized by two unknown parameters.
We find that TS with the Jeffreys and reference priors can achieve the lower bound if one uses a truncation procedure.
arXiv Detail & Related papers (2023-02-03T04:47:14Z) - Adaptive Resources Allocation CUSUM for Binomial Count Data Monitoring
with Application to COVID-19 Hotspot Detection [11.954681424276528]
We present a statistical method to robustly and efficiently detect hotspots with limited sampling resources.
Our main idea is to combine the multi-arm bandit (MAB) and change-point detection methods.
This method is applied to hotspot detection in a real case study of county-level daily positive COVID-19 cases in Washington State (WA).
arXiv Detail & Related papers (2022-08-09T21:26:28Z) - Thompson Sampling Achieves $\tilde O(\sqrt{T})$ Regret in Linear
Quadratic Control [85.22735611954694]
We study the problem of adaptive control of stabilizable linear-quadratic regulators (LQRs) using Thompson Sampling (TS).
We propose an efficient TS algorithm for the adaptive control of LQRs, TSAC, that attains $\tilde O(\sqrt{T})$ regret, even for multidimensional systems.
arXiv Detail & Related papers (2022-06-17T02:47:53Z) - Kolmogorov-Smirnov Test-Based Actively-Adaptive Thompson Sampling for
Non-Stationary Bandits [2.879036956042183]
We consider the non-stationary multi-armed bandit (MAB) framework and propose a Kolmogorov-Smirnov (KS) test based Thompson Sampling (TS) algorithm named TS-KS.
In particular, for the two-armed bandit case, we derive bounds on the number of samples of the reward distribution to detect the change once it occurs.
Our results show that the proposed TS-KS algorithm outperforms not only the static TS algorithm but also other bandit algorithms designed for non-stationary environments.
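The core of a KS-test-based change detector is the two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between the empirical CDFs of a recent reward window and a historical one. A minimal, library-free sketch of that statistic follows; the windowing scheme and decision threshold used by TS-KS are not reproduced here.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the two empirical CDFs, evaluated at every
    observed value."""
    points = sorted(set(sample_a) | set(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    gap = 0.0
    for t in points:
        cdf_a = sum(1 for v in sample_a if v <= t) / n_a
        cdf_b = sum(1 for v in sample_b if v <= t) / n_b
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap
```

In a TS-KS-style loop, one would compare `ks_statistic(recent_window, history)` against a sample-size-dependent critical value and reset the arm's posterior on detection.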
arXiv Detail & Related papers (2021-05-30T17:28:41Z) - Comparing Probability Distributions with Conditional Transport [63.11403041984197]
We propose conditional transport (CT) as a new divergence and approximate it with the amortized CT (ACT) cost.
ACT amortizes the computation of its conditional transport plans and comes with unbiased sample gradients that are straightforward to compute.
On a wide variety of benchmark datasets for generative modeling, substituting the default statistical distance of an existing generative adversarial network with ACT consistently improves performance.
arXiv Detail & Related papers (2020-12-28T05:14:22Z) - Bandit Change-Point Detection for Real-Time Monitoring High-Dimensional
Data Under Sampling Control [13.249453757295083]
We propose to incorporate multi-armed bandit approaches into sequential change-point detection to develop an efficient bandit change-point detection algorithm.
Our proposed algorithm consists of two policies per time step: the adaptive sampling policy applies Thompson Sampling to balance exploration and exploitation for immediate reward gain, and the statistical decision policy fuses the local Shiryaev-Roberts-Pollak statistics via sum-shrinkage techniques to decide whether to raise a global alarm.
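The two-policy structure above can be illustrated with a simple Gaussian mean-shift example: each sampled stream maintains a Shiryaev-Roberts recursion, and the global decision soft-thresholds ("shrinks") the local statistics before summing. The pre-/post-change means, shrinkage constant, and alarm threshold below are assumed for illustration; the Shiryaev-Roberts-Pollak variant and tuned constants in the paper differ.

```python
import math

def sr_update(r_prev, x, mu0=0.0, mu1=1.0):
    """Shiryaev-Roberts recursion R_t = (1 + R_{t-1}) * L_t for a
    unit-variance Gaussian stream whose mean may shift from mu0 to mu1;
    L_t is the likelihood ratio of the new observation x."""
    likelihood_ratio = math.exp((mu1 - mu0) * x - (mu1 ** 2 - mu0 ** 2) / 2.0)
    return (1.0 + r_prev) * likelihood_ratio

def global_alarm(local_stats, shrink=1.0, threshold=50.0):
    """Sum-shrinkage fusion: soft-threshold each local statistic so that
    quiet streams contribute nothing, then alarm on the shrunken sum."""
    return sum(max(s - shrink, 0.0) for s in local_stats) > threshold
```

A stream observing post-change data drives its local statistic up geometrically, while pre-change streams stay bounded, so the shrunken sum isolates the few affected streams.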
arXiv Detail & Related papers (2020-09-24T18:30:55Z)
- SADet: Learning An Efficient and Accurate Pedestrian Detector [68.66857832440897]
This paper proposes a series of systematic optimization strategies for the detection pipeline of a one-stage detector.
It forms a single shot anchor-based detector (SADet) for efficient and accurate pedestrian detection.
Though structurally simple, it achieves state-of-the-art results and a real-time speed of $20$ FPS for VGA-resolution images.
arXiv Detail & Related papers (2020-07-26T12:32:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.