Bandit Change-Point Detection for Real-Time Monitoring High-Dimensional
Data Under Sampling Control
- URL: http://arxiv.org/abs/2009.11891v2
- Date: Sun, 10 Apr 2022 04:45:29 GMT
- Title: Bandit Change-Point Detection for Real-Time Monitoring High-Dimensional
Data Under Sampling Control
- Authors: Wanrong Zhang, Yajun Mei
- Abstract summary: We propose to incorporate multi-armed bandit approaches into sequential change-point detection to develop an efficient bandit change-point detection algorithm.
Our proposed algorithm consists of two policies per time step: the adaptive sampling policy applies the Thompson Sampling algorithm to balance between exploration and exploitation for immediate reward gain, and the statistical decision policy fuses the local Shiryaev-Roberts-Pollak statistics via sum-shrinkage techniques to determine whether to raise a global alarm.
- Score: 13.249453757295083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many real-world problems of real-time monitoring of
high-dimensional streaming data, one wants to detect an undesired event or
change quickly once it occurs, but under a sampling control constraint: in
resource-constrained environments, one may only be able to observe or use
data from selected components for decision-making at each time step. In this
paper, we propose to incorporate multi-armed bandit approaches into sequential
change-point detection to develop an efficient bandit change-point detection
algorithm based on the limiting Bayesian approach, which incorporates prior
knowledge of potential changes. Our proposed algorithm, termed
Thompson-Sampling-Shiryaev-Roberts-Pollak (TSSRP), consists of two policies per
time step: the adaptive sampling policy applies the Thompson Sampling algorithm
to balance between exploration for acquiring long-term knowledge and
exploitation for immediate reward gain, and the statistical decision policy
fuses the local Shiryaev-Roberts-Pollak statistics via sum-shrinkage
techniques to determine whether to raise a global alarm. Extensive numerical
simulations and case studies demonstrate the statistical and computational
efficiency of our proposed TSSRP algorithm.
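The two per-step policies above are concrete enough to sketch. Below is a
minimal Python illustration, assuming p independent Gaussian streams with a
known post-change mean shift delta; the Beta-posterior reward bookkeeping, the
shrinkage level b, and the alarm threshold are illustrative choices, not the
authors' reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def tssrp_step(x, R, alpha, beta, q, delta=1.0, b=1.0):
    """One TSSRP-style monitoring step over p data streams.

    x           : length-p vector of current observations (only q are used)
    R           : length-p Shiryaev-Roberts statistics, updated in place
    alpha, beta : Beta-posterior parameters for Thompson Sampling
    q           : number of streams we may observe this step
    """
    # Adaptive sampling policy: Thompson Sampling. Draw one posterior
    # sample per stream and observe the q most suspicious streams.
    theta = rng.beta(alpha, beta)
    observed = np.argsort(theta)[-q:]

    for k in observed:
        # Local Shiryaev-Roberts recursion R_k <- (1 + R_k) * LR, with a
        # Gaussian mean-shift likelihood ratio (an illustrative choice).
        lr = np.exp(delta * x[k] - 0.5 * delta**2)
        R[k] = (1.0 + R[k]) * lr
        # Crude reward signal for the bandit: did the statistic grow?
        if lr > 1.0:
            alpha[k] += 1.0
        else:
            beta[k] += 1.0

    # Statistical decision policy: sum-shrinkage fusion. Only local
    # statistics exceeding the shrinkage level b contribute to the alarm.
    return np.sum(np.maximum(R - b, 0.0))

# Toy run: p = 50 streams, q = 5 observed per step, stream 7 shifts at t = 100.
p, q, horizon, change_pt = 50, 5, 200, 100
R = np.zeros(p)
alpha, beta = np.ones(p), np.ones(p)
for t in range(horizon):
    x = rng.normal(size=p)
    if t >= change_pt:
        x[7] += 1.0  # post-change mean shift in one affected stream
    if tssrp_step(x, R, alpha, beta, q) > 50.0:  # illustrative threshold
        print(f"alarm at t={t}")
        break
```

Shrinking each local statistic by b before summing keeps the many unaffected
streams from washing out the few affected ones, which is what makes a global
alarm workable when only q of p streams are observed per step.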
Related papers
- Partially-Observable Sequential Change-Point Detection for Autocorrelated Data via Upper Confidence Region [12.645304808491309]
We propose a detection scheme called adaptive upper confidence region with state space model (AUCRSS) for sequential change point detection.
A partially-observable Kalman filter algorithm is developed for online inference of the state space model (SSM), and accordingly, a change point detection scheme based on a generalized likelihood ratio test is analyzed.
arXiv Detail & Related papers (2024-03-30T02:32:53Z)
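A rough sketch of the two ingredients this entry names, assuming a scalar
random-walk state space model: a Kalman filter that updates only when a
component is observed, followed by a generalized likelihood ratio (GLR) test
for a mean shift in the standardized innovations. Names and noise variances
are hypothetical, not taken from the AUCRSS paper.

```python
import numpy as np

def kalman_glr(y, obs_mask, q_var=0.01, r_var=1.0):
    """Filter a partially observed scalar random walk, then compute a
    GLR mean-shift statistic on the standardized innovations."""
    m, P = 0.0, 1.0
    innovations = []
    for yt, seen in zip(y, obs_mask):
        P += q_var                       # predict: random-walk state
        if seen:                         # update only when observed
            S = P + r_var
            innovations.append((yt - m) / np.sqrt(S))
            K = P / S
            m += K * (yt - m)
            P *= 1.0 - K
    e = np.asarray(innovations)
    n = len(e)
    if n == 0:
        return 0.0
    # GLR for an unknown-time mean shift in N(0,1) innovations:
    # max over k of ( sum_{i >= k} e_i )^2 / (2 (n - k)).
    tail = np.cumsum(e[::-1])[::-1]      # tail[k] = e[k] + ... + e[n-1]
    k = np.arange(n)
    return float(np.max(tail**2 / (2.0 * (n - k))))

# Usage: a mean shift halfway through a half-observed series.
rng = np.random.default_rng(1)
y = rng.normal(size=300)
y[150:] += 1.5
mask = rng.random(300) < 0.5
print(kalman_glr(y, mask))               # large values suggest a change
```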
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation [54.72195809248172]
We present a new estimator built on a novel concept: retrospective reshuffling of participants across experimental arms at the end of an RCT.
We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
arXiv Detail & Related papers (2023-02-06T05:17:22Z)
- Adaptive Resources Allocation CUSUM for Binomial Count Data Monitoring with Application to COVID-19 Hotspot Detection [11.954681424276528]
We present a statistical method to robustly and efficiently detect hotspots with limited sampling resources.
Our main idea is to combine the multi-arm bandit (MAB) and change-point detection methods.
This method is applied to hotspot detection in a real case study of county-level daily positive COVID-19 cases in Washington State (WA).
arXiv Detail & Related papers (2022-08-09T21:26:28Z)
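A toy fusion of the two ideas this entry combines, assuming Bernoulli count
data per region and a UCB-style index for spending a fixed per-step sampling
budget; the rates p0 and p1, the bonus weight c, and the alarm threshold h are
illustrative, not the paper's calibrated values.

```python
import numpy as np

def mab_cusum_step(counts, n_trials, W, pulls, t, budget,
                   p0=0.05, p1=0.15, h=5.0, c=2.0):
    """Sample the `budget` regions with the largest UCB-inflated CUSUM
    statistics, then update their binomial CUSUMs in place."""
    # Bandit side: exploration bonus for rarely sampled regions.
    ucb = W + c * np.sqrt(np.log(t + 1.0) / (pulls + 1.0))
    chosen = np.argsort(ucb)[-budget:]
    # Binomial CUSUM log-likelihood-ratio increments, H1: p1 vs H0: p0.
    llr1 = np.log(p1 / p0)
    llr0 = np.log((1.0 - p1) / (1.0 - p0))
    for k in chosen:
        pulls[k] += 1
        step = counts[k] * llr1 + (n_trials - counts[k]) * llr0
        W[k] = max(0.0, W[k] + step)
    return np.flatnonzero(W > h)          # regions flagged as hotspots
```

The bonus term keeps rarely sampled regions from being starved, while the
CUSUM recursion accumulates evidence of an elevated rate only in the regions
actually sampled that step.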
- Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks [59.419152768018506]
We show that any optimal policy necessarily satisfies the k-SP constraint.
We propose a novel cost function that penalizes a policy for violating the SP constraint, instead of excluding it entirely.
Our experiments on MiniGrid, DeepMind Lab, Atari, and Fetch show that the proposed method significantly improves proximal policy optimization (PPO).
arXiv Detail & Related papers (2021-07-13T21:39:21Z)
- Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection [7.685002911021767]
We introduce an algorithm that efficiently learns policies in non-stationary environments.
It analyzes a possibly infinite stream of data and computes, in real-time, high-confidence change-point detection statistics.
We show that this algorithm minimizes the delay until unforeseen changes to a context are detected, thereby allowing for rapid responses.
arXiv Detail & Related papers (2021-05-20T01:57:52Z)
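One cheap way to get real-time change statistics from a stream of episode
returns is a two-sided CUSUM against a running baseline. This is a simplified
stand-in for the high-confidence change-point statistics the paper actually
develops; the drift and threshold values are illustrative.

```python
class ReturnMonitor:
    """Two-sided CUSUM on episode returns; an alarm suggests the
    environment (context) has changed and the policy should adapt."""
    def __init__(self, drift=0.5, threshold=8.0):
        self.drift, self.threshold = drift, threshold
        self.mean, self.count = 0.0, 0
        self.g_pos, self.g_neg = 0.0, 0.0

    def update(self, ret):
        self.count += 1
        self.mean += (ret - self.mean) / self.count   # running baseline
        dev = ret - self.mean
        self.g_pos = max(0.0, self.g_pos + dev - self.drift)
        self.g_neg = max(0.0, self.g_neg - dev - self.drift)
        if max(self.g_pos, self.g_neg) > self.threshold:
            self.__init__(self.drift, self.threshold)  # reset after alarm
            return True                                # signal: adapt policy
        return False
```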
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based zeroth-order algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization, instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient estimate by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
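For context, this is the random-sampling baseline that ZO-RL's learned policy
replaces: a standard two-point zeroth-order gradient estimator with Gaussian
directions. In ZO-RL the perturbations u would come from a learned sampling
policy rather than blind draws; the code is a generic sketch, not the paper's.

```python
import numpy as np

def zo_gradient(f, x, n_dirs=20, mu=1e-3, rng=np.random.default_rng(0)):
    """Two-point zeroth-order gradient estimate of f at x."""
    g = np.zeros_like(x)
    for _ in range(n_dirs):
        u = rng.normal(size=x.shape)      # ZO-RL would learn this draw
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / n_dirs

# Usage: a few steps of ZO gradient descent on a quadratic.
f = lambda z: float(np.sum(z**2))
x = np.ones(10)
for _ in range(100):
    x -= 0.05 * zo_gradient(f, x)
```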
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
Off-policy algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z)
- Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs [13.922754427601491]
We characterize the relationship between safety constraints and the number of samples needed to ensure a desired level of accuracy.
Our main finding is that, compared to the best known bounds for the unconstrained regime, the sample complexity of constrained RL algorithms is increased by a factor that is logarithmic in the number of constraints.
arXiv Detail & Related papers (2020-08-01T18:17:08Z)
- Faster Activity and Data Detection in Massive Random Access: A Multi-armed Bandit Approach [30.292176932361528]
This paper investigates grant-free random access with massive IoT devices.
By embedding the data symbols in the signature sequences, joint device activity detection and data decoding can be achieved.
arXiv Detail & Related papers (2020-01-28T10:00:25Z)
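A minimal sketch of the signal model behind this last entry: active devices
superimpose known signature sequences, so activity detection amounts to
finding which signatures explain the received vector. The greedy
matching-pursuit loop below is a simple stand-in for the paper's
MAB-accelerated detector; all dimensions and names are invented for
illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy grant-free access: N devices, length-L signatures, K active.
N, L, K = 200, 32, 5
S = rng.normal(size=(L, N)) / np.sqrt(L)   # known signature matrix
active = rng.choice(N, size=K, replace=False)
x = np.zeros(N)
x[active] = 1.0                            # unit data symbols when active
y = S @ x + 0.05 * rng.normal(size=L)      # received superposition

# Greedy detection: repeatedly pick the signature best aligned with the
# residual and project it out; the picks are the detected devices.
residual = y.copy()
detected = []
for _ in range(K):
    k = int(np.argmax(np.abs(S.T @ residual)))
    detected.append(k)
    coeff = (S[:, k] @ residual) / (S[:, k] @ S[:, k])
    residual -= coeff * S[:, k]

print(sorted(detected), sorted(active.tolist()))
```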