Scalable Online Change Detection for High-dimensional Data Streams
- URL: http://arxiv.org/abs/2205.12706v1
- Date: Wed, 25 May 2022 12:02:59 GMT
- Title: Scalable Online Change Detection for High-dimensional Data Streams
- Authors: Florian Kalinke, Marco Heyden, Edouard Fouché, Klemens Böhm
- Abstract summary: We propose an algorithm, Maximum Mean Discrepancy Adaptive Windowing (MMDAW).
Our algorithm is a general-purpose non-parametric change detector.
Experiments show that MMDAW achieves better detection quality than state-of-the-art competitors.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Detecting changes in data streams is a core objective in their analysis and
has applications in, say, predictive maintenance, fraud detection, and
medicine. A principled approach to detect changes is to compare distributions
observed within the stream to each other. However, data streams are often
high-dimensional, and changes can be complex, e.g., they may manifest only in
higher moments. The streaming setting also imposes heavy memory and computation
restrictions. We propose an algorithm, Maximum Mean Discrepancy Adaptive
Windowing (MMDAW), which leverages the well-known Maximum Mean Discrepancy
(MMD) two-sample test, and facilitates its efficient online computation on
windows whose size it flexibly adapts. As MMD is sensitive to any change in the
underlying distribution, our algorithm is a general-purpose non-parametric
change detector that fulfills the requirements imposed by the streaming
setting. Our experiments show that MMDAW achieves better detection quality than
state-of-the-art competitors.
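The abstract does not say how MMDAW computes MMD online or how it chooses window sizes and thresholds, so the following is not that algorithm. It is a brute-force sketch of the two ingredients the abstract names: an unbiased estimate of the squared MMD between two samples, and a window that is cut at a split point when the two sides look different. The Gaussian kernel, the bandwidth, the class `SplitWindowDetector`, and its fixed `threshold`, `min_size`, `max_size`, and `step` parameters are all hypothetical choices for illustration.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 h^2))."""
    sq = (np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(x, y, bandwidth):
    """Unbiased estimate of MMD^2 between samples x (m x d) and y (n x d)."""
    m, n = len(x), len(y)
    kxx = rbf_kernel(x, x, bandwidth)
    kyy = rbf_kernel(y, y, bandwidth)
    kxy = rbf_kernel(x, y, bandwidth)
    # Drop the diagonal (i == j) terms from the within-sample averages.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()


class SplitWindowDetector:
    """Hypothetical ADWIN-style detector: test candidate splits of a window with MMD."""

    def __init__(self, bandwidth, threshold=0.5, min_size=10, max_size=100, step=10):
        self.bandwidth = bandwidth
        self.threshold = threshold  # fixed cut-off, NOT a calibrated significance level
        self.min_size = min_size    # fewest samples allowed on either side of a split
        self.max_size = max_size    # cap on the window to bound memory and cost
        self.step = step            # only every `step`-th split point is tested
        self.window = []

    def update(self, x):
        """Add one observation; return True if a change is flagged."""
        self.window.append(np.asarray(x, dtype=float))
        if len(self.window) > self.max_size:
            self.window.pop(0)  # forget the oldest observation
        data = np.stack(self.window)
        for split in range(self.min_size, len(data) - self.min_size + 1, self.step):
            if mmd2_unbiased(data[:split], data[split:], self.bandwidth) > self.threshold:
                # Keep only the part after the split, i.e. the presumed new regime.
                self.window = self.window[split:]
                return True
        return False
```

On a stream whose mean shifts by 3 in every coordinate after 100 points, e.g. `SplitWindowDetector(bandwidth=np.sqrt(5))` fed 5-dimensional Gaussian data, the detector should flag the change a few points after index 100; the exact index depends on the random draw. Recomputing three kernel matrices for every candidate split costs O(n²d) per split on a window of n points, which is the kind of overhead an efficient online MMD computation is meant to avoid.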
Related papers
- Reproduction of scan B-statistic for kernel change-point detection algorithm
Change-point detection has garnered significant attention due to its broad range of applications.
In this paper, we reproduce a recently proposed online change-point detection algorithm based on an efficient kernel-based scan B-statistic.
Our numerical experiments demonstrate that the scan B-statistic consistently delivers superior performance.
(2024-08-23T15:12:31Z)
- MTSCI: A Conditional Diffusion Model for Multivariate Time Series Consistent Imputation
A key research question is how to ensure imputation consistency, i.e., intra-consistency between observed and imputed values.
Previous methods rely solely on the inductive bias of the imputation targets to guide the learning process.
(2024-08-11T10:24:53Z)
- Multi-Source and Test-Time Domain Adaptation on Multivariate Signals using Spatio-Temporal Monge Alignment
Machine learning applications on signals such as computer vision or biomedical data often face challenges due to the variability that exists across hardware devices or session recordings.
In this work, we propose Spatio-Temporal Monge Alignment (STMA) to mitigate these variabilities.
We show that STMA leads to significant and consistent performance gains between datasets acquired with very different settings.
(2024-07-19T13:33:38Z)
- Online Variational Sequential Monte Carlo
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC is capable of performing efficiently, entirely on-the-fly, both parameter estimation and particle proposal adaptation.
(2023-12-19T21:45:38Z)
- Partial identification of kernel based two sample tests with mismeasured data
Two-sample tests such as the Maximum Mean Discrepancy (MMD) are often used to detect differences between two distributions in machine learning applications.
We study the estimation of the MMD under $\epsilon$-contamination, where a possibly non-random $\epsilon$ proportion of one distribution is erroneously grouped with the other.
We propose a method to estimate bounds on the MMD under this contamination, and show that its estimates converge to the sharpest possible bounds as the sample size increases.
(2023-08-07T13:21:58Z)
- FaDIn: Fast Discretized Inference for Hawkes Processes with General Parametric Kernels
This work offers an efficient solution to temporal point process inference using general parametric kernels with finite support.
The method's effectiveness is evaluated by modeling the occurrence of stimulus-induced patterns in brain signals recorded with magnetoencephalography (MEG).
Results show that the proposed approach estimates pattern latency more accurately than the state of the art.
(2022-10-10T12:35:02Z)
- Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent
We prove that the discrete VRSMD estimator sequence converges to the minimum mirror interpolant in linear regression.
We derive a model estimation accuracy result in the setting when the true model is sparse.
(2022-04-29T19:37:24Z)
- E-detectors: a nonparametric framework for sequential change detection
We develop a fundamentally new and general framework for sequential change detection.
Our procedures come with clean, nonasymptotic bounds on the average run length.
We show how to design their mixtures in order to achieve both statistical and computational efficiency.
(2022-03-07T17:25:02Z)
- PaDiM: a Patch Distribution Modeling Framework for Anomaly Detection and Localization
We present a new framework for Patch Distribution Modeling, PaDiM, to concurrently detect and localize anomalies in images.
PaDiM makes use of a pretrained convolutional neural network (CNN) for patch embedding.
It also exploits correlations between the different semantic levels of the CNN to better localize anomalies.
(2020-11-17T17:29:18Z)
- Real-Time Anomaly Detection in Edge Streams
We propose MIDAS, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges.
We further propose MIDAS-F, which prevents anomalies from being incorporated into the algorithm's internal states.
Experiments show that MIDAS-F has significantly higher accuracy than MIDAS.
(2020-09-17T17:59:27Z)
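The MIDAS entry above describes detecting suddenly arriving groups of similar edges. A minimal sketch of the basic scoring idea, assuming the chi-squared score from the original MIDAS paper: with a the count of edge (u, v) in the current tick, s its count since the start, and t the current tick, the score is (a − s/t)² · t² / (s · (t − 1)), where both counts are approximated with count-min sketches. The class names, sketch sizes, and salted `hash` scheme are hypothetical; this omits MIDAS-R's decay and MIDAS-F's filtering and is not the authors' implementation.

```python
import random


class CountMinSketch:
    """Approximate counter: each key is hashed into one bucket per row."""

    def __init__(self, rows=2, buckets=1021, seed=0):
        rng = random.Random(seed)
        self.buckets = buckets
        self.salts = [rng.getrandbits(32) for _ in range(rows)]
        self.table = [[0.0] * buckets for _ in range(rows)]

    def _bucket(self, key, salt):
        return hash((salt, key)) % self.buckets

    def add(self, key, amount=1.0):
        for row, salt in zip(self.table, self.salts):
            row[self._bucket(key, salt)] += amount

    def query(self, key):
        # Collisions only inflate counts, so the row minimum is the tightest estimate.
        return min(row[self._bucket(key, salt)] for row, salt in zip(self.table, self.salts))

    def clear(self):
        for row in self.table:
            row[:] = [0.0] * self.buckets


class MidasLikeScorer:
    """Hypothetical MIDAS-style scorer (no MIDAS-R decay, no MIDAS-F filtering)."""

    def __init__(self, rows=2, buckets=1021):
        self.current = CountMinSketch(rows, buckets, seed=1)  # counts in the current tick
        self.total = CountMinSketch(rows, buckets, seed=2)    # counts since the start
        self.tick = 1

    def score(self, u, v, t):
        """Process edge (u, v) at integer tick t >= 1 (non-decreasing); return its score."""
        if t > self.tick:
            self.current.clear()  # a new tick starts: reset the per-tick counts
            self.tick = t
        edge = (u, v)
        self.current.add(edge)
        self.total.add(edge)
        a = self.current.query(edge)
        s = self.total.query(edge)
        if t == 1:
            return 0.0  # the statistic is only defined for t > 1
        return (a - s / t) ** 2 * t ** 2 / (s * (t - 1))
```

For example, an edge that appears exactly once in each of ticks 1 to 10 scores 0 every time, because its per-tick count matches its running average; if it then arrives 20 times in tick 11, the 20th copy scores (20 − 30/11)² · 121 / (30 · 10) ≈ 120.3.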