Related papers: Scalable Online Change Detection for High-dimensional Data Streams


URL: http://arxiv.org/abs/2205.12706v1
Date: Wed, 25 May 2022 12:02:59 GMT
Title: Scalable Online Change Detection for High-dimensional Data Streams
Authors: Florian Kalinke, Marco Heyden, Edouard Fouché, Klemens Böhm
Abstract summary: Maximum Mean Discrepancy Adaptive Windowing (MMDAW) is a general-purpose non-parametric change detector. Experiments show that MMDAW achieves better detection quality than state-of-the-art competitors.
Score: 0.0
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Detecting changes in data streams is a core objective in their analysis and has applications in, say, predictive maintenance, fraud detection, and medicine. A principled approach to detect changes is to compare distributions observed within the stream to each other. However, data streams often are high-dimensional, and changes can be complex, e.g., only manifest themselves in higher moments. The streaming setting also imposes heavy memory and computation restrictions. We propose an algorithm, Maximum Mean Discrepancy Adaptive Windowing (MMDAW), which leverages the well-known Maximum Mean Discrepancy (MMD) two-sample test, and facilitates its efficient online computation on windows whose size it flexibly adapts. As MMD is sensitive to any change in the underlying distribution, our algorithm is a general-purpose non-parametric change detector that fulfills the requirements imposed by the streaming setting. Our experiments show that MMDAW achieves better detection quality than state-of-the-art competitors.
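As a rough illustration of the MMD statistic the abstract refers to, the sketch below computes a biased estimate of the squared MMD between two windows of a stream. It is not the MMDAW algorithm: the function names, the Gaussian kernel, the bandwidth, and the fixed window sizes are all assumptions made for illustration, and the adaptive windowing and efficient online updates described in the abstract are left out.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples xs and ys."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def sample_window(rng, n, dim, scale):
    """Draw n zero-mean Gaussian points in `dim` dimensions with the given standard deviation."""
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
window_a = sample_window(rng, 100, 2, 1.0)  # reference window
window_b = sample_window(rng, 100, 2, 1.0)  # same distribution as window_a
window_c = sample_window(rng, 100, 2, 3.0)  # same mean, larger variance

same = mmd_squared(window_a, window_b)
changed = mmd_squared(window_a, window_c)
print(f"no change: {same:.4f}, variance change: {changed:.4f}")
```

The third window differs from the first only in its variance, so the comparison shows the statistic responding to a change that leaves the mean untouched. A detector would still need a threshold or test to decide when the value is large enough to signal a change, which this sketch does not attempt.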

Related papers

A Sample Efficient Conditional Independence Test in the Presence of Discretization [54.047334792855345]
Applying Conditional Independence (CI) tests directly to discretized data can lead to incorrect conclusions. Recent advancements have sought to infer the correct CI relationship between the latent variables through binarizing observed data. Motivated by this, this paper introduces a sample-efficient CI test that does not rely on the binarization process.
arXiv Detail & Related papers (2025-06-10T12:41:26Z)
Signature Maximum Mean Discrepancy Two-Sample Statistical Tests [0.5461938536945723]
This work is dedicated to understanding the possibilities and challenges associated with applying the sig-MMD as a statistical tool in practice. We introduce and explain the sig-MMD, and provide easily accessible and verifiable examples for its practical use.
arXiv Detail & Related papers (2025-06-02T14:26:58Z)
MMD-Newton Method for Multi-objective Optimization [3.8926796690238694]
We propose using MMD to solve continuous multi-objective optimization problems (MOPs). We devise a novel set-oriented, MMD-based Newton (MMDN) method. We empirically test the hybrid algorithm on 11 widely used benchmark problems.
arXiv Detail & Related papers (2025-05-20T16:56:50Z)
An Efficient Permutation-Based Kernel Two-Sample Test [13.229867216847534]
Two-sample hypothesis testing is a fundamental problem in statistics and machine learning. In this work, we use a Nyström approximation of the maximum mean discrepancy (MMD) to design a computationally efficient and practical testing algorithm.
arXiv Detail & Related papers (2025-02-19T09:22:48Z)
Reproduction of scan B-statistic for kernel change-point detection algorithm [10.49860279555873]
Change-point detection has garnered significant attention due to its broad range of applications. In this paper, we reproduce a recently proposed online change-point detection algorithm based on an efficient kernel-based scan B-statistic. Our numerical experiments demonstrate that the scan B-statistic consistently delivers superior performance.
arXiv Detail & Related papers (2024-08-23T15:12:31Z)
MTSCI: A Conditional Diffusion Model for Multivariate Time Series Consistent Imputation [41.681869408967586]
A key research question is how to ensure imputation consistency, i.e., intra-consistency between observed and imputed values. Previous methods rely solely on the inductive bias of the imputation targets to guide the learning process.
arXiv Detail & Related papers (2024-08-11T10:24:53Z)
Multi-Source and Test-Time Domain Adaptation on Multivariate Signals using Spatio-Temporal Monge Alignment [59.75420353684495]
Machine learning applications on signals such as computer vision or biomedical data often face challenges due to the variability that exists across hardware devices or session recordings. In this work, we propose Spatio-Temporal Monge Alignment (STMA) to mitigate these variabilities. We show that STMA leads to significant and consistent performance gains between datasets acquired with very different settings.
arXiv Detail & Related papers (2024-07-19T13:33:38Z)
Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference. Online VSMC is capable of performing efficiently, entirely on-the-fly, both parameter estimation and particle proposal adaptation.
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
Partial identification of kernel based two sample tests with mismeasured data [5.076419064097733]
Two-sample tests such as the Maximum Mean Discrepancy (MMD) are often used to detect differences between two distributions in machine learning applications. We study the estimation of the MMD under $\epsilon$-contamination, where a possibly non-random $\epsilon$ proportion of one distribution is erroneously grouped with the other. We propose a method to estimate these bounds, and show that it gives estimates that converge to the sharpest possible bounds on the MMD as sample size increases.
arXiv Detail & Related papers (2023-08-07T13:21:58Z)
FaDIn: Fast Discretized Inference for Hawkes Processes with General Parametric Kernels [82.53569355337586]
This work offers an efficient solution to temporal point processes inference using general parametric kernels with finite support. The method's effectiveness is evaluated by modeling the occurrence of stimuli-induced patterns from brain signals recorded with magnetoencephalography (MEG). Results show that the proposed approach leads to a better estimation of pattern latency than the state of the art.
arXiv Detail & Related papers (2022-10-10T12:35:02Z)
Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent [7.00422423634143]
We prove that the discrete VRSMD estimator sequence converges to the minimum mirror interpolant in the linear regression. We derive a model estimation accuracy result in the setting when the true model is sparse.
arXiv Detail & Related papers (2022-04-29T19:37:24Z)
E-detectors: a nonparametric framework for sequential change detection [86.15115654324488]
We develop a fundamentally new and general framework for sequential change detection. Our procedures come with clean, nonasymptotic bounds on the average run length. We show how to design their mixtures in order to achieve both statistical and computational efficiency.
arXiv Detail & Related papers (2022-03-07T17:25:02Z)
PaDiM: a Patch Distribution Modeling Framework for Anomaly Detection and Localization [64.39761523935613]
We present a new framework for Patch Distribution Modeling, PaDiM, to concurrently detect and localize anomalies in images. PaDiM makes use of a pretrained convolutional neural network (CNN) for patch embedding. It also exploits correlations between the different semantic levels of CNN to better localize anomalies.
arXiv Detail & Related papers (2020-11-17T17:29:18Z)
Real-Time Anomaly Detection in Edge Streams [49.26098240310257]
We propose MIDAS, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges. We further propose MIDAS-F, to solve the problem by which anomalies are incorporated into the algorithm's internal states. Experiments show that MIDAS-F has significantly higher accuracy than MIDAS.
arXiv Detail & Related papers (2020-09-17T17:59:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.