A Hybrid Framework for Real-Time Data Drift and Anomaly Identification Using Hierarchical Temporal Memory and Statistical Tests
- URL: http://arxiv.org/abs/2504.18599v1
- Date: Thu, 24 Apr 2025 18:23:18 GMT
- Title: A Hybrid Framework for Real-Time Data Drift and Anomaly Identification Using Hierarchical Temporal Memory and Statistical Tests
- Authors: Subhadip Bandyopadhyay, Joy Bose, Sujoy Roy Chowdhury,
- Abstract summary: This paper proposes a novel hybrid framework combiningHierarchical Temporal Memory (HTM) and Sequential Probability Ratio Test (SPRT) for real-time data drift detection and anomaly identification.<n> Experimental evaluations demonstrate that the proposed method outperforms conventional drift detection techniques like the Kolmogorov-Smirnov (KS) test, Wasserstein distance, and Population Stability Index (PSI) in terms of accuracy, adaptability, and computational efficiency.
- Score: 14.37149160708975
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data Drift is the phenomenon where the generating model behind the data changes over time. Due to data drift, any model built on the past training data becomes less relevant and inaccurate over time. Thus, detecting and controlling for data drift is critical in machine learning models. Hierarchical Temporal Memory (HTM) is a machine learning model developed by Jeff Hawkins, inspired by how the human brain processes information. It is a biologically inspired model of memory that is similar in structure to the neocortex, and whose performance is claimed to be comparable to state of the art models in detecting anomalies in time series data. Another unique benefit of HTMs is its independence from training and testing cycle; all the learning takes place online with streaming data and no separate training and testing cycle is required. In sequential learning paradigm, Sequential Probability Ratio Test (SPRT) offers some unique benefit for online learning and inference. This paper proposes a novel hybrid framework combining HTM and SPRT for real-time data drift detection and anomaly identification. Unlike existing data drift methods, our approach eliminates frequent retraining and ensures low false positive rates. HTMs currently work with one dimensional or univariate data. In a second study, we also propose an application of HTM in multidimensional supervised scenario for anomaly detection by combining the outputs of multiple HTM columns, one for each dimension of the data, through a neural network. Experimental evaluations demonstrate that the proposed method outperforms conventional drift detection techniques like the Kolmogorov-Smirnov (KS) test, Wasserstein distance, and Population Stability Index (PSI) in terms of accuracy, adaptability, and computational efficiency. Our experiments also provide insights into optimizing hyperparameters for real-time deployment in domains such as Telecom.
Related papers
- Sparse identification of nonlinear dynamics and Koopman operators with Shallow Recurrent Decoder Networks [3.1484174280822845]
We present a method to jointly solve the sensing and model identification problems with simple implementation, efficient, and robust performance.<n>SINDy-SHRED uses Gated Recurrent Units to model sparse sensor measurements along with a shallow network decoder to reconstruct the full-temporal field from the latent state space.<n>We conduct systematic experimental studies on PDE data such as turbulent flows, real-world sensor measurements for sea surface temperature, and direct video data.
arXiv Detail & Related papers (2025-01-23T02:18:13Z) - Diffusion-Model-Assisted Supervised Learning of Generative Models for
Density Estimation [10.793646707711442]
We present a framework for training generative models for density estimation.
We use the score-based diffusion model to generate labeled data.
Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in the supervised manner.
arXiv Detail & Related papers (2023-10-22T23:56:19Z) - An Automated Machine Learning Approach for Detecting Anomalous Peak
Patterns in Time Series Data from a Research Watershed in the Northeastern
United States Critical Zone [3.1747517745997014]
This paper presents an automated machine learning framework designed to assist hydrologists in detecting anomalies in time series data generated by sensors in a research watershed in the northeastern United States critical zone.
The framework specifically focuses on identifying peak-pattern anomalies, which may arise from sensor malfunctions or natural phenomena.
arXiv Detail & Related papers (2023-09-14T19:07:50Z) - Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF)
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z) - Online Evolutionary Neural Architecture Search for Multivariate
Non-Stationary Time Series Forecasting [72.89994745876086]
This work presents the Online Neuro-Evolution-based Neural Architecture Search (ONE-NAS) algorithm.
ONE-NAS is a novel neural architecture search method capable of automatically designing and dynamically training recurrent neural networks (RNNs) for online forecasting tasks.
Results demonstrate that ONE-NAS outperforms traditional statistical time series forecasting methods.
arXiv Detail & Related papers (2023-02-20T22:25:47Z) - Mixed Effects Neural ODE: A Variational Approximation for Analyzing the
Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data.
We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem.
We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z) - Identifying nonlinear dynamical systems from multi-modal time series
data [3.721528851694675]
Empirically observed time series in physics, biology, or medicine are commonly generated by some underlying dynamical system (DS)
There is an increasing interest to harvest machine learning methods to reconstruct this latent DS in a completely data-driven, unsupervised way.
Here we propose a general framework for multi-modal data integration for the purpose of nonlinear DS identification and cross-modal prediction.
arXiv Detail & Related papers (2021-11-04T14:59:28Z) - Convolutional generative adversarial imputation networks for
spatio-temporal missing data in storm surge simulations [86.5302150777089]
Generative Adversarial Imputation Nets (GANs) and GAN-based techniques have attracted attention as unsupervised machine learning methods.
We name our proposed method as Con Conval Generative Adversarial Imputation Nets (Conv-GAIN)
arXiv Detail & Related papers (2021-11-03T03:50:48Z) - Deep Cellular Recurrent Network for Efficient Analysis of Time-Series
Data with Spatial Information [52.635997570873194]
This work proposes a novel deep cellular recurrent neural network (DCRNN) architecture to process complex multi-dimensional time series data with spatial information.
The proposed architecture achieves state-of-the-art performance while utilizing substantially less trainable parameters when compared to comparable methods in the literature.
arXiv Detail & Related papers (2021-01-12T20:08:18Z) - Computer Model Calibration with Time Series Data using Deep Learning and
Quantile Regression [1.6758573326215689]
The existing standard calibration framework suffers from inferential issues when the model output and observational data are high-dimensional dependent data.
We propose a new calibration framework based on a deep neural network (DNN) with long-short term memory layers that directly emulates the inverse relationship between the model output and input parameters.
arXiv Detail & Related papers (2020-08-29T22:18:41Z) - Contextual-Bandit Anomaly Detection for IoT Data in Distributed
Hierarchical Edge Computing [65.78881372074983]
IoT devices can hardly afford complex deep neural networks (DNN) models, and offloading anomaly detection tasks to the cloud incurs long delay.
We propose and build a demo for an adaptive anomaly detection approach for distributed hierarchical edge computing (HEC) systems.
We show that our proposed approach significantly reduces detection delay without sacrificing accuracy, as compared to offloading detection tasks to the cloud.
arXiv Detail & Related papers (2020-04-15T06:13:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.