A Comprehensive Forecasting-Based Framework for Time Series Anomaly Detection: Benchmarking on the Numenta Anomaly Benchmark (NAB)
- URL: http://arxiv.org/abs/2510.11141v1
- Date: Mon, 13 Oct 2025 08:31:42 GMT
- Title: A Comprehensive Forecasting-Based Framework for Time Series Anomaly Detection: Benchmarking on the Numenta Anomaly Benchmark (NAB)
- Authors: Mohammad Karami, Mostafa Jalali, Fatemeh Ghassemi
- Abstract summary: Time series anomaly detection is critical for modern digital infrastructures. We present a forecasting-based framework unifying classical methods with deep learning architectures. We conduct the first complete evaluation on the Numenta Anomaly Benchmark.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Time series anomaly detection is critical for modern digital infrastructures, yet existing methods lack systematic cross-domain evaluation. We present a comprehensive forecasting-based framework unifying classical methods (Holt-Winters, SARIMA) with deep learning architectures (LSTM, Informer) under a common residual-based detection interface. Our modular pipeline integrates preprocessing (normalization, STL decomposition), four forecasting models, four detection methods, and dual evaluation through forecasting metrics (MAE, RMSE, PCC) and detection metrics (Precision, Recall, F1, AUC). We conduct the first complete evaluation on the Numenta Anomaly Benchmark (58 datasets, 7 categories) with 232 model training runs and 464 detection evaluations achieving a 100% success rate. LSTM achieves the best performance (F1: 0.688, ranking first or second on 81% of datasets) with exceptional correlation on complex patterns (PCC: 0.999). Informer provides competitive accuracy (F1: 0.683) with 30% faster training. Classical methods achieve perfect predictions on simple synthetic data at 60× lower cost but show 2-3× worse F1-scores on real-world datasets. Forecasting quality dominates detection performance: differences between detection methods (F1: 0.621-0.688) are smaller than between forecasting models (F1: 0.344-0.688). Our findings provide evidence-based guidance: use LSTM for complex patterns, Informer for efficiency-critical deployments, and classical methods for simple periodic data with resource constraints. The complete implementation and results establish baselines for future forecasting-based anomaly detection research.
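The residual-based detection interface described in the abstract decouples forecasting from detection: any model's predictions are reduced to residuals, which are then thresholded. The paper's actual implementation is not reproduced here, so the following is a minimal illustrative sketch (function and variable names are my own) using a naive persistence forecast and a z-score threshold in place of the paper's four forecasting models and four detection methods:

```python
import numpy as np

def detect_anomalies(y, forecast, k=3.0):
    """Flag points whose forecast residual exceeds k standard deviations.

    Any forecasting model can be plugged in; detection operates only on
    the residuals y - forecast, mirroring the common interface the
    abstract describes.
    """
    residuals = y - forecast
    mu, sigma = residuals.mean(), residuals.std()
    z = np.abs(residuals - mu) / (sigma + 1e-12)
    return z > k  # boolean anomaly mask

# Toy usage: a synthetic sine series with one injected spike, forecast
# by naive persistence (previous value predicts the next).
rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 8 * np.pi, 400)) + 0.05 * rng.standard_normal(400)
y[200] += 3.0                 # injected anomaly
forecast = np.roll(y, 1)      # forecast[i] = y[i-1]
mask = detect_anomalies(y[1:], forecast[1:], k=4.0)
```

The spike (and the point immediately after it, whose persistence forecast is contaminated by the spike) produce large residuals and are flagged, while the smooth sine background is not, illustrating the abstract's point that forecasting quality largely determines detection quality.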
Related papers
- A Statistical Approach for Modeling Irregular Multivariate Time Series with Missing Observations [21.49782595218257]
Irregular time series with missing values present significant challenges for predictive modeling in domains such as healthcare. Our method computes four key features per variable: the mean and standard deviation of observed values, and the mean and variability of changes between consecutive observations. Our approach achieves state-of-the-art performance, surpassing recent transformer- and graph-based models by 0.5-1.7% in AUROC/AUPRC and 1.1-1.7% in accuracy/F1-score.
arXiv Detail & Related papers (2026-02-23T05:48:17Z) - NAIPv2: Debiased Pairwise Learning for Efficient Paper Quality Estimation [58.30936615525824]
We present NAIPv2, a debiased and efficient framework for paper quality estimation. NAIPv2 employs pairwise learning within domain-year groups to reduce inconsistencies in reviewer ratings. It is trained on pairwise comparisons while enabling efficient pointwise prediction at deployment.
arXiv Detail & Related papers (2025-09-29T17:59:23Z) - Revisiting Multivariate Time Series Forecasting with Missing Values [74.56971641937771]
Missing values are common in real-world time series. Current approaches have developed an imputation-then-prediction framework that uses imputation modules to fill in missing values, followed by forecasting on the imputed data. This framework overlooks a critical issue: there is no ground truth for the missing values, making the imputation process susceptible to errors that can degrade prediction accuracy. We introduce Consistency-Regularized Information Bottleneck (CRIB), a novel framework built on the Information Bottleneck principle.
arXiv Detail & Related papers (2025-09-27T20:57:48Z) - A Realistic Evaluation of Cross-Frequency Transfer Learning and Foundation Forecasting Models [32.56983347493999]
Cross-frequency transfer learning (CFTL) has emerged as a popular framework for curating large-scale time series datasets to pre-train foundation forecasting models (FFMs). Although CFTL has shown promise, current benchmarking practices fall short of accurately assessing its performance. This shortcoming stems from many factors: an over-reliance on small-scale evaluation datasets; inadequate treatment of sample size when computing summary statistics; reporting of suboptimal statistical models; and failing to account for non-negligible risks of overlap between pre-training and test datasets.
arXiv Detail & Related papers (2025-09-23T18:19:50Z) - MAWIFlow Benchmark: Realistic Flow-Based Evaluation for Network Intrusion Detection [47.86433139298671]
This paper introduces MAWIFlow, a flow-based benchmark derived from the MAWILAB v1.1 dataset. The resulting datasets comprise temporally distinct samples from January 2011, 2016, and 2021, drawn from trans-Pacific backbone traffic. Traditional machine learning methods, including Decision Trees, Random Forests, XGBoost, and Logistic Regression, are compared to a deep learning model based on a CNN-BiLSTM architecture.
arXiv Detail & Related papers (2025-06-20T14:51:35Z) - Graph-Based Fault Diagnosis for Rotating Machinery: Adaptive Segmentation and Structural Feature Integration [0.0]
This paper proposes a graph-based framework for robust and interpretable multiclass fault diagnosis in rotating machinery. It integrates entropy-optimized signal segmentation, time-frequency feature extraction, and graph-theoretic modeling to transform vibration signals into structured representations. The proposed method achieves high diagnostic accuracy when evaluated on two benchmark datasets.
arXiv Detail & Related papers (2025-04-29T13:34:52Z) - Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data [39.40116554523575]
We present Drift-Resilient TabPFN, a fresh approach based on In-Context Learning with a Prior-Data Fitted Network.
It learns to approximate Bayesian inference on synthetic datasets drawn from a prior.
It improves accuracy from 0.688 to 0.744 and ROC AUC from 0.786 to 0.832 while maintaining stronger calibration.
arXiv Detail & Related papers (2024-11-15T23:49:23Z) - A Mixture of Exemplars Approach for Efficient Out-of-Distribution Detection with Foundation Models [0.0]
This paper presents an efficient approach to tackling OOD detection that is designed to maximise the benefit of training a backbone with a high-quality, frozen, pretrained foundation model. MoLAR provides strong OOD detection performance when only comparing the similarity of OOD examples to the exemplars, a small set of images chosen to be representative of the dataset.
arXiv Detail & Related papers (2023-11-28T06:12:28Z) - Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion [56.38386580040991]
Consistency Trajectory Model (CTM) is a generalization of Consistency Models (CM).
CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance.
Unlike CM, CTM's access to the score function can streamline the adoption of established controllable/conditional generation methods.
arXiv Detail & Related papers (2023-10-01T05:07:17Z) - PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - RLAD: Time Series Anomaly Detection through Reinforcement Learning and
Active Learning [17.089402177923297]
We introduce a new semi-supervised, time series anomaly detection algorithm.
It uses deep reinforcement learning and active learning to efficiently learn and adapt to anomalies in real-world time series data.
It requires no manual tuning of parameters and outperforms all state-of-the-art methods we compare with.
arXiv Detail & Related papers (2021-03-31T15:21:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.