Segmentation over Complexity: Evaluating Ensemble and Hybrid Approaches for Anomaly Detection in Industrial Time Series
- URL: http://arxiv.org/abs/2510.26159v1
- Date: Thu, 30 Oct 2025 05:39:44 GMT
- Title: Segmentation over Complexity: Evaluating Ensemble and Hybrid Approaches for Anomaly Detection in Industrial Time Series
- Authors: Emilio Mastriani, Alessandro Costa, Federico Incardona, Kevin Munari, Sebastiano Spinello,
- Abstract summary: We evaluate the impact of change point-derived statistical features, clustering-based substructure representations, and hybrid learning strategies on detection performance.<n>The ensemble achieved an AUC-ROC of 0.976, F1-score of 0.41, and 100% early detection within the defined time window.
- Score: 36.94429692322632
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this study, we investigate the effectiveness of advanced feature engineering and hybrid model architectures for anomaly detection in a multivariate industrial time series, focusing on a steam turbine system. We evaluate the impact of change point-derived statistical features, clustering-based substructure representations, and hybrid learning strategies on detection performance. Despite their theoretical appeal, these complex approaches consistently underperformed compared to a simple Random Forest + XGBoost ensemble trained on segmented data. The ensemble achieved an AUC-ROC of 0.976, F1-score of 0.41, and 100% early detection within the defined time window. Our findings highlight that, in scenarios with highly imbalanced and temporally uncertain data, model simplicity combined with optimized segmentation can outperform more sophisticated architectures, offering greater robustness, interpretability, and operational utility.
Related papers
- FusAD: Time-Frequency Fusion with Adaptive Denoising for General Time Series Analysis [92.23551599659186]
Time series analysis plays a vital role in fields such as finance, healthcare, industry, and meteorology.<n>FusAD is a unified analysis framework designed for diverse time series tasks.
arXiv Detail & Related papers (2025-12-16T04:34:27Z) - Improving Deepfake Detection with Reinforcement Learning-Based Adaptive Data Augmentation [60.04281435591454]
CRDA (Curriculum Reinforcement-Learning Data Augmentation) is a novel framework guiding detectors to progressively master multi-domain forgery features.<n>Central to our approach is integrating reinforcement learning and causal inference.<n>Our method significantly improves detector generalizability, outperforming SOTA methods across multiple cross-domain datasets.
arXiv Detail & Related papers (2025-11-10T12:45:52Z) - Localized Kernel Projection Outlyingness: A Two-Stage Approach for Multi-Modal Outlier Detection [0.0]
Two-Stage LKPLO is a novel multi-stage outlier detection framework.<n>It overcomes the coexisting limitations of conventional projection-based methods.<n>It achieves state-of-the-art performance on challenging datasets.
arXiv Detail & Related papers (2025-10-28T03:53:46Z) - Improving Anomaly Detection in Industrial Time Series: The Role of Segmentation and Heterogeneous Ensemble [36.94429692322632]
We show how the integration of segmentation techniques, combined with a heterogeneous ensemble, can enhance anomaly detection in an industrial production context.<n>We improve the AUC-ROC metric from 0.8599 (achieved with a PCA and LSTM ensemble) to 0.9760 (achieved with Random Forest and XGBoost)<n>In our future work, we intend to assess the benefit of introducing weighted features derived from the study of change points, combined with segmentation and the use of heterogeneous ensembles, to further optimize model performance in early anomaly detection.
arXiv Detail & Related papers (2025-10-10T07:24:47Z) - Chain-of-Retrieval Augmented Generation [91.02950964802454]
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer.<n>Our proposed method, CoRAG, allows the model to dynamically reformulate the query based on the evolving state.
arXiv Detail & Related papers (2025-01-24T09:12:52Z) - Hybrid Ensemble Deep Graph Temporal Clustering for Spatiotemporal Data [0.37083047471478225]
We propose a hybrid ensemble graph clustering (HEDGTC) method for multivariate temporal data analysis.
HEDGTC adopts a dual consensus approach to address noise and misclassification from traditional clustering.
When evaluated on three real-world datasets, HEDGTC outperforms state-of-the-art ensemble clustering models.
arXiv Detail & Related papers (2024-09-19T09:14:10Z) - Embedded feature selection in LSTM networks with multi-objective
evolutionary ensemble learning for time series forecasting [49.1574468325115]
We present a novel feature selection method embedded in Long Short-Term Memory networks.
Our approach optimize the weights and biases of the LSTM in a partitioned manner.
Experimental evaluations on air quality time series data from Italy and southeast Spain demonstrate that our method substantially improves the ability generalization of conventional LSTMs.
arXiv Detail & Related papers (2023-12-29T08:42:10Z) - Deep Ensembles Meets Quantile Regression: Uncertainty-aware Imputation for Time Series [45.76310830281876]
We propose Quantile Sub-Ensembles, a novel method to estimate uncertainty with ensemble of quantile-regression-based task networks.
Our method not only produces accurate imputations that is robust to high missing rates, but also is computationally efficient due to the fast training of its non-generative model.
arXiv Detail & Related papers (2023-12-03T05:52:30Z) - Bayesian Complementary Kernelized Learning for Multidimensional
Spatiotemporal Data [11.763229353978321]
We propose a new statistical framework -- Complementary Complementary Kernelized Learning (BCKL)
BCKL offers superior performance in providing accurate posterior mean and high-quality uncertainty estimates.
arXiv Detail & Related papers (2022-08-21T22:38:54Z) - Cluster-and-Conquer: A Framework For Time-Series Forecasting [94.63501563413725]
We propose a three-stage framework for forecasting high-dimensional time-series data.
Our framework is highly general, allowing for any time-series forecasting and clustering method to be used in each step.
When instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets.
arXiv Detail & Related papers (2021-10-26T20:41:19Z) - Temporal-Structure-Assisted Gradient Aggregation for Over-the-Air
Federated Edge Learning [24.248673415586413]
We introduce a Markovian probability model to characterize the intrinsic temporal structure of the model aggregation series.
We develop a message passing algorithm, termed temporal-structure-assisted gradient aggregation (TSA-GA), to fulfil this estimation task.
We show that the proposed TSAGA algorithm significantly outperforms the state-of-the-art, and is able to achieve comparable learning performance.
arXiv Detail & Related papers (2021-03-03T09:13:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.