Hybrid Ensemble Method for Detecting Cyber-Attacks in Water Distribution Systems Using the BATADAL Dataset
- URL: http://arxiv.org/abs/2512.14422v1
- Date: Tue, 16 Dec 2025 14:07:22 GMT
- Title: Hybrid Ensemble Method for Detecting Cyber-Attacks in Water Distribution Systems Using the BATADAL Dataset
- Authors: Waqas Ahmed,
- Abstract summary: We consider a hybrid ensemble learning model that will enhance the detection ability of cyber-attacks in Water Distribution Systems.<n>The proposed framework establishes a robust and scalable solution for cyber-attack detection in time-dependent industrial systems.
- Score: 1.4975436239088316
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The cybersecurity of Industrial Control Systems that manage critical infrastructure such as Water Distribution Systems has become increasingly important as digital connectivity expands. BATADAL benchmark data is a good source of testing intrusion detection techniques, but it presents several important problems, such as imbalance in the number of classes, multivariate time dependence, and stealthy attacks. We consider a hybrid ensemble learning model that will enhance the detection ability of cyber-attacks in WDS by using the complementary capabilities of machine learning and deep learning models. Three base learners, namely, Random Forest , eXtreme Gradient Boosting , and Long Short-Term Memory network, have been strictly compared and seven ensemble types using simple averaged and stacked learning with a logistic regression meta-learner. Random Forest analysis identified top predictors turned into temporal and statistical features, and Synthetic Minority Oversampling Technique (SMOTE) was used to overcome the class imbalance issue. The analyics indicates that the single Long Short-Term Memory network model is of poor performance (F1 = 0.000, AUC = 0.4460), but tree-based models, especially eXtreme Gradient Boosting, perform well (F1 = 0.7470, AUC=0.9684). The hybrid stacked ensemble of Random Forest , eXtreme Gradient Boosting , and Long Short-Term Memory network scored the highest, with the attack class of 0.7205 with an F1-score and a AUC of 0.9826 indicating that the heterogeneous stacking between model precision and generalization can work. The proposed framework establishes a robust and scalable solution for cyber-attack detection in time-dependent industrial systems, integrating temporal learning and ensemble diversity to support the secure operation of critical infrastructure.
Related papers
- MI$^2$DAS: A Multi-Layer Intrusion Detection Framework with Incremental Learning for Securing Industrial IoT Networks [47.386868423451595]
MI$2$DAS is a multi-layer intrusion detection framework that integrates anomaly-based hierarchical traffic pooling and open-set recognition.<n>Experiments conducted on the Edge-IIoTset dataset demonstrate strong performance across all layers.<n>These results showcase MI$2$DAS as an effective, scalable and adaptive framework for enhancing IIoT security.
arXiv Detail & Related papers (2026-02-27T09:37:05Z) - Enhancing Continual Learning for Software Vulnerability Prediction: Addressing Catastrophic Forgetting via Hybrid-Confidence-Aware Selective Replay for Temporal LLM Fine-Tuning [43.24339861841546]
This paper investigates fine-tuning of a decoder-style language model (phi/phi-2 with LoRA) on a CVE-linked dataset.<n>We evaluate eight continual learning strategies, including window-only and cumulative training, replay-based baselines and regularisation-based variants.<n>Hybrid-CASR reduces training time per window by about 17 percent compared to the baseline, whereas cumulative training delivers only a minor F1 increase (0.661) at a 15.9-fold computational cost.
arXiv Detail & Related papers (2026-02-27T09:13:23Z) - Attack-Specialized Deep Learning with Ensemble Fusion for Network Anomaly Detection [1.2461503242570642]
Traditional intrusion detection systems struggle to maintain high accuracy across both frequent and rare attacks.<n>We propose a hybrid anomaly detection framework that integrates specialized deep learning models with an ensemble meta-classifier.<n>The proposed system achieves near-perfect detection rates with minimal false alarms, highlighting its robustness and generalizability.
arXiv Detail & Related papers (2025-10-14T12:41:16Z) - A Comparative Analysis of Ensemble-Based Machine Learning Approaches with Explainable AI for Multi-Class Intrusion Detection in Drone Networks [0.2708211191235587]
This research aims to develop a robust and interpretable intrusion detection framework tailored for drone networks.<n>We present a comparative analysis of ensemble-based machine learning models, namely Random Forest, Extra Trees, AdaBoost, CatBoost, and XGBoost, trained on a labeled dataset.<n>The proposed approach not only delivers near-perfect accuracy but also ensures interpretability, making it highly suitable for real-time and safety-critical drone operations.
arXiv Detail & Related papers (2025-09-23T00:59:21Z) - Enhancing IoT Cyber Attack Detection in the Presence of Highly Imbalanced Data [0.0]
This study uses hybrid sampling techniques to improve data imbalance detection accuracy in IoT domains.<n>We evaluate the performance of several machine learning models with respect to the classification of cyber-attacks.<n>Overall, this work demonstrates the value of hybrid sampling combined with robust model and feature selection for significantly improving IoT security.
arXiv Detail & Related papers (2025-05-15T14:02:48Z) - Learning in Multiple Spaces: Few-Shot Network Attack Detection with Metric-Fused Prototypical Networks [47.18575262588692]
We propose a novel Multi-Space Prototypical Learning framework tailored for few-shot attack detection.<n>By leveraging Polyak-averaged prototype generation, the framework stabilizes the learning process and effectively adapts to rare and zero-day attacks.<n> Experimental results on benchmark datasets demonstrate that MSPL outperforms traditional approaches in detecting low-profile and novel attack types.
arXiv Detail & Related papers (2024-12-28T00:09:46Z) - Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data [39.40116554523575]
We present Drift-Resilient TabPFN, a fresh approach based on In-Context Learning with a Prior-Data Fitted Network.
It learns to approximate Bayesian inference on synthetic datasets drawn from a prior.
It improves accuracy from 0.688 to 0.744 and ROC AUC from 0.786 to 0.832 while maintaining stronger calibration.
arXiv Detail & Related papers (2024-11-15T23:49:23Z) - Towards Continual Learning Desiderata via HSIC-Bottleneck
Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy performance, even with absolute superiority of zero exemplar buffer and 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z) - Robust Learning with Progressive Data Expansion Against Spurious
Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z) - A Dependable Hybrid Machine Learning Model for Network Intrusion
Detection [1.222622290392729]
We propose a new hybrid model that combines machine learning and deep learning to increase detection rates while securing dependability.
Our method produces excellent results when tested on two datasets, KDDCUP'99 and CIC-MalMem-2022.
arXiv Detail & Related papers (2022-12-08T20:19:27Z) - An Online Ensemble Learning Model for Detecting Attacks in Wireless
Sensor Networks [0.0]
We develop an intelligent, efficient, and updatable intrusion detection system by applying an important machine learning concept known as ensemble learning.
In this paper, we examine the application of different homogeneous and heterogeneous online ensembles in sensory data analysis.
Among the proposed novel online ensembles, both the heterogeneous ensemble consisting of an Adaptive Random Forest (ARF) combined with the Hoeffding Adaptive Tree (HAT) algorithm and the homogeneous ensemble HAT made up of 10 models achieved higher detection rates of 96.84% and 97.2%, respectively.
arXiv Detail & Related papers (2022-04-28T23:10:47Z) - DessiLBI: Exploring Structural Sparsity of Deep Networks via
Differential Inclusion Paths [45.947140164621096]
We propose a new approach based on differential inclusions of inverse scale spaces.
We show that DessiLBI unveils "winning tickets" in early epochs.
arXiv Detail & Related papers (2020-07-04T04:40:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.