A Threshold-Triggered Deep Q-Network-Based Framework for Self-Healing in Autonomic Software-Defined IIoT-Edge Networks
- URL: http://arxiv.org/abs/2512.14297v1
- Date: Tue, 16 Dec 2025 11:11:37 GMT
- Title: A Threshold-Triggered Deep Q-Network-Based Framework for Self-Healing in Autonomic Software-Defined IIoT-Edge Networks
- Authors: Agrippina Mwangi, León Navarro-Hilfiker, Lukasz Brewka, Mikkel Gryning, Elena Fumagalli, Madeleine Gibescu,
- Abstract summary: Flash events are major contributors to intermittent service degradation in software-defined industrial networks.<n>This study proposes a threshold-triggered Deep Q-Network self-healing agent that autonomically detects, analyzes, and mitigates network disruptions.<n>The proposed agent improves disruption recovery performance by 53.84% compared to a baseline shortest-path and load-balanced routing approach.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic disruptions such as flash events arising from benign traffic bursts and switch thermal fluctuations are major contributors to intermittent service degradation in software-defined industrial networks. These events violate IEC~61850-derived quality-of-service requirements and user-defined service-level agreements, hindering the reliable and timely delivery of control, monitoring, and best-effort traffic in IEC~61400-25-compliant wind power plants. Failure to maintain these requirements often results in delayed or lost control signals, reduced operational efficiency, and increased risk of wind turbine generator downtime. To address these challenges, this study proposes a threshold-triggered Deep Q-Network self-healing agent that autonomically detects, analyzes, and mitigates network disruptions while adapting routing behavior and resource allocation in real time. The proposed agent was trained, validated, and tested on an emulated tri-clustered switch network deployed in a cloud-based proof-of-concept testbed. Simulation results show that the proposed agent improves disruption recovery performance by 53.84% compared to a baseline shortest-path and load-balanced routing approach and outperforms state-of-the-art methods, including the Adaptive Network-based Fuzzy Inference System by 13.1% and the Deep Q-Network and traffic prediction-based routing optimization method by 21.5%, in a super-spine leaf data-plane architecture. Additionally, the agent maintains switch thermal stability by proactively initiating external rack cooling when required. These findings highlight the potential of deep reinforcement learning in building resilience in software-defined industrial networks deployed in mission-critical, time-sensitive application scenarios.
Related papers
- Blockchain-Enabled Routing for Zero-Trust Low-Altitude Intelligent Networks [77.17664010626726]
We focus on the routing with multiple UAV clusters in low-altitude intelligent networks (LAINs)<n>To minimize the damage caused by potential threats, we present the zero-trust architecture with the software-defined perimeter and blockchain techniques.<n>We show that the proposed framework reduces the average E2E delay by 59% and improves the TSR by 29% on average compared to benchmarks.
arXiv Detail & Related papers (2026-02-27T04:30:35Z) - Multi-Agent Collaborative Intrusion Detection for Low-Altitude Economy IoT: An LLM-Enhanced Agentic AI Framework [60.72591149679355]
The rapid expansion of low-altitude economy Internet of Things (LAE-IoT) networks has created unprecedented security challenges.<n>Traditional intrusion detection systems fail to tackle the unique characteristics of aerial IoT environments.<n>We introduce a large language model (LLM)-enabled agentic AI framework for enhancing intrusion detection in LAE-IoT networks.
arXiv Detail & Related papers (2026-01-25T12:47:25Z) - Automating Supply Chain Disruption Monitoring via an Agentic AI Approach [49.77982322940809]
We introduce a minimally supervised agentic AI framework that autonomously monitors, analyses, and responds to disruptions across extended supply networks.<n>The system achieves high accuracy across core tasks, with F1 scores between 0.962 and 0.991, and performs full end-to-end analyses in a mean of 3.83 minutes at a cost of $0.0836 per disruption.
arXiv Detail & Related papers (2026-01-14T18:28:31Z) - An Adaptive Intelligent Thermal-Aware Routing Protocol for Wireless Body Area Networks [0.0]
This paper proposes an intelligent temperature-aware and reliability-based routing approach for WBANs.<n>It improves throughput by 13 percent, reduces end-to-end delay by 10 percent, decreases energy consumption by 25 percent, and lowers routing load by 30 percent compared to existing methods.
arXiv Detail & Related papers (2025-10-22T07:02:32Z) - Reliability and Resilience of AI-Driven Critical Network Infrastructure under Cyber-Physical Threats [1.7614511833648008]
This paper proposes a fault-tolerant and resilience-aware framework to mitigate cascading failures under cyber-physical attack conditions.<n>A comprehensive validation is carried out using NS-3 simulations, where key performance indicators such as reliability, latency, resilience index, and packet loss rate are analyzed.
arXiv Detail & Related papers (2025-10-22T06:56:44Z) - Communication-Efficient Federated Learning by Quantized Variance Reduction for Heterogeneous Wireless Edge Networks [55.467288506826755]
Federated learning (FL) has been recognized as a viable solution for local-privacy-aware collaborative model training in wireless edge networks.<n>Most existing communication-efficient FL algorithms fail to reduce the significant inter-device variance.<n>We propose a novel communication-efficient FL algorithm, named FedQVR, which relies on a sophisticated variance-reduced scheme.
arXiv Detail & Related papers (2025-01-20T04:26:21Z) - Towards Resource-Efficient Federated Learning in Industrial IoT for Multivariate Time Series Analysis [50.18156030818883]
Anomaly and missing data constitute a thorny problem in industrial applications.
Deep learning enabled anomaly detection has emerged as a critical direction.
The data collected in edge devices contain user privacy.
arXiv Detail & Related papers (2024-11-06T15:38:31Z) - Leveraging Low-Rank and Sparse Recurrent Connectivity for Robust
Closed-Loop Control [63.310780486820796]
We show how a parameterization of recurrent connectivity influences robustness in closed-loop settings.
We find that closed-form continuous-time neural networks (CfCs) with fewer parameters can outperform their full-rank, fully-connected counterparts.
arXiv Detail & Related papers (2023-10-05T21:44:18Z) - Multiagent Reinforcement Learning with an Attention Mechanism for
Improving Energy Efficiency in LoRa Networks [52.96907334080273]
As the network scale increases, the energy efficiency of LoRa networks decreases sharply due to severe packet collisions.
We propose a transmission parameter allocation algorithm based on multiagent reinforcement learning (MALoRa)
Simulation results demonstrate that MALoRa significantly improves the system EE compared with baseline algorithms.
arXiv Detail & Related papers (2023-09-16T11:37:23Z) - CAROL: Confidence-Aware Resilience Model for Edge Federations [13.864161788250856]
We present a confidence aware resilience model, CAROL, that utilizes a memory-efficient generative neural network to predict the Quality of Service (QoS) for a future state and a confidence score for each prediction.
CAROL outperforms state-of-the-art resilience schemes by reducing the energy consumption, deadline violation rates and resilience overheads by up to 16, 17 and 36 percent, respectively.
arXiv Detail & Related papers (2022-03-14T14:37:31Z) - Adaptive network reliability analysis: Methodology and applications to
power grid [0.0]
This study presents the first adaptive surrogate-based Network Reliability Analysis using Bayesian Additive Regression Trees (ANR-BART)
Results indicate that ANR-BART is robust and yields accurate estimates of network failure probability, while significantly reducing the computational cost of reliability analysis.
arXiv Detail & Related papers (2021-09-11T19:58:08Z) - Artificial Intelligence based Sensor Data Analytics Framework for Remote
Electricity Network Condition Monitoring [0.0]
Rural electrification demands the use of inexpensive technologies such as single wire earth return (SWER) networks.
There is a steadily growing energy demand from remote consumers, and the capacity of existing lines may become inadequate soon.
High impedance arcing faults (HIF) from SWER lines can cause catastrophic bushfires such as the 2009 Black Saturday event.
arXiv Detail & Related papers (2021-01-21T07:50:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.