Resilient by Design -- Active Inference for Distributed Continuum Intelligence
- URL: http://arxiv.org/abs/2511.07202v2
- Date: Tue, 18 Nov 2025 10:23:07 GMT
- Title: Resilient by Design -- Active Inference for Distributed Continuum Intelligence
- Authors: Praveen Kumar Donta, Alfreds Lapkovskis, Enzo Mingozzi, Schahram Dustdar,
- Abstract summary: This paper introduces a Probabilistic Active Inference Resilience Agent (PAIR-Agent) to achieve resilience in DCC systems.<n>PAIR-Agent performs three core operations: (i) constructing a causal fault graph from device logs, (ii) identifying faults while managing certainties and uncertainties using Markov blankets and the free energy principle, and (iii) autonomously healing issues through active inference.
- Score: 6.2095604527397485
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Failures are the norm in highly complex and heterogeneous devices spanning the distributed computing continuum (DCC), from resource-constrained IoT and edge nodes to high-performance computing systems. Ensuring reliability and global consistency across these layers remains a major challenge, especially for AI-driven workloads requiring real-time, adaptive coordination. This work-in-progress paper introduces a Probabilistic Active Inference Resilience Agent (PAIR-Agent) to achieve resilience in DCC systems. PAIR-Agent performs three core operations: (i) constructing a causal fault graph from device logs, (ii) identifying faults while managing certainties and uncertainties using Markov blankets and the free energy principle, and (iii) autonomously healing issues through active inference. Through continuous monitoring and adaptive reconfiguration, the agent maintains service continuity and stability under diverse failure conditions. Theoretical validations confirm the reliability and effectiveness of the proposed framework.
Related papers
- Multi-Agent Collaborative Intrusion Detection for Low-Altitude Economy IoT: An LLM-Enhanced Agentic AI Framework [60.72591149679355]
The rapid expansion of low-altitude economy Internet of Things (LAE-IoT) networks has created unprecedented security challenges.<n>Traditional intrusion detection systems fail to tackle the unique characteristics of aerial IoT environments.<n>We introduce a large language model (LLM)-enabled agentic AI framework for enhancing intrusion detection in LAE-IoT networks.
arXiv Detail & Related papers (2026-01-25T12:47:25Z) - Agentic Confidence Calibration [67.50096917021521]
Holistic Trajectory (HTC) is a novel diagnostic framework for AI agents.<n>HTC consistently surpasses strong baselines in both calibration and discrimination.<n>HTC provides interpretability by revealing the signals behind failure.
arXiv Detail & Related papers (2026-01-22T09:08:25Z) - Agentic Uncertainty Quantification [76.94013626702183]
We propose a unified Dual-Process Agentic UQ (AUQ) framework that transforms verbalized uncertainty into active, bi-directional control signals.<n>Our architecture comprises two complementary mechanisms: System 1 (Uncertainty-Aware Memory, UAM), which implicitly propagates verbalized confidence and semantic explanations to prevent blind decision-making; and System 2 (Uncertainty-Aware Reflection, UAR), which utilizes these explanations as rational cues to trigger targeted inference-time resolution only when necessary.
arXiv Detail & Related papers (2026-01-22T07:16:26Z) - Logic-Driven Semantic Communication for Resilient Multi-Agent Systems [26.964933264412412]
6G networks are accelerating autonomy and intelligence in large-scale, decentralized multi-agent systems.<n>This article proposes a formal definition of MAS resilience grounded in two complementary dimensions.<n>We design an agent architecture and develop decentralized algorithms to achieve both epistemic and action resilience.
arXiv Detail & Related papers (2026-01-11T00:54:09Z) - Bio-inspired Agentic Self-healing Framework for Resilient Distributed Computing Continuum Systems [4.003029907200818]
ReCiSt is a bio-inspired agentic self-healing framework designed to achieve resilience in Distributed Computing Continuum Systems (DCCS)<n>ReCiSt reconstructs the biological phases of Hemostasis, Inflammation, Proliferation, and Remodeling into the computational layers Containment, Diagnosis, Meta-Cognitive, and Knowledge for DCCS.<n>These four layers perform autonomous fault isolation, causal diagnosis, adaptive recovery, and long-term knowledge consolidation through Language Model (LM)-powered agents.
arXiv Detail & Related papers (2026-01-01T13:30:38Z) - Towards Efficient Agents: A Co-Design of Inference Architecture and System [66.59916327634639]
This paper presents AgentInfer, a unified framework for end-to-end agent acceleration.<n>We decompose the problem into four synergistic components: AgentCollab, AgentSched, AgentSAM, and AgentCompress.<n>Experiments on the BrowseComp-zh and DeepDiver benchmarks demonstrate that through the synergistic collaboration of these methods, AgentInfer reduces ineffective token consumption by over 50%.
arXiv Detail & Related papers (2025-12-20T12:06:13Z) - Reinforcement Learning for Self-Healing Material Systems [0.40334315349753025]
This study frames the self-healing process as a Reinforcement Learning problem within a Markov Decision Process (MDP), enabling agents to autonomously derive optimal policies.<n>A comparative evaluation of discrete-action (Q-learning, DQN) and continuous-action (TD3) agents in a simulation environment revealed that RL controllers significantly outperform baselines, achieving near-complete material recovery.
arXiv Detail & Related papers (2025-11-24T03:42:00Z) - Rediscovering Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning [55.59724323303857]
We propose a framework that balances exploration and exploitation via three components: difficulty-aware coefficient allocation, initial-anchored target entropy, and dynamic global coefficient adjustment.<n>Experiments on multiple mathematical reasoning benchmarks show that AER consistently outperforms baselines, improving both reasoning accuracy and exploration capability.
arXiv Detail & Related papers (2025-10-13T03:10:26Z) - Continual Action Quality Assessment via Adaptive Manifold-Aligned Graph Regularization [53.82400605816587]
Action Quality Assessment (AQA) quantifies human actions in videos, supporting applications in sports scoring, rehabilitation, and skill evaluation.<n>A major challenge lies in the non-stationary nature of quality distributions in real-world scenarios.<n>We introduce Continual AQA (CAQA), which equips with Continual Learning capabilities to handle evolving distributions.
arXiv Detail & Related papers (2025-10-08T10:09:47Z) - PowerGrow: Feasible Co-Growth of Structures and Dynamics for Power Grid Synthesis [75.14189839277928]
We present PowerGrow, a co-generative framework that significantly reduces computational overhead while maintaining operational validity.<n> Experiments across benchmark settings show that PowerGrow outperforms prior diffusion models in fidelity and diversity.<n>This demonstrates its ability to generate operationally valid and realistic power grid scenarios.
arXiv Detail & Related papers (2025-08-29T01:47:27Z) - ERIS: An Energy-Guided Feature Disentanglement Framework for Out-of-Distribution Time Series Classification [51.07970070817353]
An ideal time series classification (TSC) should be able to capture invariant representations.<n>Current methods are largely unguided, lacking the semantic direction required to isolate truly universal features.<n>We propose an end-to-end Energy-Regularized Information for Shift-Robustness framework to enable guided and reliable feature disentanglement.
arXiv Detail & Related papers (2025-08-19T12:13:41Z) - Hybrid Temporal Differential Consistency Autoencoder for Efficient and Sustainable Anomaly Detection in Cyber-Physical Systems [0.0]
Cyberattacks on critical infrastructure, particularly water distribution systems, have increased due to rapid digitalization.<n>This study addresses key challenges in anomaly detection by leveraging time correlations in sensor data.<n>We propose a hybrid autoencoder-based approach, referred to as hybrid TDC-AE, which extends TDC by incorporating both deterministic nodes and conventional statistical nodes.
arXiv Detail & Related papers (2025-04-08T09:22:44Z) - Anomaly Detection in Complex Dynamical Systems: A Systematic Framework Using Embedding Theory and Physics-Inspired Consistency [0.0]
Anomaly detection in complex dynamical systems is essential for ensuring reliability, safety, and efficiency in industrial and cyber-physical infrastructures.<n>We propose a system-theoretic approach to anomaly detection, grounded in classical embedding theory and physics-inspired consistency principles.<n>Our findings support the hypothesis that anomalies disrupt stable system dynamics, providing a robust signal for anomaly detection.
arXiv Detail & Related papers (2025-02-26T17:06:13Z) - Federated Learning for Efficient Condition Monitoring and Anomaly Detection in Industrial Cyber-Physical Systems [0.30723404270319693]
This paper introduces an enhanced FL framework with three key innovations: adaptive model aggregation based on sensor reliability, dynamic node selection for resource optimization, and Weibull-based checkpointing for fault tolerance.<n> Experiments on the NASA Bearing and Hydraulic System datasets demonstrate superior performance compared to state-of-the-art FL methods, achieving 99.5% AUC-ROC in anomaly detection and maintaining accuracy even under node failures.
arXiv Detail & Related papers (2025-01-28T03:04:47Z) - Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation [49.53202761595912]
Continual Test-Time Adaptation involves adapting a pre-trained source model to continually changing unsupervised target domains.
We analyze the challenges of this task: online environment, unsupervised nature, and the risks of error accumulation and catastrophic forgetting.
We propose an uncertainty-aware buffering approach to identify and aggregate significant samples with high certainty from the unsupervised, single-pass data stream.
arXiv Detail & Related papers (2024-07-12T15:48:40Z) - Investigating Robustness in Cyber-Physical Systems: Specification-Centric Analysis in the face of System Deviations [8.8690305802668]
A critical attribute of cyber-physical systems (CPS) is robustness, denoting its capacity to operate safely.
This paper proposes a novel specification-based robustness, which characterizes the effectiveness of a controller in meeting a specified system requirement.
We present an innovative two-layer simulation-based analysis framework designed to identify subtle robustness violations.
arXiv Detail & Related papers (2023-11-13T16:44:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.