Online Identification of IT Systems through Active Causal Learning
- URL: http://arxiv.org/abs/2509.02130v2
- Date: Sun, 07 Sep 2025 02:18:18 GMT
- Title: Online Identification of IT Systems through Active Causal Learning
- Authors: Kim Hammar, Rolf Stadler
- Abstract summary: We present the first principled method for online, data-driven identification of an IT system in the form of a causal model. We show that our method enables accurate identification of a causal system model while inducing low interference with system operations.
- Score: 1.7188280334580195
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Identifying a causal model of an IT system is fundamental to many branches of systems engineering and operation. Such a model can be used to predict the effects of control actions, optimize operations, diagnose failures, detect intrusions, etc., which is central to achieving the longstanding goal of automating network and system management tasks. Traditionally, causal models have been designed and maintained by domain experts. This, however, proves increasingly challenging with the growing complexity and dynamism of modern IT systems. In this paper, we present the first principled method for online, data-driven identification of an IT system in the form of a causal model. The method, which we call active causal learning, estimates causal functions that capture the dependencies among system variables in an iterative fashion using Gaussian process regression based on system measurements, which are collected through a rollout-based intervention policy. We prove that this method is optimal in the Bayesian sense and that it produces effective interventions. Experimental validation on a testbed shows that our method enables accurate identification of a causal system model while inducing low interference with system operations.
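The iterative estimation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper's rollout-based intervention policy is replaced here by a simpler stand-in (choose the intervention where the Gaussian process posterior variance is largest), and the system response `measure`, the candidate grid, the kernel length scale, and the noise level are all hypothetical choices made for illustration.

```python
import math
import random

NOISE_VAR = 0.05 ** 2  # assumed measurement-noise variance (hypothetical)

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_posterior(xs, ys, x_star):
    """Posterior mean and variance of a zero-mean GP at x_star."""
    if not xs:
        return 0.0, rbf(x_star, x_star)
    K = [[rbf(a, b) + (NOISE_VAR if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(a, x_star) for a in xs]
    alpha = solve(K, ys)      # K^{-1} y
    v = solve(K, k_star)      # K^{-1} k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)

def measure(x, rng):
    """Hypothetical system: noisy response of one variable to intervention x."""
    return math.sin(x) + rng.gauss(0.0, 0.05)

def active_learning(n_steps=12, seed=0):
    """Repeatedly intervene where the current estimate is most uncertain."""
    rng = random.Random(seed)
    candidates = [i * 0.25 for i in range(25)]  # interventions in [0, 6]
    xs, ys = [], []
    for _ in range(n_steps):
        # Stand-in for the intervention policy: maximize posterior variance.
        x_next = max(candidates, key=lambda c: gp_posterior(xs, ys, c)[1])
        xs.append(x_next)
        ys.append(measure(x_next, rng))
    return xs, ys, candidates

if __name__ == "__main__":
    xs, ys, candidates = active_learning()
    for c in candidates[::4]:
        mean, var = gp_posterior(xs, ys, c)
        print(f"x={c:.2f}  estimate={mean:+.3f}  true={math.sin(c):+.3f}  var={var:.4f}")
```

Selecting by posterior variance alone ignores the cost of intervening on a running system, which the abstract stresses ("low interference with system operations"); a closer approximation would weigh the expected information gain of each intervention against its operational cost.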
Related papers
- Process mining-driven modeling and simulation to enhance fault diagnosis in cyber-physical systems [5.065341495341096]
Fault diagnosis in Cyber-Physical Systems (CPSs) is essential for ensuring system dependability and operational efficiency. We present a novel unsupervised fault diagnosis methodology that integrates collective anomaly detection in time series, process mining, and simulation. This enables the creation of comprehensive fault dictionaries that support predictive maintenance and the development of digital twins for industrial environments.
arXiv Detail & Related papers (2025-06-26T17:29:37Z)
- Online Multi-modal Root Cause Analysis [61.94987309148539]
Root Cause Analysis (RCA) is essential for pinpointing the root causes of failures in microservice systems.
Existing online RCA methods handle only single-modal data, overlooking complex interactions in multi-modal systems.
We introduce OCEAN, a novel online multi-modal causal structure learning method for root cause localization.
arXiv Detail & Related papers (2024-10-13T21:47:36Z)
- On the Fly Detection of Root Causes from Observed Data with Application to IT Systems [3.3321350585823826]
This paper introduces a new structural causal model tailored for representing threshold-based IT systems.
It presents a new algorithm designed to rapidly detect root causes of anomalies in such systems.
arXiv Detail & Related papers (2024-02-09T16:10:19Z)
- Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive analysis of the characterization of adversarial inputs, through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations.
arXiv Detail & Related papers (2024-02-07T21:58:40Z)
- Multi-modal Causal Structure Learning and Root Cause Analysis [67.67578590390907]
We propose Mulan, a unified multi-modal causal structure learning method for root cause localization.
We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data.
We also introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph.
arXiv Detail & Related papers (2024-02-04T05:50:38Z)
- Interactive System-wise Anomaly Detection [66.3766756452743]
Anomaly detection plays a fundamental role in various applications.
It is challenging for existing methods to handle the scenarios where the instances are systems whose characteristics are not readily observed as data.
We develop an end-to-end approach which includes an encoder-decoder module that learns system embeddings.
arXiv Detail & Related papers (2023-04-21T02:20:24Z)
- System Resilience through Health Monitoring and Reconfiguration [56.448036299746285]
We demonstrate an end-to-end framework to improve the resilience of man-made systems to unforeseen events.
The framework is based on a physics-based digital twin model and three modules tasked with real-time fault diagnosis, prognostics and reconfiguration.
arXiv Detail & Related papers (2022-08-30T20:16:17Z)
- Data-driven Residual Generation for Early Fault Detection with Limited Data [4.129225533930966]
In many complex systems, it is not feasible to develop highly accurate models of the system.
Data-driven solutions have received immense attention in industrial systems for several practical reasons.
Unlike model-based methods, it is straightforward to combine time series measurements such as pressure and voltage with other sources of information.
arXiv Detail & Related papers (2021-09-28T03:18:03Z)
- Identifying Causal Structure in Dynamical Systems [6.451261098085498]
We propose a method that identifies the causal structure of control systems.
Experiments on a robot arm demonstrate reliable causal identification from real-world data.
arXiv Detail & Related papers (2020-06-06T16:17:07Z)
- How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as $\rho$-gap.
We show how the $\rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z)