Learning Recovery Strategies for Dynamic Self-healing in Reactive
Systems
- URL: http://arxiv.org/abs/2401.12405v1
- Date: Mon, 22 Jan 2024 23:34:21 GMT
- Title: Learning Recovery Strategies for Dynamic Self-healing in Reactive
Systems
- Authors: Mateo Sanabria, Ivana Dusparic, Nicolas Cardozo
- Abstract summary: Self-healing systems depend on following a set of predefined instructions to recover from a known failure state.
Our proposal targets complex reactive systems, defining monitors as predicates specifying satisfiability conditions of system properties.
We use a Reinforcement Learning-based technique to learn a recovery strategy based on users' corrective sequences.
- Score: 1.7218973692320518
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-healing systems depend on following a set of predefined instructions to
recover from a known failure state. Failure states are generally detected based
on domain specific specialized metrics. Failure fixes are applied at predefined
application hooks that are not sufficiently expressive to manage different
failure types. Self-healing is usually applied in the context of distributed
systems, where the detection of failures is constrained to communication
problems, and resolution strategies often consist of replacing complete
components. Our proposal targets complex reactive systems, defining monitors as
predicates specifying satisfiability conditions of system properties. Such
monitors are functionally expressive and can be defined at run time to detect
failure states at any execution point. Once failure states are detected, we use
a Reinforcement Learning-based technique to learn a recovery strategy based on
users' corrective sequences. Finally, to execute the learned strategies, we
extract them as COP variations that activate dynamically whenever the failure
state is detected, overwriting the base system behavior with the recovery
strategy for that state. We validate the feasibility and effectiveness of our
framework through a prototypical reactive application for tracking mouse
movements, and the DeltaIoT exemplar for self-healing systems. Our results
demonstrate that, with just the definition of monitors, the system detects
and recovers from failures in 55% to 92% of the cases in the first
application, and performs on par with the predefined strategies in the second
application.
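The pipeline described in the abstract (a monitor predicate, a learned recovery strategy, and a behavior override that is active only in the failure state) can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional toy system, the reward values, the use of plain tabular Q-learning from simulated exploration rather than from users' corrective sequences, and every name below (`monitor`, `step`, `learn_recovery`, `extract_strategy`, `adaptive_behavior`) are assumptions made for illustration. The `if` dispatch in `adaptive_behavior` merely stands in for the dynamic activation of a COP variation.

```python
import random
from typing import Dict, List, Tuple

State = int
Action = int  # -1 moves left, +1 moves right

ACTIONS: List[Action] = [-1, +1]
LOW, HIGH = 0, 9        # bounds of the hypothetical toy state space
SAFE = range(3, 7)      # states 3..6 satisfy the monitored property


def monitor(state: State) -> bool:
    """Monitor as a predicate: True while the system property holds."""
    return state in SAFE


def step(state: State, action: Action) -> State:
    """Toy system dynamics: move one cell, clamped to the bounds."""
    return max(LOW, min(HIGH, state + action))


def learn_recovery(episodes: int = 500, alpha: float = 0.5,
                   gamma: float = 0.9, epsilon: float = 0.2,
                   seed: int = 0) -> Dict[Tuple[State, Action], float]:
    """Tabular Q-learning; each episode starts in a failure state and
    ends as soon as the monitor holds again (assumed reward scheme)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(LOW, HIGH + 1) for a in ACTIONS}
    failure_states = [s for s in range(LOW, HIGH + 1) if not monitor(s)]
    for _ in range(episodes):
        state = rng.choice(failure_states)
        for _ in range(20):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt = step(state, action)
            recovered = monitor(nxt)
            reward = 1.0 if recovered else -0.1
            best_next = 0.0 if recovered else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)])
            state = nxt
            if recovered:
                break
    return q


def extract_strategy(q: Dict[Tuple[State, Action], float]) -> Dict[State, Action]:
    """Greedy action per failure state: the learned recovery strategy."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)])
            for s in range(LOW, HIGH + 1) if not monitor(s)}


def base_behavior(state: State) -> Action:
    """Base system behavior of the toy system: always drift right."""
    return +1


def adaptive_behavior(state: State, strategy: Dict[State, Action]) -> Action:
    """The recovery strategy overrides the base behavior only while the
    monitor reports a failure state."""
    if not monitor(state):
        return strategy[state]
    return base_behavior(state)


if __name__ == "__main__":
    strategy = extract_strategy(learn_recovery())
    state, trace = 9, [9]
    while not monitor(state):
        state = step(state, adaptive_behavior(state, strategy))
        trace.append(state)
    print(trace)  # sequence of states visited until the property holds again
```

In this sketch the strategy is stored as a plain dictionary keyed by failure state, so overriding the base behavior amounts to a lookup guarded by the monitor. A real COP runtime would instead activate and deactivate the variation as the context changes.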
Related papers
- Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress [31.952925824381325]
We propose a runtime monitoring framework that splits the detection of failures into two complementary categories.
We use Vision Language Models (VLMs) to detect when the policy confidently and consistently takes actions that do not solve the task.
By unifying temporal consistency detection and VLM runtime monitoring, Sentinel detects 18% more failures than using either of the two detectors alone.
arXiv Detail & Related papers (2024-10-06T22:13:30Z)
- Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection [37.99031842449251]
Video anomaly detection under weak supervision presents significant challenges.
We present a weakly supervised anomaly detection framework that focuses on efficient context modeling and enhanced semantic discriminability.
Our approach significantly improves the detection accuracy of certain anomaly sub-classes, underscoring its practical value and efficacy.
arXiv Detail & Related papers (2023-06-26T06:45:16Z)
- Interactive System-wise Anomaly Detection [66.3766756452743]
Anomaly detection plays a fundamental role in various applications.
It is challenging for existing methods to handle the scenarios where the instances are systems whose characteristics are not readily observed as data.
We develop an end-to-end approach which includes an encoder-decoder module that learns system embeddings.
arXiv Detail & Related papers (2023-04-21T02:20:24Z)
- PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
- System Resilience through Health Monitoring and Reconfiguration [56.448036299746285]
We demonstrate an end-to-end framework to improve the resilience of man-made systems to unforeseen events.
The framework is based on a physics-based digital twin model and three modules tasked with real-time fault diagnosis, prognostics and reconfiguration.
arXiv Detail & Related papers (2022-08-30T20:16:17Z)
- Active Learning-based Isolation Forest (ALIF): Enhancing Anomaly Detection in Decision Support Systems [2.922007656878633]
ALIF is a lightweight modification of the popular Isolation Forest that has shown superior performance compared with other state-of-the-art algorithms.
The proposed approach is particularly appealing in the presence of a Decision Support System (DSS), a case that is increasingly popular in real-world scenarios.
arXiv Detail & Related papers (2022-07-08T14:36:38Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Imitation with Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples [29.385242714424624]
Evaluating robustness of machine-learning models to adversarial examples is a challenging problem.
We define a set of quantitative indicators which unveil common failures in the optimization of gradient-based attacks.
Our experimental analysis shows that the proposed indicators of failure can be used to visualize, debug and improve current adversarial robustness evaluations.
arXiv Detail & Related papers (2021-06-18T06:57:58Z)
- No Need to Know Physics: Resilience of Process-based Model-free Anomaly Detection for Industrial Control Systems [95.54151664013011]
We present a novel framework to generate adversarial spoofing signals that violate physical properties of the system.
We analyze four anomaly detectors published at top security conferences.
arXiv Detail & Related papers (2020-12-07T11:02:44Z)
- A Background-Agnostic Framework with Adversarial Training for Abnormal Event Detection in Video [120.18562044084678]
Abnormal event detection in video is a complex computer vision problem that has attracted significant attention in recent years.
We propose a background-agnostic framework that learns from training videos containing only normal events.
arXiv Detail & Related papers (2020-08-27T18:39:24Z)