PREVENT: An Unsupervised Approach to Predict Software Failures in Production
- URL: http://arxiv.org/abs/2208.11939v2
- Date: Tue, 17 Sep 2024 14:02:28 GMT
- Title: PREVENT: An Unsupervised Approach to Predict Software Failures in Production
- Authors: Giovanni Denaro, Rahim Heydarov, Ali Mohebbi, Mauro Pezzè,
- Abstract summary: PREVENT is an approach for predicting and localizing failures in distributed enterprise applications by combining unsupervised techniques.
Results show that PREVENT provides more stable and reliable predictions, earlier than or comparably to supervised learning approaches, without requiring long and often impractical training with failures.
- Score: 5.20605721024686
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents PREVENT, an approach for predicting and localizing failures in distributed enterprise applications by combining unsupervised techniques. Software failures can have dramatic consequences in production, and thus predicting and localizing failures is the essential step to activate healing measures that limit the disruptive consequences of failures. At the state of the art, many failures can be predicted from anomalous combinations of system metrics with respect to either rules provided from domain experts or supervised learning models. However, both these approaches limit the effectiveness of current techniques to well understood types of failures that can be either captured with predefined rules or observed while trining supervised models. PREVENT integrates the core ingredients of unsupervised approaches into a novel approach to predict failures and localize failing resources, without either requiring predefined rules or training with observed failures. The results of experimenting with PREVENT on a commercially-compliant distributed cloud system indicate that PREVENT provides more stable and reliable predictions, earlier than or comparably to supervised learning approaches, without requiring long and often impractical training with failures.
Related papers
- Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies [19.27526590452503]
FAIL-Detect is a two-stage approach for failure detection in imitation learning-based robotic manipulation.
We first distill policy inputs and outputs into scalar signals that correlate with policy failures and capture uncertainty.
Our experiments show learned signals to be mostly consistently effective, particularly when using our novel flow-based density estimator.
arXiv Detail & Related papers (2025-03-11T15:47:12Z) - On the Generalization Ability of Unsupervised Pretraining [53.06175754026037]
Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization.
This paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase.
Our results contribute to a better understanding of unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.
arXiv Detail & Related papers (2024-03-11T16:23:42Z) - Loss Shaping Constraints for Long-Term Time Series Forecasting [79.3533114027664]
We present a Constrained Learning approach for long-term time series forecasting that respects a user-defined upper bound on the loss at each time-step.
We propose a practical Primal-Dual algorithm to tackle it, and aims to demonstrate that it exhibits competitive average performance in time series benchmarks, while shaping the errors across the predicted window.
arXiv Detail & Related papers (2024-02-14T18:20:44Z) - Learning-Based Approaches to Predictive Monitoring with Conformal
Statistical Guarantees [2.1684857243537334]
This tutorial focuses on efficient methods to predictive monitoring (PM)
PM is the problem of detecting future violations of a given requirement from the current state of a system.
We present a general and comprehensive framework summarizing our approach to the predictive monitoring of CPSs.
arXiv Detail & Related papers (2023-12-04T15:16:42Z) - Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z) - Evaluation of Machine Learning Techniques for Forecast Uncertainty
Quantification [0.13999481573773068]
Ensemble forecasting is, so far, the most successful approach to produce relevant forecasts along with an estimation of their uncertainty.
Main limitations of ensemble forecasting are the high computational cost and the difficulty to capture and quantify different sources of uncertainty.
In this work proof-of-concept model experiments are conducted to examine the performance of ANNs trained to predict a corrected state of the system and the state uncertainty using only a single deterministic forecast as input.
arXiv Detail & Related papers (2021-11-29T16:52:17Z) - Competence-Aware Path Planning via Introspective Perception [36.39015240656877]
We propose a structured model-free approach to competence-aware planning by reasoning about plan execution failures due to errors in perception.
We introduce competence-aware path planning via introspective perception (CPIP), a Bayesian framework to iteratively learn and exploit task-level competence.
arXiv Detail & Related papers (2021-09-28T18:29:21Z) - Safe Chance Constrained Reinforcement Learning for Batch Process Control [0.0]
Reinforcement Learning (RL) controllers have generated excitement within the control community.
Recent focus on engineering applications has been directed towards the development of safe RL controllers.
arXiv Detail & Related papers (2021-04-23T16:48:46Z) - DEUP: Direct Epistemic Uncertainty Prediction [56.087230230128185]
Epistemic uncertainty is part of out-of-sample prediction error due to the lack of knowledge of the learner.
We propose a principled approach for directly estimating epistemic uncertainty by learning to predict generalization error and subtracting an estimate of aleatoric uncertainty.
arXiv Detail & Related papers (2021-02-16T23:50:35Z) - An Uncertainty-based Human-in-the-loop System for Industrial Tool Wear
Analysis [68.8204255655161]
We show that uncertainty measures based on Monte-Carlo dropout in the context of a human-in-the-loop system increase the system's transparency and performance.
A simulation study demonstrates that the uncertainty-based human-in-the-loop system increases performance for different levels of human involvement.
arXiv Detail & Related papers (2020-07-14T15:47:37Z) - Identifying Causal-Effect Inference Failure with Uncertainty-Aware
Models [41.53326337725239]
We introduce a practical approach for integrating uncertainty estimation into a class of state-of-the-art neural network methods.
We show that our methods enable us to deal gracefully with situations of "no-overlap", common in high-dimensional data.
We show that correctly modeling uncertainty can keep us from giving overconfident and potentially harmful recommendations.
arXiv Detail & Related papers (2020-07-01T00:37:41Z) - Excursion Search for Constrained Bayesian Optimization under a Limited
Budget of Failures [62.41541049302712]
We propose a novel decision maker grounded in control theory that controls the amount of risk we allow in the search as a function of a given budget of failures.
Our algorithm uses the failures budget more efficiently in a variety of optimization experiments, and generally achieves lower regret, than state-of-the-art methods.
arXiv Detail & Related papers (2020-05-15T09:54:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.