Root Causing Prediction Anomalies Using Explainable AI
- URL: http://arxiv.org/abs/2403.02439v1
- Date: Mon, 4 Mar 2024 19:38:50 GMT
- Title: Root Causing Prediction Anomalies Using Explainable AI
- Authors: Ramanathan Vishnampet, Rajesh Shenoy, Jianhui Chen, Anuj Gupta
- Abstract summary: We present a novel application of explainable AI (XAI) for root-causing performance degradation in machine learning models.
A single feature corruption can cause cascading feature, label and concept drifts.
We have successfully applied this technique to improve the reliability of models used in personalized advertising.
- Score: 3.970146574042422
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper presents a novel application of explainable AI (XAI) for
root-causing performance degradation in machine learning models that learn
continuously from user engagement data. In such systems, a single feature
corruption can cause cascading feature, label and concept drifts. We have
successfully applied this technique to improve the reliability of models used
in personalized advertising. Performance degradation in such systems manifests
as prediction anomalies in the models. These models are typically trained
continuously using features that are produced by hundreds of real time data
processing pipelines or derived from other upstream models. A failure in any of
these pipelines or an instability in any of the upstream models can cause
feature corruption, causing the model's predicted output to deviate from the
actual output and the training data to become corrupted. The causal
relationship between the features and the predicted output is complex, and
root-causing is challenging due to the scale and dynamism of the system. We
demonstrate how temporal shifts in the global feature importance distribution
can effectively isolate the cause of a prediction anomaly, with better recall
than model-to-feature correlation methods. The technique appears to be
effective even when approximating the local feature importance using a simple
perturbation-based method, and aggregating over a few thousand examples. We
have found this technique to be a model-agnostic, cheap and effective way to
monitor complex data pipelines in production and have deployed a system for
continuously analyzing the global feature importance distribution of
continuously trained models.
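The core technique described in the abstract — approximating local feature importance with a simple perturbation method, aggregating it over a sample of examples into a global importance distribution, and flagging temporal shifts in that distribution — can be sketched as follows. This is an illustrative reconstruction rather than the authors' production system; the model interface, the zero baseline, and the total-variation distance used to measure the shift are all assumptions.

```python
import numpy as np

def local_importance(model_fn, x, baseline):
    """Perturbation-based local importance: replace each feature with a
    baseline value and measure the change in the model's prediction."""
    base_pred = model_fn(x)
    imps = np.empty(len(x))
    for j in range(len(x)):
        x_pert = x.copy()
        x_pert[j] = baseline[j]
        imps[j] = abs(model_fn(x_pert) - base_pred)
    return imps

def global_importance(model_fn, X, baseline):
    """Aggregate local importances over a sample of examples and
    normalize into a distribution over features."""
    agg = sum(local_importance(model_fn, x, baseline) for x in X)
    return agg / agg.sum()

def importance_shift(p, q):
    """Total-variation distance between the global importance
    distributions of two time windows; a large shift concentrated on
    one feature points at the corresponding upstream pipeline."""
    return 0.5 * np.abs(p - q).sum()
```

In a monitoring deployment, `global_importance` would be computed on a few thousand examples per time window, and `importance_shift` between consecutive windows would trigger an alert; the per-feature difference `|p - q|` then isolates which feature's pipeline to inspect.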
Related papers
- Multivariate Data Augmentation for Predictive Maintenance using Diffusion [35.286105732902065]
Predictive maintenance has been used to optimize system repairs in the industrial, medical, and financial domains.
There is a lack of fault data to train these models, since organizations work to keep fault occurrences and downtime to a minimum.
For newly installed systems, no fault data exists since they have yet to fail.
arXiv Detail & Related papers (2024-11-06T16:57:09Z)
- Fairness Feedback Loops: Training on Synthetic Data Amplifies Bias [47.79659355705916]
Model-induced distribution shifts (MIDS) occur as previous model outputs pollute new model training sets over generations of models.
We introduce a framework that allows us to track multiple MIDS over many generations, finding that they can lead to loss in performance, fairness, and minoritized group representation.
Despite these negative consequences, we identify how models might be used for positive, intentional, interventions in their data ecosystems.
arXiv Detail & Related papers (2024-03-12T17:48:08Z)
- Multi-modal Causal Structure Learning and Root Cause Analysis [67.67578590390907]
We propose Mulan, a unified multi-modal causal structure learning method for root cause localization.
We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data.
We also introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph.
arXiv Detail & Related papers (2024-02-04T05:50:38Z)
- Towards Continually Learning Application Performance Models [1.2278517240988065]
Machine learning-based performance models are increasingly being used to make critical job scheduling and application optimization decisions.
Traditionally, these models assume that data distribution does not change as more samples are collected over time.
We develop continually learning performance models that account for the distribution drift, alleviate catastrophic forgetting, and improve generalizability.
arXiv Detail & Related papers (2023-10-25T20:48:46Z)
- A hybrid feature learning approach based on convolutional kernels for ATM fault prediction using event-log data [5.859431341476405]
We present a predictive model based on convolutional kernels (MiniROCKET and HYDRA) to extract features from event-log data.
The proposed methodology is applied to a significant real-world collected dataset.
The model was integrated into a container-based decision support system to support operators in the timely maintenance of ATMs.
arXiv Detail & Related papers (2023-05-17T08:55:53Z)
- A Framework for Machine Learning of Model Error in Dynamical Systems [7.384376731453594]
We present a unifying framework for blending mechanistic and machine-learning approaches to identify dynamical systems from data.
We cast the problem in both continuous- and discrete-time, for problems in which the model error is memoryless and in which it has significant memory.
We find that hybrid methods substantially outperform solely data-driven approaches in terms of data hunger, demands for model complexity, and overall predictive performance.
arXiv Detail & Related papers (2021-07-14T12:47:48Z)
- Churn Reduction via Distillation [54.5952282395487]
We show an equivalence between training with distillation using the base model as the teacher and training with an explicit constraint on the predictive churn.
We then show that distillation performs strongly for low churn training against a number of recent baselines.
arXiv Detail & Related papers (2021-06-04T18:03:31Z)
- Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
- Learning Causal Models Online [103.87959747047158]
Predictive models can rely on spurious correlations in the data for making predictions.
One solution for achieving strong generalization is to incorporate causal structures in the models.
We propose an online algorithm that continually detects and removes spurious features.
arXiv Detail & Related papers (2020-06-12T20:49:20Z)
- How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as the $\rho$-gap.
We show how the $\rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.