Explaining Drift using Shapley Values
- URL: http://arxiv.org/abs/2401.09756v1
- Date: Thu, 18 Jan 2024 07:07:42 GMT
- Title: Explaining Drift using Shapley Values
- Authors: Narayanan U. Edakunni and Utkarsh Tekriwal and Anukriti Jain
- Abstract summary: Machine learning models often deteriorate in performance when used to predict outcomes on data they were not trained on.
There is no principled framework to identify the drivers behind drift in model performance.
We propose a novel framework, DBShap, that uses Shapley values to identify the main contributors to the drift.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Machine learning models often deteriorate in performance when they are
used to predict outcomes on data they were not trained on. These scenarios
often arise in the real world when the distribution of data changes gradually
or abruptly due to major events like a pandemic. There have been many attempts
in machine learning research to come up with techniques that are resilient to
such concept drifts. However, there is no principled framework to identify the
drivers behind the drift in model performance. In this paper, we propose a
novel framework, DBShap, that uses Shapley values to identify the main
contributors to the drift and to quantify their respective contributions. The
proposed framework not only quantifies the importance of individual features in
driving the drift but also includes the change in the underlying relation
between input and output as a possible driver. The explanation provided by
DBShap can be used to understand the root cause of the drift and to make the
model resilient to it.
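The abstract does not spell out the computation, but as a rough sketch of the idea (my illustration, not the authors' implementation): treat each feature's distribution and the input-output relation as Shapley "players", define a coalition's value as the performance drop observed when only those components shift from the training to the deployment distribution, and compute exact Shapley values. The coalition values below are made-up numbers standing in for measured accuracy drops.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values for a characteristic function `value`
    defined on frozensets of `players` (exponential in len(players))."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[p] += w * (value(S | {p}) - value(S))
    return phi

# Toy drift decomposition: value(S) is the accuracy drop observed when
# only the components in S are switched from the training distribution
# to the deployment distribution. The numbers are illustrative only.
drop = {"x1": 0.04, "x2": 0.01, "relation": 0.07}

def value(S):
    total = sum(drop[p] for p in S)
    if {"x1", "relation"} <= S:  # toy interaction between two drivers
        total += 0.02
    return total

phi = shapley_values(["x1", "x2", "relation"], value)
print(phi)  # per-driver contributions; they sum to value of the full set
```

By the efficiency property of Shapley values, the per-driver contributions sum exactly to the end-to-end performance drop, which is what makes this decomposition well suited to explaining drift.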
Related papers
- Methods for Generating Drift in Text Streams [49.3179290313959]
Concept drift is a frequent phenomenon in real-world datasets and corresponds to changes in data distribution over time.
This paper provides four textual drift generation methods to ease the production of datasets with labeled drifts.
Results show that performance degrades right after the drifts for all methods, and that the incremental SVM is the fastest to run and to recover its previous performance level.
arXiv Detail & Related papers (2024-03-18T23:48:33Z)
- DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving [64.57963116462757]
State-of-the-art methods usually follow the 'Teacher-Student' paradigm.
The student model only has access to raw sensor data and conducts behavior cloning on the data collected by the teacher model.
We propose DriveAdapter, which employs adapters with a feature alignment objective between the student (perception) and teacher (planning) modules.
arXiv Detail & Related papers (2023-08-01T09:21:53Z)
- Consistent Diffusion Models: Mitigating Sampling Drift by Learning to be Consistent [97.64313409741614]
We propose to enforce a consistency property, which states that predictions of the model on its own generated data are consistent across time.
We show that our novel training objective yields state-of-the-art results for conditional and unconditional generation on CIFAR-10 and baseline improvements on AFHQ and FFHQ.
arXiv Detail & Related papers (2023-02-17T18:45:04Z)
- Feature Relevance Analysis to Explain Concept Drift -- A Case Study in Human Activity Recognition [3.5569545396848437]
This article studies how to detect and explain concept drift.
Drift detection is based on identifying a set of features having the largest relevance difference between the drifting model and a model known to be accurate.
It is shown that feature relevance analysis can be used not only to detect concept drift but also to explain its cause.
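A minimal sketch of that relevance-difference idea, under assumptions of my own (permutation importance as the relevance measure; toy threshold models standing in for the drifting and the accurate classifiers):

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_relevance(predict, X, y):
    """Relevance of each feature: drop in accuracy when its column is permuted."""
    base = np.mean(predict(X) == y)
    rel = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        rel[j] = base - np.mean(predict(Xp) == y)
    return rel

# Toy setup: the label depended on feature 0 before the drift but on
# feature 1 after it; the stale model still looks at feature 0.
X = rng.normal(size=(500, 3))
y = (X[:, 1] > 0).astype(int)                 # post-drift data
stale = lambda X: (X[:, 0] > 0).astype(int)   # drifting model
fresh = lambda X: (X[:, 1] > 0).astype(int)   # model known to be accurate

diff = permutation_relevance(fresh, X, y) - permutation_relevance(stale, X, y)
print(np.argsort(-np.abs(diff)))  # feature 1 ranks first: it explains the drift
```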
arXiv Detail & Related papers (2023-01-20T07:34:27Z)
- On the Change of Decision Boundaries and Loss in Learning with Concept Drift [8.686667049158476]
Concept drift refers to the phenomenon that the distribution generating the observed data changes over time.
Many techniques for learning under drift rely on the interleaved test-train error (ITTE) as a quantity that approximates the model's generalization error.
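For context, the ITTE is the prequential error: each arriving example first tests the current model and is then used to train it. A self-contained sketch with a deliberately naive incremental learner (the learner and toy stream are mine); the abrupt drift at t = 50 shows up as a jump in the running error:

```python
from collections import Counter

class MajorityClass:
    """Toy incremental learner: predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = Counter()
    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else 0
    def update(self, x, y):
        self.counts[y] += 1

def itte(model, stream):
    """Interleaved test-then-train error: test on each sample, then train on it."""
    errors = 0
    for t, (x, y) in enumerate(stream, start=1):
        errors += int(model.predict(x) != y)  # test first ...
        model.update(x, y)                    # ... then train
        yield errors / t                      # running ITTE

stream = [(None, 0)] * 50 + [(None, 1)] * 50  # abrupt label drift at t = 50
print(list(itte(MajorityClass(), stream))[-1])  # 0.5: every post-drift sample missed
```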
arXiv Detail & Related papers (2022-12-02T14:58:13Z)
- Change Detection for Local Explainability in Evolving Data Streams [72.4816340552763]
Local feature attribution methods have become a popular tool for post-hoc and model-agnostic explanations.
It is often unclear how local attributions behave in realistic, constantly evolving settings such as streaming and online applications.
We present CDLEEDS, a flexible and model-agnostic framework for detecting local change and concept drift.
arXiv Detail & Related papers (2022-09-06T18:38:34Z)
- Self-Damaging Contrastive Learning [92.34124578823977]
Real-world unlabeled data is commonly imbalanced and follows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning (SDCLR) to automatically balance representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
- Churn Reduction via Distillation [54.5952282395487]
We show an equivalence between training with distillation using the base model as the teacher and training with an explicit constraint on the predictive churn.
We then show that distillation performs strongly for low-churn training against a number of recent baselines.
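Predictive churn here is the disagreement rate between the base model and its update. A small sketch of the quantity, with the distillation-style surrogate from the stated equivalence paraphrased in a comment (my paraphrase, not the paper's exact loss):

```python
import numpy as np

def churn(pred_base, pred_new):
    """Predictive churn: fraction of inputs where the updated model's
    prediction differs from the base model's."""
    pred_base, pred_new = np.asarray(pred_base), np.asarray(pred_new)
    return float(np.mean(pred_base != pred_new))

print(churn([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.25

# Distillation-style surrogate for constraining churn (paraphrased):
#   loss = (1 - lam) * cross_entropy(student_logits, labels)
#        + lam * cross_entropy(student_logits, teacher_probs)
# where the "teacher" is the deployed base model; a larger lam trades
# fit to fresh labels for agreement with (less churn from) the base.
```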
arXiv Detail & Related papers (2021-06-04T18:03:31Z)
- Switching Scheme: A Novel Approach for Handling Incremental Concept Drift in Real-World Data Sets [0.0]
Concept drifts can severely affect the prediction performance of a machine learning system.
In this work, we analyze the effects of concept drifts in the context of a real-world data set.
We introduce the switching scheme, which combines the two principles of retraining and updating a machine learning model.
arXiv Detail & Related papers (2020-11-05T10:16:54Z)
- Handling Concept Drift for Predictions in Business Process Mining [0.0]
Machine learning models are challenged by data streams that change over time, a phenomenon described as concept drift.
Current research lacks a recommendation for which data should be selected to retrain the model.
We show that we can improve accuracy from 0.5400 to 0.7010 with concept drift handling.
arXiv Detail & Related papers (2020-05-12T14:22:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.