The Susceptibility of Example-Based Explainability Methods to Class Outliers
- URL: http://arxiv.org/abs/2407.20678v2
- Date: Thu, 1 Aug 2024 14:09:12 GMT
- Title: The Susceptibility of Example-Based Explainability Methods to Class Outliers
- Authors: Ikhtiyor Nematov, Dimitris Sacharidis, Tomer Sagi, Katja Hose
- Abstract summary: This study explores the impact of class outliers on the effectiveness of example-based explainability methods for black-box machine learning models.
We reformulate existing explainability evaluation metrics, such as correctness and relevance, specifically for example-based methods, and introduce a new metric, distinguishability.
Using these metrics, we highlight the shortcomings of current example-based explainability methods, including those that attempt to suppress class outliers.
- Score: 3.748789746936121
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study explores the impact of class outliers on the effectiveness of example-based explainability methods for black-box machine learning models. We reformulate existing explainability evaluation metrics, such as correctness and relevance, specifically for example-based methods, and introduce a new metric, distinguishability. Using these metrics, we highlight the shortcomings of current example-based explainability methods, including those that attempt to suppress class outliers. We conduct experiments on two datasets, a text classification dataset and an image classification dataset, and evaluate the performance of four state-of-the-art explainability methods. Our findings underscore the need for robust techniques to tackle the challenges posed by class outliers.
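The metrics themselves are defined in the full paper; as a rough, hypothetical illustration of the susceptibility being measured, the Python sketch below flags class outliers by distance to their class centroid and reports how often an example-based explainer surfaces them. The `explain` interface, the z-score rule, and all names here are assumptions for illustration, not the authors' definitions.

```python
import numpy as np

def class_outlier_mask(X, y, z=2.0):
    """Flag training points far from their own class centroid.

    Illustrative rule only: a point counts as a class outlier if its distance
    to its class centroid exceeds the class mean distance by z standard
    deviations.
    """
    mask = np.zeros(len(y), dtype=bool)
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        centroid = X[idx].mean(axis=0)
        d = np.linalg.norm(X[idx] - centroid, axis=1)
        mask[idx] = d > d.mean() + z * d.std()
    return mask

def explanation_outlier_rate(explain, X_test, outlier_mask, k=5):
    """Fraction of returned explanatory examples that are class outliers.

    explain(x, k) is an assumed interface: it returns the indices of the k
    training examples an example-based method offers as explanation for x.
    """
    hits = 0
    for x in X_test:
        idx = np.asarray(explain(x, k))
        hits += int(outlier_mask[idx].sum())
    return hits / (len(X_test) * k)
```

Under this proxy, a method that frequently returns flagged points as explanations for ordinary test inputs would be considered susceptible to class outliers.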
Related papers
- Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z)
- Exploiting Fairness to Enhance Sensitive Attributes Reconstruction [0.0]
In recent years, a growing body of work has emerged on how to learn machine learning models under fairness constraints.
We show that an adversary can exploit information about a model's fairness to enhance the reconstruction of the sensitive attributes of the training data.
We propose a generic reconstruction correction method, which takes as input an initial guess and corrects it to comply with some user-defined constraints.
arXiv Detail & Related papers (2022-09-02T06:15:15Z)
- Resolving label uncertainty with implicit posterior models [71.62113762278963]
We propose a method for jointly inferring labels across a collection of data samples.
By implicitly assuming the existence of a generative model for which a differentiable predictor is the posterior, we derive a training objective that allows learning under weak beliefs.
arXiv Detail & Related papers (2022-02-28T18:09:44Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- Active Weighted Aging Ensemble for Drifted Data Stream Classification [2.277447144331876]
Concept drift destabilizes the performance of the classification model and seriously degrades its quality.
The proposed method has been evaluated through computer experiments using both real and generated data streams.
The results show that the proposed algorithm outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-12-19T13:52:53Z)
- Revisiting Methods for Finding Influential Examples [2.094022863940315]
Methods for finding influential training examples for test-time decisions have been proposed.
In this paper, we show that all of these methods are unstable.
We propose to evaluate such explanations by their ability to detect poisoning attacks.
arXiv Detail & Related papers (2021-11-08T18:00:06Z)
- Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z)
- On Sample Based Explanation Methods for NLP: Efficiency, Faithfulness, and Semantic Evaluation [23.72825603188359]
We can improve the interpretability of explanations by allowing arbitrary text sequences as the explanation unit.
We propose a semantic-based evaluation metric that can better align with humans' judgment of explanations.
arXiv Detail & Related papers (2021-06-09T00:49:56Z)
- Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization [87.96102461221415]
We develop an algorithm that provides per-class explainability.
In an extensive battery of experiments, we demonstrate the ability of our methods to produce class-specific visualizations.
arXiv Detail & Related papers (2020-12-03T18:48:39Z)
- A Diagnostic Study of Explainability Techniques for Text Classification [52.879658637466605]
We develop a list of diagnostic properties for evaluating existing explainability techniques.
We compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions, relating a model's performance to how well its rationales agree with human ones.
arXiv Detail & Related papers (2020-09-25T12:01:53Z)
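As a small illustration of the rationale-agreement comparison described in the last entry above (not that paper's exact protocol), the sketch below scores per-token saliency against binary human annotations with a rank-based AUC; the function name and toy data are hypothetical.

```python
import numpy as np

def saliency_auc(saliency, human_mask):
    """Rank-based ROC AUC between saliency scores and a binary human rationale.

    Returns the probability that a human-annotated token receives a higher
    saliency score than a non-annotated one (ties counted as one half).
    """
    s = np.asarray(saliency, dtype=float)
    m = np.asarray(human_mask, dtype=bool)
    pos, neg = s[m], s[~m]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")  # agreement is undefined for all-or-none masks
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Toy usage: saliency over five tokens; humans marked tokens 1 and 3.
print(saliency_auc([0.1, 0.9, 0.2, 0.7, 0.0], [0, 1, 0, 1, 0]))  # -> 1.0
```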
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.