Revisiting Methods for Finding Influential Examples
- URL: http://arxiv.org/abs/2111.04683v1
- Date: Mon, 8 Nov 2021 18:00:06 GMT
- Title: Revisiting Methods for Finding Influential Examples
- Authors: Karthikeyan K, Anders Søgaard
- Abstract summary: Methods for finding influential training examples for test-time decisions have been proposed.
In this paper, we show that all of these methods are unstable.
We propose to evaluate such explanations by their ability to detect poisoning attacks.
- Score: 2.094022863940315
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Several instance-based explainability methods for finding influential
training examples for test-time decisions have been proposed recently,
including Influence Functions, TracIn, Representer Point Selection, Grad-Dot,
and Grad-Cos. Typically these methods are evaluated using LOO influence (Cook's
distance) as a gold standard, or using various heuristics. In this paper, we
show that all of the above methods are unstable, i.e., extremely sensitive to
initialization, ordering of the training data, and batch size. We suggest that
this is a natural consequence of how, in the literature, the influence of
examples is assumed to be independent of model state and other examples -- and
argue it is not. We show that LOO influence and heuristics are, as a result,
poor metrics to measure the quality of instance-based explanations, and instead
propose to evaluate such explanations by their ability to detect poisoning
attacks. Further, we provide a simple, yet effective baseline to improve all of
the above methods and show how it leads to very significant improvements on
downstream tasks.
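For intuition, the Grad-Dot and Grad-Cos scores named in the abstract reduce to similarities between the loss gradient of a test point and the loss gradient of each training point, and the proposed evaluation ranks training examples by such scores and checks how many known-poisoned examples surface near the top. The sketch below illustrates both ideas in PyTorch under stated assumptions (a generic classifier `model`, cross-entropy loss, single-example tensors, and hypothetical helper names); it is an illustration of the general technique, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def flat_grad(model, x, y):
    # Gradient of the classification loss w.r.t. all trainable parameters,
    # flattened into a single vector. Assumes x is one input tensor and y a
    # scalar long tensor holding the class index (illustrative assumption).
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def influence_scores(model, test_x, test_y, train_set, use_cosine=False):
    # Score each training example by the similarity between its loss gradient
    # and the test example's loss gradient: Grad-Dot (dot product) or
    # Grad-Cos (cosine similarity).
    g_test = flat_grad(model, test_x, test_y)
    scores = []
    for x, y in train_set:
        g_train = flat_grad(model, x, y)
        s = torch.dot(g_test, g_train)
        if use_cosine:
            s = s / (g_test.norm() * g_train.norm() + 1e-12)
        scores.append(s)
    return torch.stack(scores)

def poison_detection_recall(scores, poisoned_ids, k):
    # Evaluation idea from the abstract: rank training examples by influence
    # and measure how many known-poisoned examples land in the top k.
    top_k = set(torch.topk(scores, k).indices.tolist())
    return len(top_k & set(poisoned_ids)) / len(poisoned_ids)
```

In this framing, the instability reported in the paper corresponds to these scores changing substantially when the model is retrained with a different initialization, training-data ordering, or batch size.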
Related papers
- Scalable Influence and Fact Tracing for Large Language Model Pretraining [14.598556308631018]
Training data attribution (TDA) methods aim to attribute model outputs back to specific training examples.
This paper refines existing gradient-based methods to work effectively at scale.
arXiv Detail & Related papers (2024-10-22T20:39:21Z)
- The Susceptibility of Example-Based Explainability Methods to Class Outliers [3.748789746936121]
This study explores the impact of class outliers on the effectiveness of example-based explainability methods for black-box machine learning models.
We reformulate existing explainability evaluation metrics, such as correctness and relevance, specifically for example-based methods, and introduce a new metric, distinguishability.
Using these metrics, we highlight the shortcomings of current example-based explainability methods, including those that attempt to suppress class outliers.
arXiv Detail & Related papers (2024-07-30T09:20:15Z)
- Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation [137.3520153445413]
A notable gap exists in the evaluation of causal discovery methods, where insufficient emphasis is placed on downstream inference.
We evaluate seven established baseline causal discovery methods including a newly proposed method based on GFlowNets.
The results of our study demonstrate that some of the algorithms studied are able to effectively capture a wide range of useful and diverse ATE modes.
arXiv Detail & Related papers (2023-07-11T02:58:10Z)
- Better Understanding Differences in Attribution Methods via Systematic Evaluations [57.35035463793008]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models.
arXiv Detail & Related papers (2023-03-21T14:24:58Z)
- Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z)
- Towards Better Understanding Attribution Methods [77.1487219861185]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We also propose a post-processing smoothing step that significantly improves the performance of some attribution methods.
arXiv Detail & Related papers (2022-05-20T20:50:17Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- On Sample Based Explanation Methods for NLP: Efficiency, Faithfulness, and Semantic Evaluation [23.72825603188359]
We can improve the interpretability of explanations by allowing arbitrary text sequences as the explanation unit.
We propose a semantic-based evaluation metric that can better align with humans' judgment of explanations.
arXiv Detail & Related papers (2021-06-09T00:49:56Z)
- An Empirical Comparison of Instance Attribution Methods for NLP [62.63504976810927]
We evaluate the degree to which different instance attribution methods agree with respect to the importance of training samples.
We find that simple retrieval methods yield training instances that differ from those identified via gradient-based methods.
arXiv Detail & Related papers (2021-04-09T01:03:17Z)
- RelatIF: Identifying Explanatory Training Examples via Relative Influence [13.87851325824883]
We use influence functions to identify relevant training examples that one might hope "explain" the predictions of a machine learning model.
We introduce RelatIF, a new class of criteria for choosing relevant training examples by way of an optimization objective that places a constraint on global influence.
In empirical evaluations, we find that the examples returned by RelatIF are more intuitive when compared to those found using influence functions.
arXiv Detail & Related papers (2020-03-25T20:59:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.