Revisiting Methods for Finding Influential Examples
- URL: http://arxiv.org/abs/2111.04683v1
- Date: Mon, 8 Nov 2021 18:00:06 GMT
- Title: Revisiting Methods for Finding Influential Examples
- Authors: Karthikeyan K, Anders Søgaard
- Abstract summary: Methods for finding influential training examples for test-time decisions have been proposed.
In this paper, we show that all of these methods are unstable.
We propose to evaluate such explanations by their ability to detect poisoning attacks.
- Score: 2.094022863940315
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Several instance-based explainability methods for finding influential
training examples for test-time decisions have been proposed recently,
including Influence Functions, TracIn, Representer Point Selection, Grad-Dot,
and Grad-Cos. Typically these methods are evaluated using LOO influence (Cook's
distance) as a gold standard, or using various heuristics. In this paper, we
show that all of the above methods are unstable, i.e., extremely sensitive to
initialization, ordering of the training data, and batch size. We suggest that
this is a natural consequence of how, in the literature, the influence of
examples is assumed to be independent of model state and other examples -- and
argue it is not. We show that LOO influence and heuristics are, as a result,
poor metrics to measure the quality of instance-based explanations, and instead
propose to evaluate such explanations by their ability to detect poisoning
attacks. Further, we provide a simple, yet effective baseline to improve all of
the above methods and show how it leads to very significant improvements on
downstream tasks.
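For intuition, the Grad-Dot and Grad-Cos scores named in the abstract reduce to similarities between the loss gradient of a test point and the loss gradient of each training point, and the proposed evaluation ranks training examples by such scores and checks how many known-poisoned examples surface near the top. The sketch below illustrates both ideas in PyTorch under stated assumptions (a generic classifier `model`, cross-entropy loss, single-example tensors, and hypothetical helper names); it is an illustration of the general technique, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def flat_grad(model, x, y):
    # Gradient of the classification loss w.r.t. all trainable parameters,
    # flattened into a single vector. Assumes x is one input tensor and y a
    # scalar long tensor holding the class index (illustrative assumption).
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def influence_scores(model, test_x, test_y, train_set, use_cosine=False):
    # Score each training example by the similarity between its loss gradient
    # and the test example's loss gradient: Grad-Dot (dot product) or
    # Grad-Cos (cosine similarity).
    g_test = flat_grad(model, test_x, test_y)
    scores = []
    for x, y in train_set:
        g_train = flat_grad(model, x, y)
        s = torch.dot(g_test, g_train)
        if use_cosine:
            s = s / (g_test.norm() * g_train.norm() + 1e-12)
        scores.append(s)
    return torch.stack(scores)

def poison_detection_recall(scores, poisoned_ids, k):
    # Evaluation idea from the abstract: rank training examples by influence
    # and measure how many known-poisoned examples land in the top k.
    top_k = set(torch.topk(scores, k).indices.tolist())
    return len(top_k & set(poisoned_ids)) / len(poisoned_ids)
```

In this framing, the instability reported in the paper corresponds to these scores changing substantially when the model is retrained with a different initialization, training-data ordering, or batch size.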
Related papers
- Scalable Influence and Fact Tracing for Large Language Model Pretraining [14.598556308631018]
Training data attribution (TDA) methods aim to attribute model outputs back to specific training examples.
This paper refines existing gradient-based methods to work effectively at scale.
arXiv Detail & Related papers (2024-10-22T20:39:21Z)
- The Susceptibility of Example-Based Explainability Methods to Class Outliers [3.748789746936121]
This study explores the impact of class outliers on the effectiveness of example-based explainability methods for black-box machine learning models.
We reformulate existing explainability evaluation metrics, such as correctness and relevance, specifically for example-based methods, and introduce a new metric, distinguishability.
Using these metrics, we highlight the shortcomings of current example-based explainability methods, including those that attempt to suppress class outliers.
arXiv Detail & Related papers (2024-07-30T09:20:15Z)
- Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation [137.3520153445413]
A notable gap exists in the evaluation of causal discovery methods, where insufficient emphasis is placed on downstream inference.
We evaluate seven established baseline causal discovery methods including a newly proposed method based on GFlowNets.
The results of our study demonstrate that some of the algorithms studied are able to effectively capture a wide range of useful and diverse ATE modes.
arXiv Detail & Related papers (2023-07-11T02:58:10Z)
- Better Understanding Differences in Attribution Methods via Systematic Evaluations [57.35035463793008]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models.
arXiv Detail & Related papers (2023-03-21T14:24:58Z)
- Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z)
- Towards Better Understanding Attribution Methods [77.1487219861185]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We also propose a post-processing smoothing step that significantly improves the performance of some attribution methods.
arXiv Detail & Related papers (2022-05-20T20:50:17Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- On Sample Based Explanation Methods for NLP: Efficiency, Faithfulness, and Semantic Evaluation [23.72825603188359]
We can improve the interpretability of explanations by allowing arbitrary text sequences as the explanation unit.
We propose a semantic-based evaluation metric that can better align with humans' judgment of explanations.
arXiv Detail & Related papers (2021-06-09T00:49:56Z)
- An Empirical Comparison of Instance Attribution Methods for NLP [62.63504976810927]
We evaluate the degree to which different instance attribution methods agree with respect to the importance of training samples.
We find that simple retrieval methods yield training instances that differ from those identified via gradient-based methods.
arXiv Detail & Related papers (2021-04-09T01:03:17Z)
- RelatIF: Identifying Explanatory Training Examples via Relative Influence [13.87851325824883]
We use influence functions to identify relevant training examples that one might hope "explain" the predictions of a machine learning model.
We introduce RelatIF, a new class of criteria for choosing relevant training examples by way of an optimization objective that places a constraint on global influence.
In empirical evaluations, we find that the examples returned by RelatIF are more intuitive when compared to those found using influence functions.
arXiv Detail & Related papers (2020-03-25T20:59:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.