Easy to Decide, Hard to Agree: Reducing Disagreements Between Saliency
Methods
- URL: http://arxiv.org/abs/2211.08369v3
- Date: Thu, 11 May 2023 11:37:04 GMT
- Title: Easy to Decide, Hard to Agree: Reducing Disagreements Between Saliency
Methods
- Authors: Josip Jukić, Martin Tutek, Jan Šnajder
- Abstract summary: We show that saliency methods exhibit weak rank correlations even when applied to the same model instance.
Regularization techniques that increase faithfulness of attention explanations also increase agreement between saliency methods.
- Score: 0.15039745292757667
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A popular approach to unveiling the black box of neural NLP models is to
leverage saliency methods, which assign scalar importance scores to each input
component. A common practice for evaluating whether an interpretability method
is faithful has been to use evaluation-by-agreement -- if multiple methods
agree on an explanation, its credibility increases. However, recent work has
found that saliency methods exhibit weak rank correlations even when applied to
the same model instance and advocated for the use of alternative diagnostic
methods. In our work, we demonstrate that rank correlation is not a good fit
for evaluating agreement and argue that Pearson-$r$ is a better-suited
alternative. We further show that regularization techniques that increase
faithfulness of attention explanations also increase agreement between saliency
methods. By connecting our findings to instance categories based on training
dynamics, we show that the agreement of saliency method explanations is very
low for easy-to-learn instances. Finally, we connect the improvement in
agreement across instance categories to local representation space statistics
of instances, paving the way for work on analyzing which intrinsic model
properties improve their predisposition to interpretability methods.
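Since the abstract contrasts rank correlation with Pearson-$r$ as measures of agreement between saliency methods, the following is a minimal sketch of how both could be computed over attributions produced by two methods for the same input. The attribution vectors and method names are illustrative placeholders, not the authors' code or data.

```python
# Minimal sketch (not the authors' implementation): comparing rank correlation
# and Pearson-r as agreement measures between two saliency methods.
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical token-level importance scores from two saliency methods
# (e.g., gradient x input vs. integrated gradients) for one 8-token input.
method_a = np.array([0.02, 0.01, 0.90, 0.03, 0.01, 0.85, 0.02, 0.01])
method_b = np.array([0.01, 0.03, 0.70, 0.02, 0.04, 0.95, 0.01, 0.02])

rho, _ = spearmanr(method_a, method_b)  # rank correlation over all tokens
r, _ = pearsonr(method_a, method_b)     # linear correlation of raw scores

# Rank correlation treats every swap among near-zero scores the same as a swap
# among the highly important tokens, whereas Pearson-r is dominated by the
# high-magnitude tokens on which the methods actually agree.
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```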
Related papers
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple
Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that requires no prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z) - Comparing Explanation Methods for Traditional Machine Learning Models
Part 2: Quantifying Model Explainability Faithfulness and Improvements with
Dimensionality Reduction [0.0]
"faithfulness" or "fidelity" refer to the correspondence between the assigned feature importance and the contribution of the feature to model performance.
This study is one of the first to quantify the improvement in explainability from limiting correlated features and knowing the relative fidelity of different explainability methods.
arXiv Detail & Related papers (2022-11-18T17:15:59Z) - Differentiable Data Augmentation for Contrastive Sentence Representation
Learning [6.398022050054328]
The proposed method yields significant improvements over existing methods under both semi-supervised and supervised settings.
Our experiments under a low labeled data setting also show that our method is more label-efficient than the state-of-the-art contrastive learning methods.
arXiv Detail & Related papers (2022-10-29T08:57:45Z) - "Will You Find These Shortcuts?" A Protocol for Evaluating the
Faithfulness of Input Salience Methods for Text Classification [38.22453895596424]
We present a protocol for faithfulness evaluation that makes use of partially synthetic data to obtain ground truth for feature importance ranking.
We do an in-depth analysis of four standard salience method classes on a range of datasets and shortcuts for BERT and LSTM models.
We recommend following the protocol for each new task and model combination to find the best method for identifying shortcuts.
arXiv Detail & Related papers (2021-11-14T15:31:29Z) - Contrastive Learning for Fair Representations [50.95604482330149]
Trained classification models can unintentionally lead to biased representations and predictions.
Existing debiasing methods for classification models, such as adversarial training, are often expensive to train and difficult to optimise.
We propose a method for mitigating bias by incorporating contrastive learning, in which instances sharing the same class label are encouraged to have similar representations.
arXiv Detail & Related papers (2021-09-22T10:47:51Z) - Direct Advantage Estimation [63.52264764099532]
We show that the expected return may depend on the policy in an undesirable way which could slow down learning.
We propose the Direct Advantage Estimation (DAE), a novel method that can model the advantage function and estimate it directly from data.
If desired, value functions can also be seamlessly integrated into DAE and be updated in a similar way to Temporal Difference Learning.
arXiv Detail & Related papers (2021-09-13T16:09:31Z) - Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z) - An Empirical Comparison of Instance Attribution Methods for NLP [62.63504976810927]
We evaluate the degree to which different instance attribution methods agree with respect to the importance of training samples.
We find that simple retrieval methods yield training instances that differ from those identified via gradient-based methods.
arXiv Detail & Related papers (2021-04-09T01:03:17Z) - There and Back Again: Revisiting Backpropagation Saliency Methods [87.40330595283969]
Saliency methods seek to explain the predictions of a model by producing an importance map over each input sample.
A popular class of such methods is based on backpropagating a signal and analyzing the resulting gradient.
We propose a single framework under which several such methods can be unified.
arXiv Detail & Related papers (2020-04-06T17:58:08Z) - An end-to-end approach for the verification problem: learning the right
distance [15.553424028461885]
We augment the metric learning setting by introducing a parametric pseudo-distance, trained jointly with the encoder.
We first show it approximates a likelihood ratio which can be used for hypothesis tests.
We observe training is much simplified under the proposed approach compared to metric learning with actual distances.
arXiv Detail & Related papers (2020-02-21T18:46:06Z)