Gradient-based Analysis of NLP Models is Manipulable
- URL: http://arxiv.org/abs/2010.05419v1
- Date: Mon, 12 Oct 2020 02:54:22 GMT
- Title: Gradient-based Analysis of NLP Models is Manipulable
- Authors: Junlin Wang, Jens Tuyls, Eric Wallace, Sameer Singh
- Abstract summary: We demonstrate that the gradients of a model are easily manipulable, and thus bring into question the reliability of gradient-based analyses.
In particular, we merge the layers of a target model with a Facade that overwhelms the gradients without affecting the predictions.
- Score: 44.215057692679494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient-based analysis methods, such as saliency map visualizations and
adversarial input perturbations, have found widespread use in interpreting
neural NLP models due to their simplicity, flexibility, and most importantly,
their faithfulness. In this paper, however, we demonstrate that the gradients
of a model are easily manipulable, and thus bring into question the reliability
of gradient-based analyses. In particular, we merge the layers of a target
model with a Facade that overwhelms the gradients without affecting the
predictions. This Facade can be trained to have gradients that are misleading
and irrelevant to the task, such as focusing only on the stop words in the
input. On a variety of NLP tasks (text classification, NLI, and QA), we show
that our method can manipulate numerous gradient-based analysis techniques:
saliency maps, input reduction, and adversarial perturbations all identify
unimportant or targeted tokens as being highly important. The code and a
tutorial for this paper are available at http://ucinlp.github.io/facade.
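As a rough, self-contained illustration of the abstract's claim, the sketch below merges a toy target classifier with a low-amplitude, high-frequency facade term: the extra term is far too small to flip the argmax prediction, yet its input gradients dwarf the target's, so a plain gradient saliency map reflects the facade rather than the target. The additive sine-based merge, the toy linear modules, and the hyperparameters are illustrative assumptions, not the paper's actual layer-wise merging or facade training objective (see the linked tutorial for that).

```python
# Illustrative sketch only: a tiny additive "facade" term whose output is
# negligible (amplitude eps) but whose input gradients are huge (~ eps * scale),
# so a plain gradient saliency map stops reflecting the target model.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, dim, n_classes, seq_len = 100, 16, 2, 8

embed = nn.Embedding(vocab, dim)
target = nn.Linear(dim, n_classes)   # stand-in for the target classifier
facade = nn.Linear(dim, n_classes)   # stand-in for the trained facade

tokens = torch.randint(0, vocab, (1, seq_len))
emb = embed(tokens).detach().requires_grad_(True)   # saliency is taken w.r.t. these

def merged_logits(e, eps=1e-3, scale=1e6):
    # The target's logits decide the prediction; the facade adds at most +/- eps
    # to each logit but contributes input gradients on the order of eps * scale.
    return target(e.mean(1)) + eps * torch.sin(scale * facade(e).mean(1))

logits = merged_logits(emb)
pred = logits.argmax(-1)
print("prediction unchanged:", torch.equal(pred, target(emb.mean(1)).argmax(-1)))

logits[0, pred].sum().backward()
saliency = emb.grad.norm(dim=-1)        # per-token gradient saliency
print("per-token saliency:", saliency)  # dominated by the facade, not the target
```

In the paper itself, the facade is instead trained so that the merged model's gradients concentrate on chosen tokens such as stop words while its predictions stay identical to the target's.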
Related papers
- Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI [59.96044730204345]
We introduce Derivative-Free Diffusion Manifold-Constrained Gradients (FreeMCG).
FreeMCG serves as an improved basis for the explainability of a given neural network.
We show that our method yields state-of-the-art results while preserving the essential properties expected of XAI tools.
arXiv Detail & Related papers (2024-11-22T11:15:14Z) - Unlearning-based Neural Interpretations [51.99182464831169]
We show that current baselines defined using static functions are biased, fragile and manipulable.
We propose UNI to compute an (un)learnable, debiased and adaptive baseline by perturbing the input towards an unlearning direction of steepest ascent.
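A minimal sketch of one plausible reading of this idea, assuming the "unlearning direction of steepest ascent" is approximated by gradient ascent on the loss for the model's own prediction; the perturbed input then serves as an input-specific baseline for an attribution method. The function name, step count, and step size below are hypothetical, and this is not the UNI algorithm itself.

```python
# Hypothetical sketch (not the UNI algorithm itself): build an input-specific
# baseline by gradient *ascent* on the loss for the model's own prediction,
# i.e. perturb the input until the prediction is "unlearned".
import torch
import torch.nn.functional as F

def unlearning_baseline(model, emb, steps=20, lr=0.5):
    """emb: (1, seq_len, dim) input embeddings; returns a perturbed baseline."""
    pred = model(emb).argmax(-1)                     # class to unlearn
    baseline = emb.clone().detach()
    for _ in range(steps):
        baseline.requires_grad_(True)
        loss = F.cross_entropy(model(baseline), pred)
        grad, = torch.autograd.grad(loss, baseline)
        baseline = (baseline + lr * grad).detach()   # steepest ascent on the loss
    return baseline                                  # reference point for attribution

# toy usage with a stand-in classifier over mean-pooled embeddings
lin = torch.nn.Linear(16, 3)
model = lambda e: lin(e.mean(1))
baseline = unlearning_baseline(model, torch.randn(1, 8, 16))
```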
arXiv Detail & Related papers (2024-10-10T16:02:39Z) - Probing the Purview of Neural Networks via Gradient Analysis [13.800680101300756]
We analyze the data-dependent capacity of neural networks and assess anomalies in inputs from the perspective of networks during inference.
To probe the purview of a network, we utilize gradients to measure the amount of change required for the model to characterize the given inputs more accurately.
We demonstrate that our gradient-based approach can effectively differentiate inputs that cannot be accurately represented with learned features.
arXiv Detail & Related papers (2023-04-06T03:02:05Z) - Tell Model Where to Attend: Improving Interpretability of Aspect-Based
Sentiment Classification via Small Explanation Annotations [23.05672636220897]
We propose an Interpretation-Enhanced Gradient-based framework for ABSC via a small number of explanation annotations, namely IEGA.
Our framework is model-agnostic and task-agnostic, so it can be integrated into existing ABSC methods or other tasks.
arXiv Detail & Related papers (2023-02-21T06:55:08Z) - Locally Aggregated Feature Attribution on Natural Language Model
Understanding [12.233103741197334]
Locally Aggregated Feature Attribution (LAFA) is a novel gradient-based feature attribution method for NLP models.
Instead of relying on obscure reference tokens, it smooths gradients by aggregating similar reference texts derived from language model embeddings.
For evaluation purposes, we design experiments on different NLP tasks, including entity recognition and sentiment analysis, on public datasets.
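The exact LAFA procedure is not spelled out here, but the flavor the summary describes (aggregating attributions over several nearby reference texts instead of one fixed baseline token) can be sketched as multi-reference integrated gradients, where the references would come from, e.g., nearest neighbours in a language model's embedding space. The code below is an assumed, simplified stand-in rather than the paper's method.

```python
# Simplified stand-in (not the LAFA method itself): average integrated-gradients
# attributions over several reference embeddings instead of one fixed baseline.
import torch

def multi_reference_attribution(model, emb, references, target_class, steps=16):
    """emb: (1, L, D) input embeddings; references: list of (1, L, D) baselines."""
    per_reference = []
    for ref in references:
        avg_grad = torch.zeros_like(emb)
        for a in torch.linspace(0.0, 1.0, steps):
            x = (ref + a * (emb - ref)).detach().requires_grad_(True)
            score = model(x)[0, target_class]
            grad, = torch.autograd.grad(score, x)
            avg_grad = avg_grad + grad / steps
        per_reference.append((avg_grad * (emb - ref)).sum(-1))  # per-token scores
    return torch.stack(per_reference).mean(0)   # smooth by aggregating references
```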
arXiv Detail & Related papers (2022-04-22T18:59:27Z) - Bayesian Graph Contrastive Learning [55.36652660268726]
We propose a novel perspective on graph contrastive learning methods, showing that random augmentations naturally lead to stochastic encoders.
Our proposed method represents each node by a distribution in the latent space, in contrast to existing techniques that embed each node as a deterministic vector.
We show a considerable improvement in performance compared to existing state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-12-15T01:45:32Z) - Revealing and Protecting Labels in Distributed Training [3.18475216176047]
We propose a method to discover the set of labels of training samples from only the gradient of the last layer and the ID-to-label mapping.
We demonstrate the effectiveness of our method for model training in two domains: image classification and automatic speech recognition.
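The leakage this builds on is easy to see in the simplest case: with a softmax/cross-entropy head, the gradient of the loss with respect to the final-layer bias equals softmax(z) − onehot(y), whose only negative entry sits at the true label. The sketch below shows that single-example case; recovering the full label set of a batch, as the paper does, takes more work than this.

```python
# Single-example illustration of the leakage: with cross-entropy, the gradient
# of the loss w.r.t. the final-layer bias is softmax(z) - onehot(y), so its
# only negative entry is at the true label.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_classes, dim = 10, 32
last_layer = torch.nn.Linear(dim, n_classes)

h = torch.randn(1, dim)                 # penultimate features
y = torch.tensor([7])                   # the "private" label
F.cross_entropy(last_layer(h), y).backward()

recovered = int(last_layer.bias.grad.argmin())   # most negative entry = true label
print(recovered, recovered == int(y))            # 7 True
```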
arXiv Detail & Related papers (2021-10-31T17:57:49Z) - Deep learning: a statistical viewpoint [120.94133818355645]
Deep learning has revealed some major surprises from a theoretical perspective.
In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems.
We conjecture that specific principles underlie these phenomena.
arXiv Detail & Related papers (2021-03-16T16:26:36Z) - Interpreting Graph Neural Networks for NLP With Differentiable Edge
Masking [63.49779304362376]
Graph neural networks (GNNs) have become a popular approach to integrating structural inductive biases into NLP models.
We introduce a post-hoc method for interpreting GNN predictions by identifying unnecessary edges.
We show that we can drop a large proportion of edges without deteriorating the performance of the model.
arXiv Detail & Related papers (2020-10-01T17:51:19Z) - Gradients as a Measure of Uncertainty in Neural Networks [16.80077149399317]
We propose to utilize backpropagated gradients to quantify the uncertainty of trained models.
We show that our gradient-based method outperforms state-of-the-art methods by up to 4.8% AUROC in out-of-distribution detection.
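One simple variant of this idea (not necessarily the paper's exact formulation) backpropagates a label-free loss and uses the size of the resulting parameter gradients as the uncertainty score: inputs the model represents poorly need larger parameter updates and hence produce larger gradients.

```python
# Hedged sketch of a gradient-based uncertainty score: backpropagate a label-free
# loss (KL divergence from the softmax output to a uniform distribution) and use
# the size of the resulting parameter gradients as the score.
import torch
import torch.nn.functional as F

def gradient_uncertainty(model: torch.nn.Module, x: torch.Tensor) -> float:
    model.zero_grad()
    logits = model(x)
    uniform = torch.full_like(logits, 1.0 / logits.shape[-1])
    loss = F.kl_div(F.log_softmax(logits, dim=-1), uniform, reduction="batchmean")
    loss.backward()
    sq_norm = sum(float(p.grad.norm()) ** 2 for p in model.parameters() if p.grad is not None)
    return sq_norm ** 0.5   # larger -> more uncertain / more likely out-of-distribution
```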
arXiv Detail & Related papers (2020-08-18T16:58:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.