Enjoy the Salience: Towards Better Transformer-based Faithful
Explanations with Word Salience
- URL: http://arxiv.org/abs/2108.13759v1
- Date: Tue, 31 Aug 2021 11:21:30 GMT
- Title: Enjoy the Salience: Towards Better Transformer-based Faithful
Explanations with Word Salience
- Authors: George Chrysostomou and Nikolaos Aletras
- Abstract summary: We propose SaLoss, an auxiliary loss function that guides the multi-head attention mechanism during training to stay close to salient information extracted a priori using TextRank.
Experiments on explanation faithfulness across five datasets show that models trained with SaLoss consistently provide more faithful explanations.
We further show that rationales extracted from SaLoss models result in higher predictive performance in downstream tasks.
- Score: 9.147707153504117
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained transformer-based models such as BERT have demonstrated
state-of-the-art predictive performance when adapted to a range of natural
language processing tasks. An open problem is how to improve the faithfulness
of explanations (rationales) for the predictions of these models. In this
paper, we hypothesize that salient information extracted a priori from the
training data can complement the task-specific information learned by the model
during fine-tuning on a downstream task. In this way, we aim to help BERT not
forget to assign importance to informative input tokens when making
predictions, by proposing SaLoss, an auxiliary loss function for guiding the
multi-head attention mechanism during training to be close to salient
information extracted a priori using TextRank. Experiments on explanation
faithfulness across five datasets show that models trained with SaLoss
consistently provide more faithful explanations across four different feature
attribution methods compared to vanilla BERT. Using the rationales extracted
from vanilla BERT and SaLoss models to train inherently faithful classifiers,
we further show that the latter result in higher predictive performance in
downstream tasks.
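The abstract describes SaLoss only at a high level; as an illustration, here is a minimal PyTorch-style sketch, not the authors' released code, of an auxiliary term that pulls head-averaged attention toward token salience scores precomputed with TextRank. The function name, tensor shapes, and the use of KL divergence as the "closeness" measure are assumptions made for this sketch.

```python
import torch
import torch.nn.functional as F

def salience_loss(attentions: torch.Tensor,
                  salience: torch.Tensor,
                  attention_mask: torch.Tensor) -> torch.Tensor:
    """Auxiliary loss pulling attention toward precomputed word salience.

    attentions:     [batch, heads, seq, seq] self-attention weights of one layer
    salience:       [batch, seq] TextRank score for each input token (precomputed)
    attention_mask: [batch, seq] 1 for real tokens, 0 for padding
    """
    # Average over heads and query positions to get one importance
    # score per input token: [batch, seq]
    token_attn = attentions.mean(dim=1).mean(dim=1)

    # Zero out padding and renormalise both vectors into distributions.
    mask = attention_mask.float()
    token_attn = token_attn * mask
    token_attn = token_attn / token_attn.sum(dim=-1, keepdim=True).clamp(min=1e-12)
    sal = salience * mask
    sal = sal / sal.sum(dim=-1, keepdim=True).clamp(min=1e-12)

    # KL(salience || attention); other distances (e.g. MSE) would also fit here.
    return F.kl_div(token_attn.clamp(min=1e-12).log(), sal, reduction="batchmean")

# Illustrative training objective: task loss plus the weighted auxiliary term.
# loss = cross_entropy(logits, labels) + lambda_sal * salience_loss(attn, scores, mask)
```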
Related papers
- Towards Faithful Explanations for Text Classification with Robustness
Improvement and Explanation Guided Training [30.626080706755822]
Feature attribution methods highlight the important input tokens as explanations to model predictions.
Recent work shows that the explanations provided by these methods often struggle to be both faithful and robust.
We propose a method with Robustness improvement and Explanation Guided training towards more faithful EXplanations (REGEX) for text classification.
arXiv Detail & Related papers (2023-12-29T13:07:07Z) - XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.
XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z) - Harnessing the Power of Explanations for Incremental Training: A
LIME-Based Approach [6.244905619201076]
In this work, model explanations are fed back into the training loop to help the model generalize better.
The framework combines a custom weighted loss with Elastic Weight Consolidation (EWC) to maintain performance across sequential test sets; a minimal sketch of the EWC penalty appears after this list.
The proposed custom training procedure yields a consistent accuracy improvement of 0.5% to 1.5% across all phases of the incremental learning setup.
arXiv Detail & Related papers (2022-11-02T18:16:17Z) - Improving the Adversarial Robustness of NLP Models by Information
Bottleneck [112.44039792098579]
Non-robust features can be easily manipulated by adversaries to fool NLP models.
In this study, we explore the feasibility of capturing task-specific robust features, while eliminating the non-robust ones by using the information bottleneck theory.
We show that the models trained with our information bottleneck-based method are able to achieve a significant improvement in robust accuracy.
arXiv Detail & Related papers (2022-06-11T12:12:20Z) - BLISS: Robust Sequence-to-Sequence Learning via Self-Supervised Input
Representation [92.75908003533736]
We propose a framework-level robust sequence-to-sequence learning approach, named BLISS, via self-supervised input representation.
We conduct comprehensive experiments to validate the effectiveness of BLISS on various tasks, including machine translation, grammatical error correction, and text summarization.
arXiv Detail & Related papers (2022-04-16T16:19:47Z) - Agree to Disagree: Diversity through Disagreement for Better
Transferability [54.308327969778155]
We propose D-BAT (Diversity-By-disAgreement Training), which enforces agreement among the models on the training data but encourages disagreement on out-of-distribution data.
We show how D-BAT naturally emerges from the notion of generalized discrepancy.
arXiv Detail & Related papers (2022-02-09T12:03:02Z) - On the Transferability of Pre-trained Language Models: A Study from
Artificial Datasets [74.11825654535895]
Pre-training language models (LMs) on large-scale unlabeled text data makes it much easier for the model to achieve exceptional downstream performance.
We study what specific traits in the pre-training data, other than the semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks.
arXiv Detail & Related papers (2021-09-08T10:39:57Z) - Explain and Predict, and then Predict Again [6.865156063241553]
We propose ExPred, which uses multi-task learning in the explanation generation phase, effectively trading off explanation and prediction losses.
We conduct an extensive evaluation of our approach on three diverse language datasets.
arXiv Detail & Related papers (2021-01-11T19:36:52Z) - Inserting Information Bottlenecks for Attribution in Transformers [46.77580577396633]
We apply information bottlenecks to analyze the attribution of each feature for prediction on a black-box model.
We show the effectiveness of our method in terms of attribution and the ability to provide insight into how information flows through layers.
arXiv Detail & Related papers (2020-12-27T00:35:43Z) - Self-Attention Attribution: Interpreting Information Interactions Inside
Transformer [89.21584915290319]
We propose a self-attention attribution method to interpret the information interactions inside Transformer.
We show that the attribution results can be used as adversarial patterns to implement non-targeted attacks towards BERT.
arXiv Detail & Related papers (2020-04-23T14:58:22Z)
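The LIME-based incremental training entry above mentions Elastic Weight Consolidation (EWC). Since that summary gives no details, here is a minimal, generic sketch of the standard EWC penalty, not that paper's custom weighted loss; the helper name, dictionary layout, and default weight are assumptions for illustration.

```python
import torch

def ewc_penalty(model: torch.nn.Module,
                prev_params: dict,
                fisher_diag: dict,
                lam: float = 1.0) -> torch.Tensor:
    """Standard EWC regulariser: discourage changing parameters that were
    important (high diagonal Fisher information) for earlier phases.

    prev_params: parameter values saved at the end of the previous phase
    fisher_diag: per-parameter diagonal Fisher information from that phase
    """
    device = next(model.parameters()).device
    penalty = torch.zeros((), device=device)
    for name, param in model.named_parameters():
        if name in fisher_diag:
            penalty = penalty + (fisher_diag[name] * (param - prev_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# Illustrative use alongside a task loss during incremental training:
# total_loss = task_loss + ewc_penalty(model, prev_params, fisher_diag)
```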
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.