Towards Faithful Explanations for Text Classification with Robustness
Improvement and Explanation Guided Training
- URL: http://arxiv.org/abs/2312.17591v1
- Date: Fri, 29 Dec 2023 13:07:07 GMT
- Title: Towards Faithful Explanations for Text Classification with Robustness
Improvement and Explanation Guided Training
- Authors: Dongfang Li, Baotian Hu, Qingcai Chen, Shan He
- Abstract summary: Feature attribution methods highlight important input tokens as explanations for model predictions.
Recent work shows that the explanations these methods provide often fail to be faithful and robust.
We propose a method with Robustness improvement and Explanation Guided training towards more faithful EXplanations (REGEX) for text classification.
- Score: 30.626080706755822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature attribution methods highlight important input tokens as
explanations for model predictions and have been widely applied to deep neural
networks in pursuit of trustworthy AI. However, recent work shows that the
explanations these methods provide often fail to be faithful and robust. In
this paper, we propose a method with Robustness improvement and Explanation
Guided training towards more faithful EXplanations (REGEX) for text
classification. First, we improve model robustness with an input gradient
regularization technique and virtual adversarial training. Second, we use
saliency ranking to mask noisy tokens and maximize the similarity between
model attention and feature attribution, which can be seen as a self-training
procedure that introduces no external information. We conduct extensive
experiments on six datasets with five attribution methods, and also evaluate
faithfulness in the out-of-domain setting. The results show that REGEX
improves the fidelity metrics of explanations in all settings and achieves
consistent gains on two randomization tests. Moreover, using the highlight
explanations produced by REGEX to train select-then-predict models yields
task performance comparable to that of the end-to-end method.
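The abstract combines two training signals: a robustness term (input gradient regularization plus virtual adversarial training) and an explanation-guided term that aligns model attention with feature attribution over the highest-saliency tokens. The PyTorch sketch below illustrates one plausible reading; the model interface, loss weights, saliency proxy, and the single random-direction perturbation (standing in for VAT's power-iteration step) are all assumptions, not the authors' released code.

```python
# Hypothetical sketch of the REGEX objective as described in the abstract.
# Assumes a HuggingFace-style classifier; all hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def regex_loss(model, input_ids, attention_mask, labels,
               lambda_grad=0.1, lambda_vat=0.1, lambda_expl=0.1,
               mask_ratio=0.2, vat_eps=1e-2):
    # Treat the input embeddings as the "input" we differentiate against.
    embeds = model.get_input_embeddings()(input_ids).detach().requires_grad_(True)
    out = model(inputs_embeds=embeds, attention_mask=attention_mask,
                output_attentions=True)
    task_loss = F.cross_entropy(out.logits, labels)

    # (1) Input gradient regularization: penalize the norm of d(loss)/d(embeds).
    grads, = torch.autograd.grad(task_loss, embeds, create_graph=True)
    grad_reg = grads.pow(2).sum(dim=-1).mean()

    # (2) Virtual adversarial training, simplified: a single random-direction
    # perturbation stands in for VAT's power iteration; predictions on the
    # perturbed input are pulled towards the clean predictions via KL.
    noise = vat_eps * F.normalize(torch.randn_like(embeds), dim=-1)
    adv_logits = model(inputs_embeds=embeds.detach() + noise,
                       attention_mask=attention_mask).logits
    vat_loss = F.kl_div(F.log_softmax(adv_logits, dim=-1),
                        F.softmax(out.logits.detach(), dim=-1),
                        reduction="batchmean")

    # (3) Explanation-guided term: rank tokens by a saliency proxy
    # (gradient x input), mask the lowest-ranked "noisy" tokens, and pull the
    # CLS attention distribution towards the remaining attribution scores.
    with torch.no_grad():
        saliency = (grads * embeds).sum(dim=-1).abs()       # (batch, seq)
        k = int(mask_ratio * saliency.size(1))
        noisy = saliency.argsort(dim=1)[:, :k]              # lowest-saliency tokens
        keep = attention_mask.clone()
        keep.scatter_(1, noisy, 0)                          # drop noisy/pad tokens
        target = F.softmax(saliency.masked_fill(keep == 0, -1e9), dim=-1)
    attn = out.attentions[-1].mean(dim=1)[:, 0, :]          # head-averaged CLS row
    expl_loss = F.kl_div((attn + 1e-12).log(), target, reduction="batchmean")

    return (task_loss + lambda_grad * grad_reg + lambda_vat * vat_loss
            + lambda_expl * expl_loss)
```

Folded into a standard fine-tuning loop, this recovers the self-training flavour the abstract describes: the alignment target comes from the model's own attribution scores rather than from external annotation.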
Related papers
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively during model learning.
Our framework then promotes learning by paying closer attention to training samples whose explanations are the least consistent.
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
- Self-Supervised Dual Contouring [30.9409064656302]
We propose a self-supervised training scheme for the Neural Dual Contouring meshing framework.
We use two novel self-supervised loss functions that encourage consistency between distances to the generated mesh.
We demonstrate that our self-supervised losses improve meshing performance in the single-view reconstruction task.
arXiv Detail & Related papers (2024-05-28T12:44:28Z)
- READ: Improving Relation Extraction from an ADversarial Perspective [33.44949503459933]
We propose an adversarial training method specifically designed for relation extraction (RE).
Our approach introduces both sequence- and token-level perturbations to the sample and uses a separate perturbation vocabulary to improve the search for entity and context perturbations.
arXiv Detail & Related papers (2024-04-02T16:42:44Z)
- Noisy Self-Training with Synthetic Queries for Dense Retrieval [49.49928764695172]
We introduce a novel noisy self-training framework combined with synthetic queries.
Experimental results show that our method improves consistently over existing methods.
Our method is data efficient and outperforms competitive baselines.
arXiv Detail & Related papers (2023-11-27T06:19:50Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as annotations.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
We further refine the robustness metric: a model is judged robust only if its performance is consistently accurate across each whole clique.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- An Empirical Study on Explanations in Out-of-Domain Settings [35.07805573291534]
We study how post-hoc explanations and inherently faithful models perform in out-of-domain settings.
Results show that, in many cases, out-of-domain post-hoc explanation faithfulness, measured by sufficiency and comprehensiveness (see the sketch after this list), is higher than in-domain.
Our findings also show that select-then-predict models demonstrate predictive performance in out-of-domain settings comparable to full-text trained models.
arXiv Detail & Related papers (2022-02-28T19:50:23Z)
- Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience [9.147707153504117]
We propose an auxiliary loss function for guiding the multi-head attention mechanism during training to be close to salient information extracted using TextRank.
Experiments on explanation faithfulness across five datasets show that models trained with SaLoss consistently provide more faithful explanations.
We further show that the resulting explanations lead to higher predictive performance in downstream tasks.
arXiv Detail & Related papers (2021-08-31T11:21:30Z)
- Self-supervised Co-training for Video Representation Learning [103.69904379356413]
We investigate the benefit of adding semantic-class positives to instance-based InfoNCE (noise contrastive estimation) training.
We propose a novel self-supervised co-training scheme to improve the popular InfoNCE loss.
We evaluate the quality of the learnt representation on two different downstream tasks: action recognition and video retrieval.
arXiv Detail & Related papers (2020-10-19T17:59:01Z)
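The fidelity metrics mentioned above, sufficiency and comprehensiveness, score an explanation by how the model's prediction changes when only the rationale tokens are kept versus when they are removed. Below is a minimal sketch under assumed names (`predict_proba` returning class probabilities and a token-level attribution list `scores`); it is an illustration of the ERASER-style definitions, not any paper's released code.

```python
# Hypothetical sketch of the sufficiency/comprehensiveness fidelity metrics
# referenced above; `predict_proba` (tokens -> class probabilities) and the
# token-level attribution `scores` are assumed interfaces, not a real API.
import numpy as np

def fidelity(predict_proba, tokens, scores, top_k):
    full = predict_proba(tokens)              # probabilities on the full input
    label = int(np.argmax(full))              # predicted class
    order = np.argsort(scores)[::-1]          # tokens ranked by attribution
    keep = set(order[:top_k].tolist())
    rationale = [t for i, t in enumerate(tokens) if i in keep]
    remainder = [t for i, t in enumerate(tokens) if i not in keep]
    # Sufficiency: probability drop when the model sees only the rationale
    # (lower is better -- the rationale alone should suffice).
    sufficiency = float(full[label] - predict_proba(rationale)[label])
    # Comprehensiveness: probability drop when the rationale is removed
    # (higher is better -- removing it should hurt the prediction).
    comprehensiveness = float(full[label] - predict_proba(remainder)[label])
    return sufficiency, comprehensiveness
```

Lower sufficiency and higher comprehensiveness both indicate a more faithful rationale under these definitions.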