KNOW How to Make Up Your Mind! Adversarially Detecting and Alleviating
Inconsistencies in Natural Language Explanations
- URL: http://arxiv.org/abs/2306.02980v1
- Date: Mon, 5 Jun 2023 15:51:58 GMT
- Title: KNOW How to Make Up Your Mind! Adversarially Detecting and Alleviating
Inconsistencies in Natural Language Explanations
- Authors: Myeongjun Jang, Bodhisattwa Prasad Majumder, Julian McAuley, Thomas
Lukasiewicz, Oana-Maria Camburu
- Abstract summary: We leverage external knowledge bases to significantly improve on an existing adversarial attack for detecting inconsistent NLEs.
We show that models with higher NLE quality do not necessarily generate fewer inconsistencies.
- Score: 52.33256203018764
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While recent works have considerably improved the quality of the
natural language explanations (NLEs) generated by a model to justify its
predictions, there has been very limited research on detecting and alleviating
inconsistencies among generated NLEs. In this work, we leverage external
knowledge bases to significantly improve on an existing adversarial attack for
detecting inconsistent NLEs. We apply our attack to high-performing NLE models
and show that models with higher NLE quality do not necessarily generate fewer
inconsistencies. Moreover, we propose an off-the-shelf mitigation method to
alleviate inconsistencies by grounding the model in external background
knowledge. Our method decreases the inconsistencies of previous high-performing
NLE models as detected by our attack.
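To make the setup concrete, below is a minimal, hypothetical sketch of what such a knowledge-base-driven inconsistency attack could look like: it perturbs the input with WordNet-derived antonym substitutions and flags cases where the explanation generated for the perturbed input contradicts the original explanation according to an off-the-shelf NLI model. This is an illustrative assumption, not the authors' implementation; the `explain` callable is a placeholder for the NLE model under attack.

```python
# Hypothetical sketch of a knowledge-base-driven inconsistency attack (not the
# authors' implementation): perturb the hypothesis via WordNet antonym substitutions
# and flag NLE pairs that an off-the-shelf NLI model judges to contradict each other.
from nltk.corpus import wordnet as wn          # requires: nltk.download("wordnet")
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def antonym_variants(sentence: str):
    """Yield copies of `sentence` with one word replaced by a WordNet antonym."""
    words = sentence.split()
    for i, word in enumerate(words):
        for syn in wn.synsets(word):
            for lemma in syn.lemmas():
                for ant in lemma.antonyms():
                    yield " ".join(words[:i] + [ant.name().replace("_", " ")] + words[i + 1:])

def find_inconsistent_nles(premise: str, hypothesis: str, explain):
    """Return (variant, original NLE, variant NLE) triples judged mutually contradictory.

    `explain(premise, hypothesis)` is a hypothetical call to the NLE model under attack.
    """
    original_nle = explain(premise, hypothesis)
    hits = []
    for variant in antonym_variants(hypothesis):
        variant_nle = explain(premise, variant)
        verdict = nli([{"text": original_nle, "text_pair": variant_nle}])[0]
        if verdict["label"] == "CONTRADICTION":
            hits.append((variant, original_nle, variant_nle))
    return hits
```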
Related papers
- Enhancing adversarial robustness in Natural Language Inference using explanations [41.46494686136601]
We cast the spotlight on the underexplored task of Natural Language Inference (NLI).
We validate the use of natural language explanations as a model-agnostic defence strategy through extensive experimentation.
We study the correlation of widely used language generation metrics with human perception, so that they can serve as a proxy for robust NLI models.
arXiv Detail & Related papers (2024-09-11T17:09:49Z)
- Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models [96.43562963756975]
We train a regression model whose target variable is the gap between the conditional and the unconditional generation confidence.
We use this learned conditional dependency model to modulate the uncertainty of the current generation step based on the uncertainty of the previous step.
arXiv Detail & Related papers (2024-08-20T09:42:26Z)
- Don't Miss Out on Novelty: Importance of Novel Features for Deep Anomaly Detection [64.21963650519312]
Anomaly Detection (AD) is a critical task that involves identifying observations that do not conform to a learned model of normality.
We propose a novel approach to AD using explainability to capture such novel features as unexplained observations in the input space.
Our approach establishes a new state-of-the-art across multiple benchmarks, handling diverse anomaly types.
arXiv Detail & Related papers (2023-10-01T21:24:05Z)
- Faithfulness Tests for Natural Language Explanations [87.01093277918599]
Explanations of neural models aim to reveal a model's decision-making process for its predictions.
Recent work shows that current explanation methods, such as saliency maps or counterfactuals, can be misleading.
This work explores the challenging question of evaluating the faithfulness of natural language explanations.
arXiv Detail & Related papers (2023-05-29T11:40:37Z)
- Benchmarking Faithfulness: Towards Accurate Natural Language Explanations in Vision-Language Tasks [0.0]
Natural language explanations (NLEs) promise to enable the communication of a model's decision-making in an easily intelligible way.
While current models successfully generate convincing explanations, it is an open question how well the NLEs actually represent the reasoning process of the models.
We propose three faithfulness metrics: Attribution-Similarity, NLE-Sufficiency, and NLE-Comprehensiveness.
arXiv Detail & Related papers (2023-04-03T08:24:10Z)
- Improving the Adversarial Robustness of NLP Models by Information Bottleneck [112.44039792098579]
Non-robust features can be easily manipulated by adversaries to fool NLP models.
In this study, we explore the feasibility of capturing task-specific robust features, while eliminating the non-robust ones, using information bottleneck theory (see the sketch after this list).
We show that the models trained with our information bottleneck-based method are able to achieve a significant improvement in robust accuracy.
arXiv Detail & Related papers (2022-06-11T12:12:20Z)
- NoiER: An Approach for Training more Reliable Fine-Tuned Downstream Task Models [54.184609286094044]
We propose noise entropy regularisation (NoiER) as an efficient learning paradigm that solves the problem without auxiliary models and additional data.
The proposed approach improved traditional OOD detection evaluation metrics by 55% on average compared to the original fine-tuned models.
arXiv Detail & Related papers (2021-08-29T06:58:28Z)
- Model Explainability in Deep Learning Based Natural Language Processing [0.0]
We reviewed and compared some popular machine learning model explainability methodologies.
We applied one of the NLP explainability methods to an NLP classification model.
We identified some common issues due to the special nature of NLP models.
arXiv Detail & Related papers (2021-06-14T13:23:20Z)
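As referenced in the information-bottleneck entry above, the following is a minimal sketch of the generic variational information-bottleneck objective: a stochastic encoder plus a KL penalty that compresses away input-specific, potentially non-robust features. It follows the standard deep variational IB formulation and is an illustrative assumption rather than the implementation from that paper; the layer sizes and the `beta` weight are placeholders.

```python
# Generic variational information-bottleneck objective (illustrative, not the paper's code):
# trade task accuracy against a KL term that limits how much the representation z
# retains about the input, discouraging reliance on brittle, non-robust features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IBClassifier(nn.Module):
    """Classifier with a stochastic bottleneck representation z."""
    def __init__(self, in_dim: int, z_dim: int, n_classes: int):
        super().__init__()
        self.encoder = nn.Linear(in_dim, 2 * z_dim)       # predicts mean and log-variance of q(z|x)
        self.classifier = nn.Linear(z_dim, n_classes)

    def forward(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterisation trick
        return self.classifier(z), mu, logvar

def ib_loss(logits, labels, mu, logvar, beta=1e-3):
    """Cross-entropy task loss plus a KL term that penalises information kept about the input."""
    task = F.cross_entropy(logits, labels)
    kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1))
    return task + beta * kl

# Usage on random data, just to show the shapes (placeholder dimensions).
x = torch.randn(8, 128)              # e.g. 8 sentence embeddings of size 128
y = torch.randint(0, 3, (8,))        # e.g. 3 NLI labels
model = IBClassifier(in_dim=128, z_dim=32, n_classes=3)
logits, mu, logvar = model(x)
ib_loss(logits, y, mu, logvar).backward()
```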