Can Rationalization Improve Robustness?
- URL: http://arxiv.org/abs/2204.11790v1
- Date: Mon, 25 Apr 2022 17:02:42 GMT
- Title: Can Rationalization Improve Robustness?
- Authors: Howard Chen, Jacqueline He, Karthik Narasimhan, Danqi Chen
- Abstract summary: We investigate whether neural rationale models can provide robustness to adversarial attacks in addition to their interpretable nature.
We generate various types of 'AddText' attacks for both token- and sentence-level rationalization tasks.
Our experiments reveal that rationale models show promise in improving robustness, though they struggle in certain scenarios.
- Score: 39.741059642044874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A growing line of work has investigated the development of neural NLP models
that can produce rationales--subsets of input that can explain their model
predictions. In this paper, we ask whether such rationale models can also
provide robustness to adversarial attacks in addition to their interpretable
nature. Since these models need to first generate rationales ("rationalizer")
before making predictions ("predictor"), they have the potential to ignore
noise or adversarially added text by simply masking it out of the generated
rationale. To this end, we systematically generate various types of 'AddText'
attacks for both token and sentence-level rationalization tasks, and perform an
extensive empirical evaluation of state-of-the-art rationale models across five
different tasks. Our experiments reveal that rationale models show
promise in improving robustness, while they struggle in certain scenarios--when
the rationalizer is sensitive to positional bias or to lexical choices in the attack
text. Further, leveraging human rationales as supervision does not always
translate to better performance. Our study is a first step towards exploring
the interplay between interpretability and robustness in the
rationalize-then-predict framework.
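The rationalize-then-predict setup and the 'AddText' attack described above lend themselves to a short sketch. The toy example below is a minimal illustration, not the paper's implementation: the function names, the hand-written relevance scorer, and the distractor text are all assumptions chosen only to show the mechanism of appending attack text and masking it out of the rationale before prediction.

```python
# Minimal sketch (not the authors' code) of rationalize-then-predict under an
# 'AddText'-style attack. All names, the toy scorer, and the distractor text
# are illustrative assumptions.
from typing import Callable, List


def add_text_attack(tokens: List[str], distractor: List[str],
                    position: str = "end") -> List[str]:
    """Insert attack text at the start or end of the input (sentence-level AddText)."""
    return distractor + tokens if position == "start" else tokens + distractor


def rationalize(tokens: List[str], scorer: Callable[[str], float], k: int) -> List[bool]:
    """Toy rationalizer: keep the k highest-scoring tokens as the rationale mask."""
    ranked = sorted(range(len(tokens)), key=lambda i: scorer(tokens[i]), reverse=True)
    keep = set(ranked[:k])
    return [i in keep for i in range(len(tokens))]


def predict(tokens: List[str], mask: List[bool]) -> str:
    """Toy predictor that only sees tokens kept inside the rationale."""
    rationale = [t for t, m in zip(tokens, mask) if m]
    positive_words = {"great", "excellent", "good"}
    return "positive" if any(t in positive_words for t in rationale) else "negative"


def toy_scorer(token: str) -> float:
    # Stand-in for a learned rationalizer's relevance score.
    return 1.0 if token in {"movie", "acting", "great", "excellent"} else 0.0


if __name__ == "__main__":
    x = "the movie was great and the acting excellent".split()
    distractor = "terrible awful boring plot twist".split()  # hypothetical attack text
    x_adv = add_text_attack(x, distractor, position="end")

    # If the rationalizer scores the attack tokens low, they are masked out and
    # the prediction is unchanged. If it is sensitive to the attack's position
    # or lexical choices (the failure modes the paper reports), it may keep
    # attack tokens in the rationale and flip the prediction.
    mask = rationalize(x_adv, toy_scorer, k=4)
    print(predict(x_adv, mask))  # -> "positive": the appended attack is masked out
```

In the paper's setting the scorer is a trained rationalizer and the attack text is crafted to remain label-preserving for humans; this sketch only illustrates why masking the attack out of the rationale can shield the predictor.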
Related papers
- Adversarial Attack for Explanation Robustness of Rationalization Models [17.839644167949906]
Rationalization models select a subset of the input text as a rationale, which is crucial for humans to understand and trust predictions.
This paper aims to undermine the explainability of rationalization models without altering their predictions, thereby eliciting distrust in these models from human users.
arXiv Detail & Related papers (2024-08-20T12:43:58Z)
- Characterizing Large Language Models as Rationalizers of Knowledge-intensive Tasks [6.51301154858045]
Large language models (LLMs) are proficient at generating fluent text with minimal task-specific supervision.
We consider the task of generating knowledge-guided rationalization in natural language by using expert-written examples in a few-shot manner.
Surprisingly, crowd-workers preferred knowledge-grounded rationales over crowdsourced rationalizations, citing their factuality, sufficiency, and comprehensive refutations.
arXiv Detail & Related papers (2023-11-09T01:04:44Z)
- Unsupervised Selective Rationalization with Noise Injection [7.17737088382948]
Unsupervised selective rationalization produces rationales alongside predictions by chaining two jointly-trained components, a rationale generator and a predictor.
We introduce a novel training technique that effectively limits generation of implausible rationales by injecting noise between the generator and the predictor.
We achieve sizeable improvements in rationale plausibility and task accuracy over the state-of-the-art across a variety of tasks, including our new benchmark.
arXiv Detail & Related papers (2023-05-27T17:34:36Z)
- NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries? [61.58261351116679]
We introduce a two-stage adversarial example generation framework (NaturalAdversaries) for natural language understanding tasks.
It is adaptable to both black-box and white-box adversarial attacks based on the level of access to the model parameters.
Our results indicate these adversaries generalize across domains, and offer insights for future research on improving robustness of neural text classification models.
arXiv Detail & Related papers (2022-11-08T16:37:34Z)
- Rationale-Augmented Ensembles in Language Models [53.45015291520658]
We reconsider rationale-augmented prompting for few-shot in-context learning.
We identify rationale sampling in the output space as the key component to robustly improve performance.
We demonstrate that rationale-augmented ensembles achieve more accurate and interpretable results than existing prompting approaches.
arXiv Detail & Related papers (2022-07-02T06:20:57Z)
- Logically Consistent Adversarial Attacks for Soft Theorem Provers [110.17147570572939]
We propose a generative adversarial framework for probing and improving language models' reasoning capabilities.
Our framework successfully generates adversarial attacks and identifies global weaknesses.
In addition to effective probing, we show that training on the generated samples improves the target model's performance.
arXiv Detail & Related papers (2022-04-29T19:10:12Z)
- Learning to Rationalize for Nonmonotonic Reasoning with Distant Supervision [44.32874972577682]
We investigate the extent to which neural models can reason about natural language rationales that explain model predictions.
We use pre-trained language models, neural knowledge models, and distant supervision from related tasks.
Our model shows promise at generating post-hoc rationales that explain why an inference is more or less likely given the additional information.
arXiv Detail & Related papers (2020-12-14T23:50:20Z)
- Measuring Association Between Labels and Free-Text Rationales [60.58672852655487]
In interpretable NLP, we require faithful rationales that reflect the model's decision-making process for an explained instance.
We demonstrate that pipelines, existing models for faithful extractive rationalization on information-extraction style tasks, do not extend as reliably to "reasoning" tasks requiring free-text rationales.
We turn to models that jointly predict and rationalize, a class of widely used high-performance models for free-text rationalization whose faithfulness is not yet established.
arXiv Detail & Related papers (2020-10-24T03:40:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.