Can Rationalization Improve Robustness?
- URL: http://arxiv.org/abs/2204.11790v1
- Date: Mon, 25 Apr 2022 17:02:42 GMT
- Title: Can Rationalization Improve Robustness?
- Authors: Howard Chen, Jacqueline He, Karthik Narasimhan, Danqi Chen
- Abstract summary: We investigate whether neural rationale models can provide robustness to adversarial attacks in addition to their interpretable nature.
We generate various types of 'AddText' attacks for both token- and sentence-level rationalization tasks.
Our experiments reveal that rationale models show promise in improving robustness, though they struggle in certain scenarios.
- Score: 39.741059642044874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A growing line of work has investigated the development of neural NLP models
that can produce rationales--subsets of input that can explain their model
predictions. In this paper, we ask whether such rationale models can also
provide robustness to adversarial attacks in addition to their interpretable
nature. Since these models need to first generate rationales ("rationalizer")
before making predictions ("predictor"), they have the potential to ignore
noise or adversarially added text by simply masking it out of the generated
rationale. To this end, we systematically generate various types of 'AddText'
attacks for both token and sentence-level rationalization tasks, and perform an
extensive empirical evaluation of state-of-the-art rationale models across five
different tasks. Our experiments reveal that rationale models show
promise in improving robustness, while they struggle in certain scenarios--when
the rationalizer is sensitive to positional bias or to lexical choices in the attack
text. Further, leveraging human rationales as supervision does not always
translate to better performance. Our study is a first step towards exploring
the interplay between interpretability and robustness in the
rationalize-then-predict framework.
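The rationalize-then-predict setup and the 'AddText' attack described above lend themselves to a short sketch. The toy example below is a minimal illustration, not the paper's implementation: the function names, the hand-written relevance scorer, and the distractor text are all assumptions chosen only to show the mechanism of appending attack text and masking it out of the rationale before prediction.

```python
# Minimal sketch (not the authors' code) of rationalize-then-predict under an
# 'AddText'-style attack. All names, the toy scorer, and the distractor text
# are illustrative assumptions.
from typing import Callable, List


def add_text_attack(tokens: List[str], distractor: List[str],
                    position: str = "end") -> List[str]:
    """Insert attack text at the start or end of the input (sentence-level AddText)."""
    return distractor + tokens if position == "start" else tokens + distractor


def rationalize(tokens: List[str], scorer: Callable[[str], float], k: int) -> List[bool]:
    """Toy rationalizer: keep the k highest-scoring tokens as the rationale mask."""
    ranked = sorted(range(len(tokens)), key=lambda i: scorer(tokens[i]), reverse=True)
    keep = set(ranked[:k])
    return [i in keep for i in range(len(tokens))]


def predict(tokens: List[str], mask: List[bool]) -> str:
    """Toy predictor that only sees tokens kept inside the rationale."""
    rationale = [t for t, m in zip(tokens, mask) if m]
    positive_words = {"great", "excellent", "good"}
    return "positive" if any(t in positive_words for t in rationale) else "negative"


def toy_scorer(token: str) -> float:
    # Stand-in for a learned rationalizer's relevance score.
    return 1.0 if token in {"movie", "acting", "great", "excellent"} else 0.0


if __name__ == "__main__":
    x = "the movie was great and the acting excellent".split()
    distractor = "terrible awful boring plot twist".split()  # hypothetical attack text
    x_adv = add_text_attack(x, distractor, position="end")

    # If the rationalizer scores the attack tokens low, they are masked out and
    # the prediction is unchanged. If it is sensitive to the attack's position
    # or lexical choices (the failure modes the paper reports), it may keep
    # attack tokens in the rationale and flip the prediction.
    mask = rationalize(x_adv, toy_scorer, k=4)
    print(predict(x_adv, mask))  # -> "positive": the appended attack is masked out
```

In the paper's setting the scorer is a trained rationalizer and the attack text is crafted to remain label-preserving for humans; this sketch only illustrates why masking the attack out of the rationale can shield the predictor.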
Related papers
- Adversarial Attack for Explanation Robustness of Rationalization Models [17.839644167949906]
Rationalization models select a subset of the input text as a rationale, which is crucial for humans to understand and trust predictions.
This paper aims to undermine the explainability of rationalization models without altering their predictions, thereby eliciting distrust in these models from human users.
arXiv Detail & Related papers (2024-08-20T12:43:58Z)
- Characterizing Large Language Models as Rationalizers of Knowledge-intensive Tasks [6.51301154858045]
Large language models (LLMs) are proficient at generating fluent text with minimal task-specific supervision.
We consider the task of generating knowledge-guided rationalization in natural language by using expert-written examples in a few-shot manner.
Surprisingly, crowd-workers preferred knowledge-grounded rationales over crowdsourced rationalizations, citing their factuality, sufficiency, and comprehensive refutations.
arXiv Detail & Related papers (2023-11-09T01:04:44Z)
- Unsupervised Selective Rationalization with Noise Injection [7.17737088382948]
Unsupervised selective rationalization produces rationales alongside predictions by chaining two jointly-trained components, a rationale generator and a predictor.
We introduce a novel training technique that effectively limits generation of implausible rationales by injecting noise between the generator and the predictor.
We achieve sizeable improvements in rationale plausibility and task accuracy over the state-of-the-art across a variety of tasks, including our new benchmark.
arXiv Detail & Related papers (2023-05-27T17:34:36Z)
- NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries? [61.58261351116679]
We introduce a two-stage adversarial example generation framework (NaturalAdversaries) for natural language understanding tasks.
It is adaptable to both black-box and white-box adversarial attacks based on the level of access to the model parameters.
Our results indicate these adversaries generalize across domains, and offer insights for future research on improving robustness of neural text classification models.
arXiv Detail & Related papers (2022-11-08T16:37:34Z)
- Rationale-Augmented Ensembles in Language Models [53.45015291520658]
We reconsider rationale-augmented prompting for few-shot in-context learning.
We identify rationale sampling in the output space as the key component to robustly improve performance.
We demonstrate that rationale-augmented ensembles achieve more accurate and interpretable results than existing prompting approaches.
arXiv Detail & Related papers (2022-07-02T06:20:57Z)
- Logically Consistent Adversarial Attacks for Soft Theorem Provers [110.17147570572939]
We propose a generative adversarial framework for probing and improving language models' reasoning capabilities.
Our framework successfully generates adversarial attacks and identifies global weaknesses.
In addition to effective probing, we show that training on the generated samples improves the target model's performance.
arXiv Detail & Related papers (2022-04-29T19:10:12Z)
- Learning to Rationalize for Nonmonotonic Reasoning with Distant Supervision [44.32874972577682]
We investigate the extent to which neural models can reason about natural language rationales that explain model predictions.
We use pre-trained language models, neural knowledge models, and distant supervision from related tasks.
Our model shows promise at generating post-hoc rationales that explain why an inference is more or less likely given the additional information.
arXiv Detail & Related papers (2020-12-14T23:50:20Z)
- Measuring Association Between Labels and Free-Text Rationales [60.58672852655487]
In interpretable NLP, we require faithful rationales that reflect the model's decision-making process for an explained instance.
We demonstrate that pipelines, existing models for faithful extractive rationalization on information-extraction style tasks, do not extend as reliably to "reasoning" tasks requiring free-text rationales.
We turn to models that jointly predict and rationalize, a class of widely used high-performance models for free-text rationalization whose faithfulness is not yet established.
arXiv Detail & Related papers (2020-10-24T03:40:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.