Does Self-Rationalization Improve Robustness to Spurious Correlations?
- URL: http://arxiv.org/abs/2210.13575v1
- Date: Mon, 24 Oct 2022 19:54:57 GMT
- Title: Does Self-Rationalization Improve Robustness to Spurious Correlations?
- Authors: Alexis Ross, Matthew E. Peters, Ana Marasović
- Abstract summary: We ask whether training models to self-rationalize can aid in their learning to solve tasks for the right reasons.
We evaluate robustness to spurious correlations in fine-tuned encoder-decoder and decoder-only models of six different sizes.
We find that while self-rationalization can improve robustness to spurious correlations in low-resource settings, it tends to hurt robustness in higher-resource settings.
- Score: 19.553357015260687
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rationalization is fundamental to human reasoning and learning. NLP models
trained to produce rationales along with predictions, called
self-rationalization models, have been investigated for their interpretability
and utility to end-users. However, the extent to which training with
human-written rationales facilitates learning remains an under-explored
question. We ask whether training models to self-rationalize can aid in their
learning to solve tasks for the right reasons. Specifically, we evaluate how
training self-rationalization models with free-text rationales affects
robustness to spurious correlations in fine-tuned encoder-decoder and
decoder-only models of six different sizes. We evaluate robustness to spurious
correlations by measuring performance on 1) manually annotated challenge
datasets and 2) subsets of original test sets where reliance on spurious
correlations would fail to produce correct answers. We find that while
self-rationalization can improve robustness to spurious correlations in
low-resource settings, it tends to hurt robustness in higher-resource settings.
Furthermore, these effects depend on model family and size, as well as on
rationale content. Together, our results suggest that explainability can come
at the cost of robustness; thus, appropriate care should be taken when training
self-rationalizing models with the goal of creating more trustworthy models.
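As a hedged illustration of the setup described in the abstract, the Python sketch below shows (1) one plausible (input, target) format for fine-tuning a seq2seq model to predict a label together with a free-text rationale, and (2) the "anti-spurious" test-subset idea: filtering a test set to examples where a well-known shortcut (high premise-hypothesis word overlap predicting "entailment" in NLI) gives the wrong answer. The prompt template, the overlap heuristic, and the threshold are illustrative assumptions, not the paper's exact choices.

```python
# Illustrative sketch only; the prompt template, heuristic, and threshold
# below are assumptions, not the paper's exact choices.

def format_self_rationalization_example(premise, hypothesis, label, rationale):
    """Build one (source, target) pair for fine-tuning a seq2seq
    self-rationalization model: the target contains the label AND a
    free-text rationale, so the model learns to predict and explain jointly."""
    source = f"explain nli premise: {premise} hypothesis: {hypothesis}"
    target = f"{label} explanation: {rationale}"
    return source, target

def word_overlap(premise, hypothesis):
    """Fraction of hypothesis words that also appear in the premise --
    a classic spurious cue for predicting 'entailment' in NLI."""
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / max(len(h), 1)

def anti_spurious_subset(examples, threshold=0.9):
    """Keep only test examples where relying on the overlap shortcut would
    produce the WRONG answer: high overlap but gold label != entailment.
    Accuracy on this subset probes robustness to the spurious correlation."""
    return [ex for ex in examples
            if word_overlap(ex["premise"], ex["hypothesis"]) >= threshold
            and ex["label"] != "entailment"]

# Example usage of the training-pair formatter:
src, tgt = format_self_rationalization_example(
    "A dog is running through a field.", "An animal is outside.",
    "entailment", "A dog is an animal, and a field is outside.")
```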
Related papers
- Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision [120.40788744292739]
We propose a two-player paradigm that separates the roles of reasoning and critique models.
We first propose AutoMathCritique, an automated and scalable framework for collecting critique data.
We demonstrate that the critique models consistently improve the actor's performance on difficult queries at test-time.
arXiv Detail & Related papers (2024-11-25T17:11:54Z)
- Reviving Dormant Memories: Investigating Catastrophic Forgetting in Language Models through Rationale-Guidance Difficulty [7.5795085006788545]
We find that when a forgetting model passively receives an externally provided rationale, its performance on the forgotten task can be restored.
We propose the Rationale-Guidance Difficulty metric to evaluate how effectively a given instruction guides the model in generating appropriate rationales.
arXiv Detail & Related papers (2024-11-18T14:28:04Z)
- Self-Training Meets Consistency: Improving LLMs' Reasoning With Consistency-Driven Rationale Evaluation [15.124701883286436]
The self-training approach for large language models (LLMs) improves reasoning abilities by training the models on their self-generated rationales.
Previous approaches have labeled rationales that produce correct answers for a given question as appropriate for training.
We propose CREST (Consistency-driven Rationale Evaluation for Self-Training), a self-training framework that further evaluates each rationale through follow-up questions.
arXiv Detail & Related papers (2024-11-10T08:11:05Z)
- Improving Language Model Reasoning with Self-motivated Learning [60.779625789039486]
The Self-motivated Learning framework motivates the model itself to automatically generate rationales on existing datasets.
We train a reward model on a ranking of rationales to evaluate their quality, and improve reasoning performance through reinforcement learning.
arXiv Detail & Related papers (2024-04-10T14:05:44Z)
- Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation [57.351098530477124]
We consider one critical type of robustness against spurious correlation, where different portions of the state do not have causality but have correlations induced by unobserved confounders.
A model that learns such useless or even harmful correlation could catastrophically fail when the confounder in the test case deviates from the training one.
Existing robust algorithms that assume simple and unstructured uncertainty sets are therefore inadequate to address this challenge.
arXiv Detail & Related papers (2023-07-15T23:53:37Z)
- Less is More: Mitigate Spurious Correlations for Open-Domain Dialogue Response Generation Models by Causal Discovery [52.95935278819512]
We conduct the first study of spurious correlations in open-domain response generation models, based on CGDIALOG, a corpus curated in our work.
Inspired by causal discovery algorithms, we propose a novel model-agnostic method for the training and inference of response generation models.
arXiv Detail & Related papers (2023-03-02T06:33:48Z)
- Explicit Tradeoffs between Adversarial and Natural Distributional Robustness [48.44639585732391]
In practice, models need to enjoy both types of robustness to ensure reliability.
In this work, we show that, in fact, explicit tradeoffs exist between adversarial and natural distributional robustness.
arXiv Detail & Related papers (2022-09-15T19:58:01Z)
- Can Rationalization Improve Robustness? [39.741059642044874]
We investigate whether neural NLP models can provide robustness to adversarial attacks in addition to their interpretable nature.
We generate various types of 'AddText' attacks for both token and sentence-level rationalization tasks.
Our experiments reveal that rationale models show promise for improving robustness, though they struggle in certain scenarios; a minimal sketch of such an attack appears after this list.
arXiv Detail & Related papers (2022-04-25T17:02:42Z)
- Measuring Association Between Labels and Free-Text Rationales [60.58672852655487]
In interpretable NLP, we require faithful rationales that reflect the model's decision-making process for an explained instance.
We demonstrate that pipelines, existing models for faithful extractive rationalization on information-extraction style tasks, do not extend as reliably to "reasoning" tasks requiring free-text rationales.
We turn to models that jointly predict and rationalize, a class of widely used high-performance models for free-text rationalization whose faithfulness is not yet established.
arXiv Detail & Related papers (2020-10-24T03:40:56Z)
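As a hedged sketch of the "AddText"-style attacks mentioned in "Can Rationalization Improve Robustness?" above: the probe appends an irrelevant distractor sentence to each input and measures how often the model's prediction flips. The distractor string and the `predict` callable are hypothetical stand-ins for illustration, not that paper's actual attack-generation procedure.

```python
# Minimal sketch of an AddText-style robustness probe: append a distractor
# sentence to the input and compare predictions before and after. The
# distractor content and the `predict` function are illustrative assumptions.

def add_text_attack(text, distractor="The weather was nice that day."):
    """Return an attacked input with an irrelevant sentence appended."""
    return f"{text} {distractor}"

def attack_success_rate(examples, predict):
    """Fraction of examples whose prediction changes under the attack.
    `predict` is any callable mapping an input string to a label."""
    flipped = sum(predict(ex) != predict(add_text_attack(ex)) for ex in examples)
    return flipped / max(len(examples), 1)
```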