Does Self-Rationalization Improve Robustness to Spurious Correlations?
- URL: http://arxiv.org/abs/2210.13575v1
- Date: Mon, 24 Oct 2022 19:54:57 GMT
- Title: Does Self-Rationalization Improve Robustness to Spurious Correlations?
- Authors: Alexis Ross, Matthew E. Peters, Ana Marasović
- Abstract summary: We ask whether training models to self-rationalize can aid in their learning to solve tasks for the right reasons.
We evaluate robustness to spurious correlations in fine-tuned encoder-decoder and decoder-only models of six different sizes.
We find that while self-rationalization can improve robustness to spurious correlations in low-resource settings, it tends to hurt robustness in higher-resource settings.
- Score: 19.553357015260687
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rationalization is fundamental to human reasoning and learning. NLP models
trained to produce rationales along with predictions, called
self-rationalization models, have been investigated for their interpretability
and utility to end-users. However, the extent to which training with
human-written rationales facilitates learning remains an under-explored
question. We ask whether training models to self-rationalize can aid in their
learning to solve tasks for the right reasons. Specifically, we evaluate how
training self-rationalization models with free-text rationales affects
robustness to spurious correlations in fine-tuned encoder-decoder and
decoder-only models of six different sizes. We evaluate robustness to spurious
correlations by measuring performance on 1) manually annotated challenge
datasets and 2) subsets of original test sets where reliance on spurious
correlations would fail to produce correct answers. We find that while
self-rationalization can improve robustness to spurious correlations in
low-resource settings, it tends to hurt robustness in higher-resource settings.
Furthermore, these effects depend on model family and size, as well as on
rationale content. Together, our results suggest that explainability can come
at the cost of robustness; thus, appropriate care should be taken when training
self-rationalizing models with the goal of creating more trustworthy models.
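The paper's second evaluation setup — measuring accuracy on the subset of test examples where reliance on a spurious correlation would produce the wrong answer — can be illustrated with a minimal sketch. This is an illustrative simplification, not the paper's implementation: the heuristic below (high premise-hypothesis word overlap implies "entailment") is a hypothetical stand-in for a learned spurious correlation in NLI, and the data and model predictions are toy placeholders.

```python
# Sketch: compare overall accuracy against accuracy on the "counter-spurious"
# subset, i.e. examples where a spurious heuristic fails. A robust model
# should not degrade much on that subset.

def word_overlap(premise: str, hypothesis: str) -> float:
    """Fraction of hypothesis words that also appear in the premise."""
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / len(h) if h else 0.0

def spurious_prediction(example: dict) -> str:
    """Hypothetical spurious heuristic: high lexical overlap => entailment."""
    overlap = word_overlap(example["premise"], example["hypothesis"])
    return "entailment" if overlap > 0.8 else "neutral"

def accuracy(predictions, examples):
    correct = sum(p == ex["gold"] for p, ex in zip(predictions, examples))
    return correct / len(examples)

def counter_spurious_subset(examples):
    """Examples where relying on the spurious heuristic gives the wrong label."""
    return [ex for ex in examples if spurious_prediction(ex) != ex["gold"]]

# Toy test set: overlap tracks the gold label in the first two examples,
# but misleads on the third (negation flips the label despite full overlap).
test_set = [
    {"premise": "a man is sleeping", "hypothesis": "a man is sleeping",
     "gold": "entailment"},
    {"premise": "a dog runs in the park", "hypothesis": "cats are mammals",
     "gold": "neutral"},
    {"premise": "the man is not sleeping", "hypothesis": "the man is sleeping",
     "gold": "contradiction"},
]
model_predictions = ["entailment", "neutral", "entailment"]  # hypothetical outputs

subset = counter_spurious_subset(test_set)
subset_preds = [p for p, ex in zip(model_predictions, test_set) if ex in subset]
print(accuracy(model_predictions, test_set))  # overall accuracy
print(accuracy(subset_preds, subset))         # accuracy where the heuristic fails
```

A large gap between the two numbers (here the hypothetical model matches the heuristic on the negation example, so its subset accuracy collapses) is exactly the signature of reliance on the spurious correlation that the paper's evaluation is designed to expose.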
Related papers
- Improving Language Model Reasoning with Self-motivated Learning [60.779625789039486]
The Self-motivated Learning framework motivates the model itself to automatically generate rationales on existing datasets.
We train a reward model with the rank to evaluate the quality of rationales, and improve the performance of reasoning through reinforcement learning.
arXiv Detail & Related papers (2024-04-10T14:05:44Z)
- Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to a discrepancy between a model's predicted confidence and its actual performance.
We introduce Dynamic Regularization (DReg), which aims to learn what should be learned during training, thereby circumventing the confidence-adjustment trade-off.
arXiv Detail & Related papers (2024-02-13T11:25:20Z)
- Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation [57.351098530477124]
We consider one critical type of robustness against spurious correlation, where different portions of the state lack a causal relationship but are correlated through unobserved confounders.
A model that learns such useless or even harmful correlations could catastrophically fail when the confounder in the test case deviates from the one seen in training.
Existing robust algorithms that assume simple and unstructured uncertainty sets are therefore inadequate to address this challenge.
arXiv Detail & Related papers (2023-07-15T23:53:37Z)
- ZARA: Improving Few-Shot Self-Rationalization for Small Language Models [29.755148112827502]
We present a novel approach, Zero-shot Augmentation of Rationale-Answer pairs (ZARA), to automatically construct pseudo-parallel data for self-training.
ZARA achieves SOTA performance on the FEB benchmark, for both the task accuracy and the explanation metric.
arXiv Detail & Related papers (2023-05-12T10:07:12Z)
- Less is More: Mitigate Spurious Correlations for Open-Domain Dialogue Response Generation Models by Causal Discovery [52.95935278819512]
We conduct the first study of spurious correlations in open-domain response generation models, based on CGDIALOG, a corpus curated in that work.
Inspired by causal discovery algorithms, we propose a novel model-agnostic method for training and inference of response generation models.
arXiv Detail & Related papers (2023-03-02T06:33:48Z)
- Explicit Tradeoffs between Adversarial and Natural Distributional Robustness [48.44639585732391]
In practice, models need to enjoy both types of robustness to ensure reliability.
In this work, we show that in fact, explicit tradeoffs exist between adversarial and natural distributional robustness.
arXiv Detail & Related papers (2022-09-15T19:58:01Z)
- Can Rationalization Improve Robustness? [39.741059642044874]
We investigate whether neural NLP models can provide robustness to adversarial attacks in addition to their interpretable nature.
We generate various types of 'AddText' attacks for both token and sentence-level rationalization tasks.
Our experiments reveal that rationale models show promise in improving robustness, though they struggle in certain scenarios.
arXiv Detail & Related papers (2022-04-25T17:02:42Z)
- Enhancing Model Robustness and Fairness with Causality: A Regularization Approach [15.981724441808147]
Recent work has raised concerns about the risk of spurious correlations and unintended biases in machine learning models.
We propose a simple and intuitive regularization approach to integrate causal knowledge during model training.
We build a predictive model that relies more on causal features and less on non-causal features.
arXiv Detail & Related papers (2021-10-03T02:49:33Z)
- Measuring Association Between Labels and Free-Text Rationales [60.58672852655487]
In interpretable NLP, we require faithful rationales that reflect the model's decision-making process for an explained instance.
We demonstrate that pipelines, existing models for faithful extractive rationalization on information-extraction-style tasks, do not extend as reliably to "reasoning" tasks requiring free-text rationales.
We turn to models that jointly predict and rationalize, a class of widely used high-performance models for free-text rationalization whose faithfulness is not yet established.
arXiv Detail & Related papers (2020-10-24T03:40:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.