Explanation-based Finetuning Makes Models More Robust to Spurious Cues
- URL: http://arxiv.org/abs/2305.04990v3
- Date: Tue, 6 Jun 2023 15:31:33 GMT
- Title: Explanation-based Finetuning Makes Models More Robust to Spurious Cues
- Authors: Josh Magnus Ludan, Yixuan Meng, Tai Nguyen, Saurabh Shah, Qing Lyu,
Marianna Apidianaki, Chris Callison-Burch
- Abstract summary: Large Language Models (LLMs) are so powerful that they sometimes learn correlations between labels and features that are irrelevant to the task.
We propose explanation-based finetuning as a general approach to mitigate LLMs' reliance on spurious correlations.
We finetune the model to additionally generate a free-text explanation supporting its answer.
- Score: 21.327036110196637
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large Language Models (LLMs) are so powerful that they sometimes learn
correlations between labels and features that are irrelevant to the task,
leading to poor generalization on out-of-distribution data. We propose
explanation-based finetuning as a general approach to mitigate LLMs' reliance
on spurious correlations. Unlike standard finetuning where the model only
predicts the answer given the input, we finetune the model to additionally
generate a free-text explanation supporting its answer. To evaluate our method,
we finetune the model on artificially constructed training sets containing
different types of spurious cues, and test it on a test set without these cues.
Compared to standard finetuning, our method makes GPT-3 (davinci) remarkably
more robust against spurious cues in terms of accuracy drop across four
classification tasks: ComVE (+1.2), CREAK (+9.1), e-SNLI (+15.4), and SBIC
(+6.5). The efficacy generalizes across multiple model families and scales,
with greater gains for larger models. Finally, our method also works well with
explanations generated by the model, implying its applicability to more
datasets without human-written explanations.
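To make the contrast concrete, below is a minimal Python sketch of how training pairs might be formatted for standard versus explanation-based finetuning. The function names, prompt template, and ComVE-style example are illustrative assumptions, not the paper's actual preprocessing code; the key point is that the completion target carries a free-text explanation in addition to the label.

```python
# Hypothetical formatting helpers; field names and templates are
# illustrative, not the paper's released code.

def format_standard(example: dict) -> dict:
    """Standard finetuning: the completion is the label alone."""
    return {
        "prompt": f"{example['input']}\nAnswer:",
        "completion": f" {example['label']}",
    }

def format_with_explanation(example: dict) -> dict:
    """Explanation-based finetuning: the completion is the label
    followed by a free-text explanation supporting it."""
    return {
        "prompt": f"{example['input']}\nAnswer:",
        "completion": (
            f" {example['label']}\nExplanation: {example['explanation']}"
        ),
    }

example = {  # ComVE-style instance, invented for illustration
    "input": "Statement: He put an elephant into the fridge.",
    "label": "Against common sense",
    "explanation": "An elephant is far too large to fit inside a fridge.",
}
print(format_with_explanation(example)["completion"])
```

Because the explanation must reference task-relevant content, the model is pushed to attend to the input itself rather than to a spurious surface cue.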
Related papers
- Take It Easy: Label-Adaptive Self-Rationalization for Fact Verification and Explanation Generation [15.94564349084642]
The self-rationalization method is typically used in natural language inference tasks.
We fine-tune a model to learn veracity prediction with annotated labels.
We generate synthetic explanations from three large language models.
arXiv Detail & Related papers (2024-10-05T02:19:49Z)
- Model Editing with Canonical Examples [75.33218320106585]
We introduce model editing with canonical examples.
A canonical example is a simple instance of good behavior, e.g., "The capital of Mauritius is Port Louis."
We propose sense finetuning, which selects and finetunes a few sense vectors for each canonical example.
arXiv Detail & Related papers (2024-02-09T03:08:12Z)
- Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution [67.9215891673174]
We propose score entropy as a novel loss that naturally extends score matching to discrete spaces.
We test our Score Entropy Discrete Diffusion models on standard language modeling tasks.
arXiv Detail & Related papers (2023-10-25T17:59:12Z)
- Text Alignment Is An Efficient Unified Model for Massive NLP Tasks [24.069447197357164]
Next-word prediction is often not an efficient formulation for many NLP tasks.
We propose text alignment as an efficient unified model for a wide range of crucial tasks.
Our model delivers on par or even superior performance with much smaller model sizes.
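As a rough illustration of the alignment formulation summarized above, the sketch below reduces several tasks to one pairwise scoring call. The token-overlap scorer is a toy stand-in for the paper's learned alignment model; the reduction pattern, not the scorer, is the point.

```python
# Toy sketch: casting several NLP tasks as one pairwise alignment query.
# alignment_score is a crude lexical proxy, not the paper's model.

def alignment_score(text_a: str, text_b: str) -> float:
    """Toy proxy for P(text_b is supported by text_a)."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(b) if b else 0.0

# Each task reduces to the same (context, claim) alignment call.
def nli(premise: str, hypothesis: str) -> float:
    return alignment_score(premise, hypothesis)

def fact_verification(evidence: str, claim: str) -> float:
    return alignment_score(evidence, claim)

def qa_validation(passage: str, question: str, answer: str) -> float:
    return alignment_score(passage, f"{question} {answer}")

print(nli("The cat sat on the mat.", "A cat is on the mat."))
```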
arXiv Detail & Related papers (2023-07-06T02:28:31Z)
- Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models [125.91897197446379]
We find that MoE models benefit more from instruction tuning than dense models.
Our most powerful model, FLAN-MOE-32B, surpasses the performance of FLAN-PALM-62B on four benchmark tasks.
arXiv Detail & Related papers (2023-05-24T04:22:26Z)
- Evaluating the Impact of Model Scale for Compositional Generalization in Semantic Parsing [38.770055054268965]
Recent work has shown considerable improvements on many NLP tasks from model scaling.
Fine-tuning generally has flat or negative scaling curves on out-of-distribution compositional generalization.
In-context learning has positive scaling curves, but is generally outperformed by much smaller fine-tuned models.
arXiv Detail & Related papers (2022-05-24T17:57:39Z)
- Efficient Large Scale Language Modeling with Mixtures of Experts [61.45159383372181]
Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional computation.
This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings.
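For readers unfamiliar with the mechanism being scaled, a minimal NumPy sketch of conditional computation in an MoE layer follows. It uses top-1 routing and omits the load balancing, capacity limits, and distributed expert placement that real systems rely on; all sizes below are arbitrary.

```python
# Minimal MoE layer with top-1 routing: each token runs through only
# the expert its router picks, which is the conditional-computation idea.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, d_ff = 16, 4, 32

# Each expert is a small two-layer feed-forward network.
experts = [
    (rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
    for _ in range(n_experts)
]
router = rng.normal(size=(d_model, n_experts))  # gating weights

def moe_layer(tokens: np.ndarray) -> np.ndarray:
    logits = tokens @ router                      # (n_tokens, n_experts)
    gates = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
    choice = gates.argmax(-1)                     # top-1 expert per token
    out = np.zeros_like(tokens)
    for e, (w1, w2) in enumerate(experts):
        mask = choice == e
        if mask.any():                            # only routed tokens run expert e
            h = np.maximum(tokens[mask] @ w1, 0.0)
            out[mask] = gates[mask, e:e + 1] * (h @ w2)
    return out

print(moe_layer(rng.normal(size=(8, d_model))).shape)  # (8, 16)
```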
arXiv Detail & Related papers (2021-12-20T17:05:11Z)
- Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
- Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills [32.55545292360155]
We propose to leverage semi-structured tables, and automatically generate at scale question-paragraph pairs.
We add a pre-training step over this synthetic data, which includes examples that require 16 different reasoning skills.
We show that our model, PReasM, substantially outperforms T5, a popular pre-trained encoder-decoder model.
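A toy sketch of the generation idea: templates applied over a semi-structured table yield synthetic question-paragraph pairs, each exercising a reasoning skill. The table, templates, and fields below are invented for illustration and are not PReasM's actual pipeline.

```python
# Hypothetical table-to-question templates; fields are illustrative.
table = {
    "title": "Olympic medals",
    "rows": [
        {"country": "Norway", "gold": 16, "silver": 8},
        {"country": "Germany", "gold": 12, "silver": 10},
    ],
}

def generate_pairs(table: dict) -> list[dict]:
    # Verbalize the table rows into a synthetic paragraph.
    paragraph = " ".join(
        f"{r['country']} won {r['gold']} gold and {r['silver']} silver medals."
        for r in table["rows"]
    )
    pairs = [
        {
            "question": f"How many gold medals did {r['country']} win?",
            "paragraph": paragraph,
            "answer": str(r["gold"]),
        }
        for r in table["rows"]
    ]
    # A comparison question exercises a different reasoning skill.
    top = max(table["rows"], key=lambda r: r["gold"])
    pairs.append({
        "question": "Which country won the most gold medals?",
        "paragraph": paragraph,
        "answer": top["country"],
    })
    return pairs

for p in generate_pairs(table):
    print(p["question"], "->", p["answer"])
```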
arXiv Detail & Related papers (2021-07-15T11:37:14Z)
- Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension [27.538957000237176]
Humans create questions adversarially, such that the model fails to answer them correctly.
We collect 36,000 samples with progressively stronger models in the annotation loop.
We find that training on adversarially collected samples leads to strong generalisation to non-adversarially collected datasets.
We find that stronger models can still learn from datasets collected with substantially weaker models-in-the-loop.
arXiv Detail & Related papers (2020-02-02T00:22:55Z)
- AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of which utterances or tokens are dull, without any feature engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
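A hedged sketch of the measure these models share: averaging the decoder's output distributions over a batch yields per-token "dullness", and one minus the average dull mass of a response can serve as a diversity score. The exact scoring details in the paper differ; this is only the shape of the idea.

```python
# Toy AvgOut-style diversity score; scoring details are an assumption.
import numpy as np

def avgout(batch_dists: np.ndarray) -> np.ndarray:
    """batch_dists: (n_steps, vocab) softmax outputs pooled over a batch.
    Returns the average probability per vocabulary token."""
    return batch_dists.mean(axis=0)

def diversity_score(response_ids: list[int], avg: np.ndarray) -> float:
    # Responses built from frequently predicted (dull) tokens score low.
    return float(1.0 - avg[response_ids].mean())

rng = np.random.default_rng(0)
dists = rng.dirichlet(np.ones(100), size=50)  # fake output distributions
avg = avgout(dists)
print(diversity_score([3, 17, 42], avg))
```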
arXiv Detail & Related papers (2020-01-15T18:32:06Z)