From Superficial Patterns to Semantic Understanding: Fine-Tuning Language Models on Contrast Sets
- URL: http://arxiv.org/abs/2501.02683v2
- Date: Wed, 08 Jan 2025 01:27:30 GMT
- Title: From Superficial Patterns to Semantic Understanding: Fine-Tuning Language Models on Contrast Sets
- Authors: Daniel Petrov
- Abstract summary: This study explores how the robustness of a language model can be improved by exposing it to small amounts of more complex contrast sets during training. With this approach, the model recovers performance and achieves nearly 90% accuracy on contrast sets, highlighting the importance of diverse and challenging training data.
- Score: 0.21756081703275998
- Abstract: Large-scale pre-trained language models have demonstrated high performance on standard datasets for natural language inference (NLI) tasks. Unfortunately, these evaluations can be misleading: although the models perform well on in-distribution data, they perform poorly on out-of-distribution test sets such as contrast sets. Contrast sets consist of perturbed instances with very minor, but meaningful, changes to the input that alter the gold label, revealing how models can learn superficial patterns in the training data rather than more sophisticated language nuances. For example, the ELECTRA-small language model achieves nearly 90% accuracy on the SNLI dataset but drops to 75% when tested on an out-of-distribution contrast set. This study explores how the robustness of a language model can be improved by exposing it to small amounts of more complex contrast sets during training, helping it better learn language patterns. With this approach, the model recovers performance and achieves nearly 90% accuracy on contrast sets, highlighting the importance of diverse and challenging training data.
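The abstract describes adding a small amount of contrast-set data to the training mix and comparing accuracy on the standard test set against a contrast set. Below is a minimal Python sketch of that data-mixing and evaluation step, under stated assumptions: `NLIExample`, `mix_training_data`, `contrast_fraction`, and `negation_heuristic` are hypothetical names, the toy keyword model stands in for ELECTRA-small, no actual fine-tuning is performed, and the paper's real mixing ratio and training code are not given here.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class NLIExample:
    premise: str
    hypothesis: str
    label: str  # "entailment", "neutral", or "contradiction"


def mix_training_data(original: Sequence[NLIExample],
                      contrast: Sequence[NLIExample],
                      contrast_fraction: float,
                      seed: int = 0) -> List[NLIExample]:
    """Original training set plus a random share of the contrast pool, shuffled.

    contrast_fraction is a hypothetical knob (share of the contrast pool to add),
    not a value taken from the paper.
    """
    if not 0.0 <= contrast_fraction <= 1.0:
        raise ValueError("contrast_fraction must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the mix is reproducible
    k = round(len(contrast) * contrast_fraction)
    mixed = list(original) + rng.sample(list(contrast), k)
    rng.shuffle(mixed)
    return mixed


def accuracy(predict: Callable[[NLIExample], str],
             examples: Sequence[NLIExample]) -> float:
    """Fraction of examples whose predicted label equals the gold label."""
    if not examples:
        return 0.0
    return sum(predict(ex) == ex.label for ex in examples) / len(examples)


def negation_heuristic(ex: NLIExample) -> str:
    """Toy 'superficial pattern' model: the word "not" means contradiction."""
    return "contradiction" if "not" in ex.hypothesis.lower().split() else "entailment"


if __name__ == "__main__":
    standard = [
        NLIExample("A man is playing a guitar.", "A man is playing an instrument.", "entailment"),
        NLIExample("A man is playing a guitar.", "A man is not playing music.", "contradiction"),
    ]
    # Minimal edits to the premise flip the gold label to neutral.
    contrast = [
        NLIExample("A man is holding a guitar.", "A man is playing an instrument.", "neutral"),
        NLIExample("A man is carrying a guitar.", "A man is not playing music.", "neutral"),
    ]
    print(accuracy(negation_heuristic, standard))  # 1.0
    print(accuracy(negation_heuristic, contrast))  # 0.0
```

The toy heuristic scores perfectly on the standard pairs and fails every contrast pair, the same kind of gap the abstract reports for ELECTRA-small (nearly 90% versus 75%). In a real setup, the list returned by `mix_training_data` would feed an actual fine-tuning loop rather than a keyword rule.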
Related papers
- Teaching a Language Model to Distinguish Between Similar Details using a Small Adversarial Training Set [0.0]
We show an increase in accuracy on the adversarial test set (+13%) while still maintaining good performance on the original NLI task.
We also show an increase in accuracy from 91.2% to 92.9% on the most similar contradictions in the SNLI test set (as judged by cosine similarity).
arXiv Detail & Related papers (2024-10-30T15:27:55Z)
- How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics [49.9329723199239]
We propose a method for the automated creation of a challenging test set without relying on the manual construction of artificial and unrealistic examples.
We categorize the test set of popular NLI datasets into three difficulty levels by leveraging methods that exploit training dynamics.
When our characterization method is applied to the training set, models trained with only a fraction of the data achieve comparable performance to those trained on the full dataset.
arXiv Detail & Related papers (2024-10-04T13:39:21Z)
- Evaluating Large Language Models Using Contrast Sets: An Experimental Approach [0.0]
We introduce an innovative technique for generating a contrast set for the Stanford Natural Language Inference dataset.
Our strategy involves the automated substitution of verbs, adverbs, and adjectives with their synonyms to preserve the original meaning of sentences.
This method aims to assess whether a model's performance is based on genuine language comprehension or simply on pattern recognition.
arXiv Detail & Related papers (2024-04-02T02:03:28Z)
- Split and Rephrase with Large Language Models [2.499907423888049]
The Split and Rephrase (SPRP) task consists of splitting complex sentences into a sequence of shorter grammatical sentences.
We evaluate large language models on the task, showing that they can provide large improvements over the state of the art on the main metrics.
arXiv Detail & Related papers (2023-12-18T10:16:37Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- Training Trajectories of Language Models Across Scales [99.38721327771208]
Scaling up language models has led to unprecedented performance gains.
How do language models of different sizes learn during pre-training?
Why do larger language models demonstrate more desirable behaviors?
arXiv Detail & Related papers (2022-12-19T19:16:29Z)
- A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks as a sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state of the art (based on BERT) in average performance by a large margin in few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models trained with varying amounts of target-language data.
Our usage scenario is interactive correction starting from almost no training examples, improving the models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- Linguistically-Informed Transformations (LIT): A Method for Automatically Generating Contrast Sets [13.706520309917634]
We propose a Linguistically-Informed Transformation (LIT) method to automatically generate contrast sets.
Experiments show that current pretrained language models struggle on our automatically generated contrast sets.
We improve models' performance on the contrast sets by applying LIT to augment the training data, without affecting performance on the original data.
arXiv Detail & Related papers (2020-10-16T18:23:05Z)
- Evaluating Models' Local Decision Boundaries via Contrast Sets [119.38387782979474]
We propose a new annotation paradigm for NLP that helps to close systematic gaps in the test data.
We demonstrate the efficacy of contrast sets by creating them for 10 diverse NLP datasets.
Although our contrast sets are not explicitly adversarial, model performance is significantly lower on them than on the original test sets.
arXiv Detail & Related papers (2020-04-06T14:47:18Z)
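The first related entry evaluates on the SNLI contradictions whose sentences are most similar, as judged by cosine similarity. Below is a minimal Python sketch of that selection step, assuming similarity is measured between each pair's premise and hypothesis and using plain bag-of-words count vectors as a stand-in for whatever sentence representation the authors actually used; `bow_vector` and `most_similar_contradictions` are hypothetical helpers.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def bow_vector(text: str) -> Counter:
    """Lower-cased bag-of-words counts (assumed stand-in for a sentence embedding)."""
    return Counter(text.lower().replace(".", "").split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar_contradictions(
    pairs: Sequence[Tuple[str, str, str]], top_k: int
) -> List[Tuple[float, str, str]]:
    """Rank (premise, hypothesis, label) triples labeled contradiction by similarity."""
    scored = [
        (cosine_similarity(bow_vector(p), bow_vector(h)), p, h)
        for p, h, label in pairs
        if label == "contradiction"
    ]
    scored.sort(key=lambda item: item[0], reverse=True)  # most similar first
    return scored[:top_k]
```

Pairs that differ in a single word score close to 1.0, so the top of this ranking collects exactly the near-duplicate contradictions that require attention to small details rather than overall topic.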