The CRINGE Loss: Learning what language not to model
- URL: http://arxiv.org/abs/2211.05826v1
- Date: Thu, 10 Nov 2022 19:30:08 GMT
- Title: The CRINGE Loss: Learning what language not to model
- Authors: Leonard Adolphs, Tianyu Gao, Jing Xu, Kurt Shuster, Sainbayar
Sukhbaatar, Jason Weston
- Abstract summary: We show that even with large amounts of positive training data, issues remain that can be alleviated with relatively small amounts of negative data.
We propose a novel procedure to train with such data called the CRINGE loss (ContRastive Iterative Negative GEneration).
Our models outperform multiple strong baselines and are conceptually simple, easy to train and implement.
- Score: 35.40992193113732
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Standard language model training employs gold human documents or human-human
interaction data, and treats all training data as positive examples. Growing
evidence shows that even with very large amounts of positive training data,
issues remain that can be alleviated with relatively small amounts of negative
data -- examples of what the model should not do. In this work, we propose a
novel procedure to train with such data called the CRINGE loss (ContRastive
Iterative Negative GEneration). We show the effectiveness of this approach
across three different experiments on the tasks of safe generation,
contradiction avoidance, and open-domain dialogue. Our models outperform
multiple strong baselines and are conceptually simple, easy to train and
implement.
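The abstract does not spell out the loss itself, but its core idea, contrasting tokens of a negative example against tokens the model itself prefers, can be sketched briefly. The snippet below is a minimal, hypothetical PyTorch illustration rather than the paper's exact formulation: the function name `cringe_loss`, the default `k`, and the choice to sample the positive token from the model's top-k predictions are assumptions made for this sketch.

```python
import torch
import torch.nn.functional as F

def cringe_loss(logits: torch.Tensor, neg_tokens: torch.Tensor, k: int = 5) -> torch.Tensor:
    # logits:     (T, V) model logits at each position of a negative sequence
    # neg_tokens: (T,)   token ids of the negative example
    # For each position, draw a "positive" token from the model's own top-k
    # predictions and contrast it against the observed negative token.
    topk_vals, topk_ids = logits.topk(k, dim=-1)                      # (T, k)
    sampled = torch.multinomial(F.softmax(topk_vals, dim=-1), 1)      # (T, 1)
    pos_ids = topk_ids.gather(-1, sampled).squeeze(-1)                # (T,)

    pos_logit = logits.gather(-1, pos_ids.unsqueeze(-1)).squeeze(-1)  # (T,)
    neg_logit = logits.gather(-1, neg_tokens.unsqueeze(-1)).squeeze(-1)

    # Binary contrast per token: -log(exp(pos) / (exp(pos) + exp(neg))),
    # which pushes probability mass away from the negative token.
    pair = torch.stack([pos_logit, neg_logit], dim=-1)                # (T, 2)
    targets = torch.zeros(pair.size(0), dtype=torch.long, device=pair.device)
    return F.cross_entropy(pair, targets)
```

In practice a term like this would be added to the usual maximum-likelihood loss on positive examples, and the "iterative" part of the procedure would have the model generate new outputs, label the bad ones as fresh negative examples, and repeat training.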
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z) - Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems [17.10762463903638]
We train evaluation models to approximate human evaluation, achieving high agreement.
We propose a weak-to-strong supervision method that uses a fraction of the annotated data to train an evaluation model.
arXiv Detail & Related papers (2024-06-26T10:48:14Z) - Unlearning Traces the Influential Training Data of Language Models [31.33791825286853]
This paper presents UnTrac, which traces the influence of a training dataset by unlearning it and measuring the change in the model's performance.
We propose a more scalable approach, UnTrac-Inv, which unlearns a test dataset and evaluates the unlearned model on training datasets.
arXiv Detail & Related papers (2024-01-26T23:17:31Z) - Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models [115.501751261878]
Fine-tuning language models (LMs) on human-generated data remains a prevalent practice.
We investigate whether we can go beyond human data on tasks where we have access to scalar feedback.
We find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data.
arXiv Detail & Related papers (2023-12-11T18:17:43Z) - Constructive Large Language Models Alignment with Diverse Feedback [76.9578950893839]
We introduce Constructive and Diverse Feedback (CDF) as a novel method to enhance large language model alignment.
We exploit critique feedback for easy problems, refinement feedback for medium problems, and preference feedback for hard problems.
By training our model with this diversified feedback, we achieve enhanced alignment performance while using less training data.
arXiv Detail & Related papers (2023-10-10T09:20:14Z) - TRAK: Attributing Model Behavior at Scale [79.56020040993947]
We present TRAK (Tracing with the Randomly-Projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models.
arXiv Detail & Related papers (2023-03-24T17:56:22Z) - Robust Task-Oriented Dialogue Generation with Contrastive Pre-training
and Adversarial Filtering [17.7709632238066]
Data artifacts incentivize machine learning models to learn non-transferable generalizations.
We investigate whether popular datasets such as MultiWOZ contain such data artifacts.
We propose a contrastive learning based framework to encourage the model to ignore these cues and focus on learning generalisable patterns.
arXiv Detail & Related papers (2022-05-20T03:13:02Z) - On the Impact of Hard Adversarial Instances on Overfitting in Adversarial Training [70.82725772926949]
Adversarial training is a popular method to robustify models against adversarial attacks.
In this work, we investigate this phenomenon from the perspective of training instances.
We show that the decay in generalization performance of adversarial training is a result of fitting hard adversarial instances.
arXiv Detail & Related papers (2021-12-14T12:19:24Z) - Towards Zero-Label Language Learning [20.28186484098947]
This paper explores zero-label learning in Natural Language Processing (NLP).
No human-annotated data is used anywhere during training and models are trained purely on synthetic data.
Inspired by the recent success of few-shot inference on GPT-3, we present a training data creation procedure named Unsupervised Data Generation.
arXiv Detail & Related papers (2021-09-19T19:00:07Z) - Training Data Leakage Analysis in Language Models [6.843491191969066]
We introduce a methodology for identifying the user content in the training data that could be leaked under a strong and realistic threat model.
We propose two metrics to quantify user-level data leakage by measuring a model's ability to produce unique sentence fragments within training data.
arXiv Detail & Related papers (2021-01-14T00:57:32Z) - Comparison of Interactive Knowledge Base Spelling Correction Models for
Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work presents a comparison of a neural model and character language models with varying amounts of target language data.
Our usage scenario is interactive correction starting from nearly zero training examples, improving the models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)