Related papers: Unforgettable Generalization in Language Models

Unforgettable Generalization in Language Models

URL: http://arxiv.org/abs/2409.02228v1
Date: Tue, 3 Sep 2024 18:55:54 GMT
Title: Unforgettable Generalization in Language Models
Authors: Eric Zhang, Leshem Chosen, Jacob Andreas,
Abstract summary: We study the behavior of language models (LMs) in which tasks have been forgotten via fine-tuning on randomized labels. Across tasks, however, LMs exhibit extreme variability in whether LM predictions change on examples outside the training set.
Score: 46.98652406155007
License: http://creativecommons.org/licenses/by/4.0/
Abstract: When language models (LMs) are trained to forget (or "unlearn'') a skill, how precisely does their behavior change? We study the behavior of transformer LMs in which tasks have been forgotten via fine-tuning on randomized labels. Such LMs learn to generate near-random predictions for individual examples in the "training'' set used for forgetting. Across tasks, however, LMs exhibit extreme variability in whether LM predictions change on examples outside the training set. In some tasks (like entailment classification), forgetting generalizes robustly, and causes models to produce uninformative predictions on new task instances; in other tasks (like physical commonsense reasoning and scientific question answering) forgetting affects only the training examples, and models continue to perform the "forgotten'' task accurately even for examples very similar to those that appeared in the training set. Dataset difficulty is not predictive of whether a behavior can be forgotten; instead, generalization in forgetting is (weakly) predicted by the confidence of LMs' initial task predictions and the variability of LM representations of training data, with low confidence and low variability both associated with greater generalization. Perhaps most surprisingly, random-label forgetting appears to be somewhat insensitive to the contents of the training set: for example, models trained on science questions with random labels continue to answer other science questions accurately, but begin to produce random labels on entailment classification tasks. Finally, we show that even generalizable forgetting is shallow: linear probes trained on LMs' representations can still perform tasks reliably after forgetting. Our results highlight the difficulty and unpredictability of performing targeted skill removal from models via fine-tuning.

Related papers

Robustness is Important: Limitations of LLMs for Data Fitting [0.0]
Large Language Models (LLMs) are being applied in a wide array of settings.<n>We identify a critical vulnerability of using LLMs for data fitting.<n>Changing variable names can sway the size of prediction error by as much as 82% in certain settings.
arXiv Detail & Related papers (2025-08-27T04:46:05Z)
What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy. By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models [9.340409961107955]
Transformer models have the remarkable ability to perform in-context learning (ICL) We study how effectively transformers can bridge between their pretraining data mixture. Our results highlight that the impressive ICL abilities of high-capacity sequence models may be more closely tied to the coverage of their pretraining data mixtures than inductive biases.
arXiv Detail & Related papers (2023-11-01T21:41:08Z)
Making Pre-trained Language Models both Task-solvers and Self-calibrators [52.98858650625623]
Pre-trained language models (PLMs) serve as backbones for various real-world systems. Previous work shows that introducing an extra calibration task can mitigate this issue. We propose a training algorithm LM-TOAST to tackle the challenges.
arXiv Detail & Related papers (2023-07-21T02:51:41Z)
Inverse Scaling: When Bigger Isn't Better [80.42834197416444]
Large language models (LMs) show predictable improvements to overall loss with increased scale. We present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale.
arXiv Detail & Related papers (2023-06-15T20:11:23Z)
DiscrimLoss: A Universal Loss for Hard Samples and Incorrect Samples Discrimination [28.599571524763785]
Given data with label noise (i.e., incorrect data), deep neural networks would gradually memorize the label noise and impair model performance. To relieve this issue, curriculum learning is proposed to improve model performance and generalization by ordering training samples in a meaningful sequence.
arXiv Detail & Related papers (2022-08-21T13:38:55Z)
CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance. Sample re-weighting methods are popularly used to alleviate this data bias issue. We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z)
On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets [74.11825654535895]
Pre-training language models (LMs) on large-scale unlabeled text data makes the model much easier to achieve exceptional downstream performance. We study what specific traits in the pre-training data, other than the semantics, make a pre-trained LM superior to their counterparts trained from scratch on downstream tasks.
arXiv Detail & Related papers (2021-09-08T10:39:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.