On the generalization of language models from in-context learning and finetuning: a controlled study
- URL: http://arxiv.org/abs/2505.00661v1
- Date: Thu, 01 May 2025 17:02:27 GMT
- Title: On the generalization of language models from in-context learning and finetuning: a controlled study
- Authors: Andrew K. Lampinen, Arslan Chaudhry, Stephanie C. Y. Chan, Cody Wild, Diane Wan, Alex Ku, Jörg Bornschein, Razvan Pascanu, Murray Shanahan, James L. McClelland
- Abstract summary: We show that language models' in-context learning shows different inductive biases, and can generalize better in some cases. We propose a method to enable improved generalization from fine-tuning: adding in-context inferences to finetuning data. Our results have implications for understanding the inductive biases of different modes of learning in language models.
- Score: 36.384796130439035
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models exhibit exciting capabilities, yet can show surprisingly narrow generalization from finetuning -- from failing to generalize to simple reversals of relations they are trained on, to missing logical deductions that can be made from trained information. These failures to generalize from fine-tuning can hinder practical application of these models. However, language models' in-context learning shows different inductive biases, and can generalize better in some of these cases. Here, we explore these differences in generalization between in-context- and fine-tuning-based learning. To do so, we constructed several novel datasets to evaluate and improve models' ability to generalize from finetuning data. The datasets are constructed to isolate the knowledge in the dataset from that in pretraining, to create clean tests of generalization. We expose pretrained large models to controlled subsets of the information in these datasets -- either in context, or through fine-tuning -- and evaluate their performance on test sets that require various types of generalization. We find overall that in data-matched settings, in-context learning can generalize more flexibly than fine-tuning (though we also find some qualifications of prior findings, such as cases when fine-tuning can generalize to reversals embedded in a larger structure of knowledge). We build on these findings to propose a method to enable improved generalization from fine-tuning: adding in-context inferences to finetuning data. We show that this method improves generalization across various splits of our datasets and other benchmarks. Our results have implications for understanding the inductive biases of different modes of learning in language models, and practically improving their performance.
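The abstract's proposed method, adding in-context inferences to finetuning data, can be illustrated with a minimal sketch. Here `generate_inferences` is a hypothetical stand-in: the paper's actual approach elicits inferences from the model itself in context, whereas this sketch hard-codes one inference type (relation reversal, one of the generalization failures the abstract mentions) purely for illustration.

```python
def generate_inferences(fact: dict) -> list[dict]:
    """Hypothetical inference generator: emit the reversal of a relation.

    E.g. from ("Ada", "is the teacher of", "Ben"), also produce
    ("Ben", "is the student of", "Ada"). In the paper's method, such
    inferences would instead be produced by the model in context.
    """
    inverse = {
        "is the teacher of": "is the student of",
        "is the parent of": "is the child of",
    }
    subj, rel, obj = fact["subject"], fact["relation"], fact["object"]
    if rel in inverse:
        return [{"subject": obj, "relation": inverse[rel], "object": subj}]
    return []


def augment_finetuning_data(facts: list[dict]) -> list[dict]:
    """Return the original facts plus their derived inferences,
    so that finetuning sees both the trained statements and the
    conclusions that follow from them."""
    augmented = list(facts)
    for fact in facts:
        augmented.extend(generate_inferences(fact))
    return augmented


facts = [{"subject": "Ada", "relation": "is the teacher of", "object": "Ben"}]
augmented = augment_finetuning_data(facts)
```

The design point is that the inferences are added to the training data rather than only queried at test time, so finetuning can absorb the more flexible generalization behavior of in-context learning.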
Related papers
- Context-Parametric Inversion: Why Instruction Finetuning Can Worsen Context Reliance [68.56701216210617]
In principle, one would expect models to adapt to the user context better after instruction finetuning. We observe a surprising failure mode: during instruction tuning, the context reliance under knowledge conflicts initially increases as expected, but then gradually decreases.
arXiv Detail & Related papers (2024-10-14T17:57:09Z) - UniArk: Improving Generalisation and Consistency for Factual Knowledge Extraction through Debiasing [19.2764682793582]
We show the inherent misalignment between pre-training and downstream tuning objectives in language models for probing knowledge.
We propose an adapter-based framework, UniArk, for generalised and consistent factual knowledge extraction.
arXiv Detail & Related papers (2024-04-01T17:22:07Z) - Towards Understanding the Relationship between In-context Learning and Compositional Generalization [7.843029855730508]
We train a causal Transformer in a setting that renders ordinary learning very difficult.
The model can solve the task, however, by utilizing earlier examples to generalize to later ones.
In evaluations on the datasets, SCAN, COGS, and GeoQuery, models trained in this manner indeed show improved compositional generalization.
arXiv Detail & Related papers (2024-03-18T14:45:52Z) - Unveiling the Generalization Power of Fine-Tuned Large Language Models [81.70754292058258]
We investigate whether fine-tuning affects the generalization ability intrinsic to Large Language Models (LLMs).
Our main findings reveal that models fine-tuned on generation and classification tasks exhibit dissimilar behaviors in generalizing to different domains and tasks.
We observe that integrating the in-context learning strategy during fine-tuning on generation tasks can enhance the model's generalization ability.
arXiv Detail & Related papers (2024-03-14T08:18:59Z) - Data Factors for Better Compositional Generalization [60.698130703909804]
We conduct an empirical analysis by training Transformer models on a variety of training sets with different data factors.
We show that increased dataset complexity can lead to better generalization behavior on multiple different generalization challenges.
We explore how training examples of different difficulty levels influence generalization differently.
arXiv Detail & Related papers (2023-11-08T01:27:34Z) - Few-shot Fine-tuning vs. In-context Learning: A Fair Comparison and Evaluation [35.72916406365469]
We compare the generalization of few-shot fine-tuning and in-context learning to challenge datasets.
Our results show that fine-tuned language models can in fact generalize well out-of-domain.
arXiv Detail & Related papers (2023-05-26T13:55:17Z) - An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z) - Cross-Domain Generalization and Knowledge Transfer in Transformers Trained on Legal Data [0.0]
Prediction of the rhetorical role a sentence plays in a case decision is an important and often studied task in AI & Law. We analyze the ability of pre-trained language models to transfer knowledge among datasets annotated with different type systems.
arXiv Detail & Related papers (2021-12-15T04:23:14Z) - Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.