Towards Understanding the Relationship between In-context Learning and Compositional Generalization
- URL: http://arxiv.org/abs/2403.11834v1
- Date: Mon, 18 Mar 2024 14:45:52 GMT
- Title: Towards Understanding the Relationship between In-context Learning and Compositional Generalization
- Authors: Sungjun Han, Sebastian Padó
- Abstract summary: We train a causal Transformer in a setting that renders ordinary learning very difficult.
The model can solve the task, however, by utilizing earlier examples to generalize to later ones.
In evaluations on the SCAN, COGS, and GeoQuery datasets, models trained in this manner indeed show improved compositional generalization.
- Score: 7.843029855730508
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: According to the principle of compositional generalization, the meaning of a complex expression can be understood as a function of the meaning of its parts and of how they are combined. This principle is crucial for human language processing and also, arguably, for NLP models in the face of out-of-distribution data. However, many neural network models, including Transformers, have been shown to struggle with compositional generalization. In this paper, we hypothesize that forcing models to in-context learn can provide an inductive bias to promote compositional generalization. To test this hypothesis, we train a causal Transformer in a setting that renders ordinary learning very difficult: we present it with different orderings of the training instances and shuffle instance labels. This corresponds to training the model on all possible few-shot learning problems attainable from the dataset. The model can solve the task, however, by utilizing earlier examples to generalize to later ones (i.e. in-context learning). In evaluations on the SCAN, COGS, and GeoQuery datasets, models trained in this manner indeed show improved compositional generalization. This indicates the usefulness of in-context learning problems as an inductive bias for generalization.
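As a minimal sketch of the training setup the abstract describes (not the authors' released code), each training sequence could be built by sampling a random ordering of instances and a random relabeling within the episode, so that labels are only recoverable from earlier examples in the same context; the helper name, data format, and parameters below are illustrative assumptions:

```python
import random


def make_in_context_episode(dataset, k, rng=None):
    """Build one few-shot episode: k (input, label) pairs in a random order,
    with the label vocabulary consistently shuffled within the episode.

    Hypothetical sketch of the setup described in the abstract, not the
    authors' implementation.
    """
    rng = rng or random.Random()

    # Sample k distinct training instances in a random order.
    instances = rng.sample(dataset, k)

    # Shuffle the label vocabulary for this episode so that a fixed
    # input-to-label mapping cannot be memorized in the weights; the mapping
    # must instead be inferred from earlier examples in the same context.
    labels = sorted({y for _, y in dataset})
    relabel = dict(zip(labels, rng.sample(labels, len(labels))))

    # The resulting sequence is fed to a causal model, which is trained to
    # predict each label from the examples that precede it in the context.
    return [(x, relabel[y]) for x, y in instances]


# Toy usage with a hypothetical labeled dataset.
toy_data = [("jump", "JUMP"), ("walk", "WALK"), ("run", "RUN"), ("look", "LOOK")]
print(make_in_context_episode(toy_data, k=3))
```

Training a causal Transformer on many such episodes amounts to exposing it to a large space of few-shot learning problems derived from the dataset, which is the inductive bias the paper credits for the improved compositional generalization observed on SCAN, COGS, and GeoQuery.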
Related papers
- In-Context Learning with Representations: Contextual Generalization of Trained Transformers [66.78052387054593]
In-context learning (ICL) refers to a capability of pretrained large language models, which can learn a new task given a few examples during inference.
This paper investigates the training dynamics of transformers by gradient descent through the lens of non-linear regression tasks.
arXiv Detail & Related papers (2024-08-19T16:47:46Z) - Learning Divergence Fields for Shift-Robust Graph Representations [73.11818515795761]
In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging problem with interdependent data.
We derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains.
arXiv Detail & Related papers (2024-06-07T14:29:21Z) - When does compositional structure yield compositional generalization? A kernel theory [0.0]
We present a theory of compositional generalization in kernel models with fixed representations.
We identify novel failure modes in compositional generalization that arise from biases in the training data.
This work provides a theoretical perspective on how statistical structure in the training data can affect compositional generalization.
arXiv Detail & Related papers (2024-05-26T00:50:11Z) - Data Factors for Better Compositional Generalization [60.698130703909804]
We conduct an empirical analysis by training Transformer models on a variety of training sets with different data factors.
We show that increased dataset complexity can lead to better generalization behavior on multiple different generalization challenges.
We explore how training examples of different difficulty levels influence generalization differently.
arXiv Detail & Related papers (2023-11-08T01:27:34Z) - How Do In-Context Examples Affect Compositional Generalization? [86.57079616209474]
In this paper, we present CoFe, a test suite to investigate in-context compositional generalization.
We find that the compositional generalization performance can be easily affected by the selection of in-context examples.
Our systematic experiments indicate that in-context examples should be structurally similar to the test case, diverse from each other, and individually simple.
arXiv Detail & Related papers (2023-05-08T16:32:18Z) - Compositional Generalization in Unsupervised Compositional
Representation Learning: A Study on Disentanglement and Emergent Language [48.37815764394315]
We study three unsupervised representation learning algorithms on two datasets that allow directly testing compositional generalization.
We find that directly using the bottleneck representation with simple models and few labels may lead to worse generalization than using representations from layers before or after the learned representation itself.
Surprisingly, we find that increasing the pressure to produce disentangled representations yields representations with worse generalization, while representations from emergent language (EL) models show strong compositional generalization.
arXiv Detail & Related papers (2022-10-02T10:35:53Z) - Learning to Generalize Compositionally by Transferring Across Semantic
Parsing Tasks [37.66114618645146]
We investigate learning representations that facilitate transfer learning from one compositional task to another.
We apply this method to semantic parsing, using three very different datasets.
Our method significantly improves compositional generalization over baselines on the test set of the target task.
arXiv Detail & Related papers (2021-11-09T09:10:21Z) - Grounded Graph Decoding Improves Compositional Generalization in
Question Answering [68.72605660152101]
Question answering models struggle to generalize to novel compositions of training patterns, such as longer sequences or more complex test structures.
We propose Grounded Graph Decoding, a method to improve compositional generalization of language representations by grounding structured predictions with an attention mechanism.
Our model significantly outperforms state-of-the-art baselines on the Compositional Freebase Questions (CFQ) dataset, a challenging benchmark for compositional generalization in question answering.
arXiv Detail & Related papers (2021-11-05T17:50:14Z) - Meta-Learning to Compositionally Generalize [34.656819307701156]
We implement a meta-learning augmented version of supervised learning.
We construct pairs of tasks for meta-learning by sub-sampling existing training data.
Experimental results on the COGS and SCAN datasets show that our similarity-driven meta-learning can improve generalization performance.
arXiv Detail & Related papers (2021-06-08T11:21:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.