Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks
- URL: http://arxiv.org/abs/2210.00400v1
- Date: Sun, 2 Oct 2022 00:46:36 GMT
- Title: Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks
- Authors: Yuxuan Li and James L. McClelland
- Abstract summary: We show how a causal transformer can perform a set of algorithmic tasks, including copying, sorting, and hierarchical compositions of these operations.
We show that two-layer transformers learn generalizable solutions to multi-level problems and develop signs of systematic task decomposition.
These results provide key insights into how transformer models may be capable of decomposing complex decisions into reusable, multi-level policies.
- Score: 6.525090891505941
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer networks have seen great success in natural language processing
and machine vision, where task objectives such as next word prediction and
image classification benefit from nuanced context sensitivity across
high-dimensional inputs. However, there is an ongoing debate about how and when
transformers can acquire highly structured behavior and achieve systematic
generalization. Here, we explore how well a causal transformer can perform a
set of algorithmic tasks, including copying, sorting, and hierarchical
compositions of these operations. We demonstrate strong generalization to
sequences longer than those used in training by replacing the standard
positional encoding typically used in transformers with labels arbitrarily
paired with items in the sequence. By finding the layer and head configuration
sufficient to solve the task, then performing ablation experiments and
representation analysis, we show that two-layer transformers learn
generalizable solutions to multi-level problems and develop signs of systematic
task decomposition. They also exploit shared computation across related tasks.
These results provide key insights into how transformer models may be capable
of decomposing complex decisions into reusable, multi-level policies in tasks
requiring structured behavior.
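To make the label-based position scheme described above concrete, here is a minimal sketch of one way it could be implemented, assuming labels are drawn without replacement from a range larger than any training length and assigned in ascending order; the variable names and dimensions are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

MAX_LABEL = 512            # label range, larger than any training sequence length
VOCAB, D_MODEL = 100, 64   # toy token vocabulary and model width (assumptions)

token_emb = nn.Embedding(VOCAB, D_MODEL)
label_emb = nn.Embedding(MAX_LABEL, D_MODEL)   # one embedding per position label

def embed_with_random_labels(tokens: torch.Tensor) -> torch.Tensor:
    """tokens: (seq_len,) LongTensor -> (seq_len, D_MODEL) input embeddings."""
    seq_len = tokens.shape[0]
    # Sample distinct labels at random and sort them, so items carry arbitrary
    # but order-preserving position information instead of absolute positions.
    labels = torch.randperm(MAX_LABEL)[:seq_len].sort().values
    return token_emb(tokens) + label_emb(labels)

# A sequence longer than anything seen in training still receives valid labels,
# as long as its length does not exceed MAX_LABEL.
x = embed_with_random_labels(torch.randint(0, VOCAB, (48,)))
print(x.shape)   # torch.Size([48, 64])
```

Because the labels carry only relative order, not absolute position, the same embedding table covers test sequences longer than any seen during training.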
Related papers
- In-Context Learning with Representations: Contextual Generalization of Trained Transformers [66.78052387054593]
In-context learning (ICL) refers to a capability of pretrained large language models, which can learn a new task given a few examples during inference.
This paper investigates the training dynamics of transformers by gradient descent through the lens of non-linear regression tasks.
arXiv Detail & Related papers (2024-08-19T16:47:46Z)
- Attention as a Hypernetwork [22.087242869138223]
Transformers can generalize to novel problem instances whose constituent parts might have been encountered during training but whose compositions have not.
By reformulating multi-head attention as a hypernetwork, we reveal that a composable, low-dimensional latent code specifies key-query specific operations.
We find that this latent code is predictive of the subtasks the network performs on unseen task compositions.
arXiv Detail & Related papers (2024-06-09T15:08:00Z)
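A rough illustration of the hypernetwork reading of multi-head attention described above: the vector of per-head attention weights for each query-key pair acts as a low-dimensional code that mixes a fixed set of per-head linear maps. The dimensions and the equivalence check below are my own sketch, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, H = 5, 16, 4            # sequence length, model width, number of heads
d = D // H                    # per-head width
x = rng.standard_normal((T, D))
Wq, Wk, Wv = (rng.standard_normal((H, D, d)) for _ in range(3))
Wo = rng.standard_normal((H, d, D))

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

# Standard multi-head self-attention (single sequence, no mask).
q = np.einsum('td,hde->hte', x, Wq)
k = np.einsum('td,hde->hte', x, Wk)
v = np.einsum('td,hde->hte', x, Wv)
att = softmax(np.einsum('hqe,hke->hqk', q, k) / np.sqrt(d))   # (H, T, T)
mha = np.einsum('hqk,hke,hef->qf', att, v, Wo)                # (T, D)

# Hypernetwork view: for each query-key pair, the length-H vector att[:, i, j]
# acts as a latent code that mixes H fixed linear maps M_h = Wv_h @ Wo_h.
M = np.einsum('hde,hef->hdf', Wv, Wo)                         # (H, D, D)
hyper = np.einsum('hqk,kd,hdf->qf', att, x, M)                # (T, D)

print(np.allclose(mha, hyper))   # True: the same computation, reinterpreted
```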
- Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically [74.96551626420188]
Transformers trained on natural language data have been shown to learn its hierarchical structure and generalize to sentences with unseen syntactic structures.
We investigate sources of inductive bias in transformer models and their training that could cause such generalization behavior to emerge.
arXiv Detail & Related papers (2024-04-25T07:10:29Z)
- Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks [23.516986266146855]
We train autoregressive Transformer models on a synthetic data-generating process.
We show that autoregressive Transformers can learn compositional structures from small amounts of training data.
arXiv Detail & Related papers (2023-11-21T21:16:54Z)
- What Algorithms can Transformers Learn? A Study in Length Generalization [23.970598914609916]
We study the scope of Transformers' abilities in the specific setting of length generalization on algorithmic tasks.
Specifically, we leverage RASP -- a programming language designed for the computational model of a Transformer.
Our work provides a novel perspective on the mechanisms of compositional generalization and the algorithmic capabilities of Transformers.
arXiv Detail & Related papers (2023-10-24T17:43:29Z)
- Adaptivity and Modularity for Efficient Generalization Over Task Complexity [42.748898521364914]
We investigate how the use of a mechanism for adaptive and modular computation in transformers facilitates the learning of tasks that demand generalization over the number of sequential steps.
We propose a transformer-based architecture called Hyper-UT, which combines dynamic function generation from hypernetworks with adaptive depth from Universal Transformers.
arXiv Detail & Related papers (2023-10-13T05:29:09Z)
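The following is a loose sketch, under my own assumptions, of the two ingredients named above (hypernetwork-generated weights and adaptive depth with ACT-style halting); it is not the Hyper-UT architecture itself.

```python
import torch
import torch.nn as nn

d, max_steps = 32, 8

step_emb = nn.Embedding(max_steps, d)
hyper = nn.Linear(d, d * d)   # hypernetwork: step embedding -> block weights
halt = nn.Linear(d, 1)        # per-token halting probability

def adaptive_block(x: torch.Tensor, threshold: float = 0.99) -> torch.Tensor:
    """x: (seq_len, d). Repeatedly applies a dynamically generated block
    until each token's accumulated halting probability passes the threshold."""
    cum_halt = torch.zeros(x.shape[0])
    for t in range(max_steps):
        W = hyper(step_emb(torch.tensor(t))).view(d, d)    # generated weights
        update = torch.tanh(x @ W)
        p = torch.sigmoid(halt(x)).squeeze(-1)             # halting prob per token
        still_running = (cum_halt < threshold).float().unsqueeze(-1)
        x = x + still_running * update                     # only update live tokens
        cum_halt = cum_halt + p * still_running.squeeze(-1)
        if (cum_halt >= threshold).all():
            break
    return x

y = adaptive_block(torch.randn(10, d))
print(y.shape)   # torch.Size([10, 32])
```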
- When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks [7.4726048754587415]
Humans can reason compositionally whilst grounding language utterances to the real world.
Recent benchmarks like ReaSCAN use navigation tasks grounded in a grid world to assess whether neural models exhibit similar capabilities.
We present a simple transformer-based model that outperforms specialized architectures on ReaSCAN and a modified version of gSCAN.
arXiv Detail & Related papers (2022-10-23T17:03:55Z)
- Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate few-shot task generalization as a reinforcement learning problem in which each task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z)
- Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks [86.10875837475783]
Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions.
Existing neural models have been shown to lack this basic ability in learning symbolic structures.
We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
arXiv Detail & Related papers (2021-09-30T16:41:19Z)
- Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language.
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck-languages.
arXiv Detail & Related papers (2021-06-13T13:04:46Z)
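As a flavour of the RASP style mentioned above, here is a small Python emulation (mine, not the paper's language or compiler) of its select and aggregate primitives, used to express a histogram and a causal running mean.

```python
from typing import Callable, List

def select(keys: List, queries: List, pred: Callable) -> List[List[bool]]:
    """Boolean 'attention' matrix: entry [q][k] is True when pred(keys[k], queries[q])."""
    return [[pred(k, q) for k in keys] for q in queries]

def aggregate(selector: List[List[bool]], values: List[float]) -> List[float]:
    """Mean of the selected values per query (what uniform attention computes)."""
    out = []
    for row in selector:
        picked = [v for v, s in zip(values, row) if s]
        out.append(sum(picked) / len(picked) if picked else 0.0)
    return out

def histogram(tokens: List[str]) -> List[int]:
    same = select(tokens, tokens, lambda k, q: k == q)
    # RASP derives this count through a selector-width construction built on
    # aggregate; here the selected keys are simply counted for brevity.
    return [sum(row) for row in same]

tokens = list("hello")
print(histogram(tokens))                      # [1, 1, 2, 2, 1]

# aggregate example: causal running mean over position indices
idx = list(range(len(tokens)))
causal = select(idx, idx, lambda k, q: k <= q)
print(aggregate(causal, [float(i) for i in idx]))  # [0.0, 0.5, 1.0, 1.5, 2.0]
```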
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)