When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks
- URL: http://arxiv.org/abs/2210.12786v1
- Date: Sun, 23 Oct 2022 17:03:55 GMT
- Title: When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks
- Authors: Ankur Sikarwar, Arkil Patel, Navin Goyal
- Abstract summary: Humans can reason compositionally whilst grounding language utterances to the real world.
Recent benchmarks like ReaSCAN use navigation tasks grounded in a grid world to assess whether neural models exhibit similar capabilities.
We present a simple transformer-based model that outperforms specialized architectures on ReaSCAN and a modified version of gSCAN.
- Score: 7.4726048754587415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans can reason compositionally whilst grounding language utterances to the
real world. Recent benchmarks like ReaSCAN use navigation tasks grounded in a
grid world to assess whether neural models exhibit similar capabilities. In
this work, we present a simple transformer-based model that outperforms
specialized architectures on ReaSCAN and a modified version of gSCAN. On
analyzing the task, we find that identifying the target location in the grid
world is the main challenge for the models. Furthermore, we show that a
particular split in ReaSCAN, which tests depth generalization, is unfair. On an
amended version of this split, we show that transformers can generalize to
deeper input structures. Finally, we design a simpler grounded compositional
generalization task, RefEx, to investigate how transformers reason
compositionally. We show that a single self-attention layer with a single head
generalizes to novel combinations of object attributes. Moreover, we derive a
precise mathematical construction of the transformer's computations from the
learned network. Overall, we provide valuable insights about the grounded
compositional generalization task and the behaviour of transformers on it,
which would be useful for researchers working in this area.
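As a concrete illustration of the single-head result above, the sketch below implements one head of self-attention over a toy RefEx-style input: a two-attribute referring expression followed by candidate grid objects. The vocabulary, embedding size, and random weights are illustrative assumptions, not the paper's learned construction.

```python
# Minimal single-head self-attention on a RefEx-style toy input.
# Illustrative sketch only: tokens, shapes, and weights are assumptions.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """One head of self-attention: softmax(Q K^T / sqrt(d)) V."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
d_model = 16
# Toy sequence: the expression "red square" followed by three candidate objects.
tokens = ["red", "square", "obj:red-square", "obj:blue-square", "obj:red-circle"]
X = rng.normal(size=(len(tokens), d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(attn.round(2))  # with trained weights, query tokens attend to the matching object
```

With trained weights, generalizing to a novel attribute combination amounts to the expression tokens attending to the one object whose attributes all match, which is the behaviour the paper reverse-engineers into a precise construction.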
Related papers
- Attention as a Hypernetwork [22.087242869138223]
Transformers can generalize to novel problem instances whose constituent parts might have been encountered during training but whose compositions have not.
By reformulating multi-head attention as a hypernetwork, we reveal that a composable, low-dimensional latent code specifies key-query specific operations.
We find that this latent code is predictive of the subtasks the network performs on unseen task compositions.
arXiv Detail & Related papers (2024-06-09T15:08:00Z)
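A minimal sketch of the hypernetwork reading of multi-head attention described above: for each query-key pair, the vector of head-wise attention weights acts as a low-dimensional latent code that mixes the per-head value transformations. Shapes and names are assumptions, not the paper's code.

```python
# Sketch of the "attention as hypernetwork" view of multi-head attention.
# All shapes and names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
T, H, d = 5, 4, 8            # sequence length, heads, head dimension
Q = rng.normal(size=(H, T, d))
K = rng.normal(size=(H, T, d))
V = rng.normal(size=(H, T, d))

# Per-head attention weights: shape (H, T, T).
scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d)
A = np.exp(scores)
A /= A.sum(-1, keepdims=True)

# Hypernetwork view: for a given (query, key) pair, the vector of
# head-wise weights A[:, q, k] is a latent code that mixes the H
# value transformations applied at the key position.
latent_code = A[:, 0, 1]     # code for query 0 attending to key 1, shape (H,)
mixed = sum(latent_code[h] * V[h, 1] for h in range(H))
print(latent_code.round(3), mixed.shape)
```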
- What Algorithms can Transformers Learn? A Study in Length Generalization [23.970598914609916]
We study the scope of Transformers' abilities in the specific setting of length generalization on algorithmic tasks.
Specifically, we leverage RASP -- a programming language designed for the computational model of a Transformer.
Our work provides a novel perspective on the mechanisms of compositional generalization and the algorithmic capabilities of Transformers.
arXiv Detail & Related papers (2023-10-24T17:43:29Z)
- SimPLR: A Simple and Plain Transformer for Scaling-Efficient Object Detection and Segmentation [49.65221743520028]
We show that a transformer-based detector with scale-aware attention enables the plain detector SimPLR, whose backbone and detection head are both non-hierarchical and operate on single-scale features.
Compared to multi-scale and single-scale state-of-the-art detectors, our model scales much better with higher-capacity (self-supervised) models and more pre-training data.
arXiv Detail & Related papers (2023-10-09T17:59:26Z)
- Out-of-Distribution Generalization in Algorithmic Reasoning Through Curriculum Learning [4.191829617421395]
Out-of-distribution generalization (OODG) is a longstanding challenge for neural networks.
We show that OODG can occur on complex problems if the training set includes examples sampled from the whole distribution of simpler component tasks.
arXiv Detail & Related papers (2022-10-07T01:21:05Z)
- Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks [6.525090891505941]
We show how a causal transformer can perform a set of algorithmic tasks, including copying, sorting, and hierarchical compositions.
We show that two-layer transformers learn generalizable solutions to multi-level problems and develop signs of systematic task decomposition.
These results provide key insights into how transformer models may be capable of decomposing complex decisions into reusable, multi-level policies.
arXiv Detail & Related papers (2022-10-02T00:46:36Z)
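A toy generator for the kinds of algorithmic tasks described above: copying, sorting, and a simple two-level composition. The exact task formats used in the paper are not reproduced here; this is an illustrative sketch.

```python
# Toy data generator for copy, sort, and a composed task.
# Task names and formats are illustrative assumptions.
import random

def make_example(task, length=6, vocab=10):
    seq = [random.randrange(vocab) for _ in range(length)]
    if task == "copy":
        target = seq
    elif task == "sort":
        target = sorted(seq)
    elif task == "sort_then_copy":   # a simple two-level composition
        target = sorted(seq) + sorted(seq)
    else:
        raise ValueError(task)
    return seq, target

random.seed(0)
for task in ("copy", "sort", "sort_then_copy"):
    print(task, make_example(task))
```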
- Compositional Generalization and Decomposition in Neural Program Synthesis [59.356261137313275]
In this paper, we focus on measuring the ability of learned program synthesizers to compositionally generalize.
We first characterize several different axes along which program synthesis methods should generalize.
We introduce a benchmark suite of tasks to assess these abilities based on two popular existing datasets.
arXiv Detail & Related papers (2022-04-07T22:16:05Z)
- SeqTR: A Simple yet Universal Network for Visual Grounding [88.03253818868204]
We propose a simple yet universal network termed SeqTR for visual grounding tasks.
We cast visual grounding as a point prediction problem conditioned on image and text inputs.
Under this paradigm, visual grounding tasks are unified in our SeqTR network without task-specific branches or heads.
arXiv Detail & Related papers (2022-03-30T12:52:46Z)
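One common way to realize the point-prediction formulation above is to discretize continuous coordinates into tokens that a single sequence model can emit for boxes or polygons alike. The sketch below shows such a quantizer; the bin count and image size are assumptions, not SeqTR's actual configuration.

```python
# Coordinate quantizer for sequence-based point prediction.
# num_bins and image_size are illustrative assumptions.
def quantize(coord, image_size, num_bins=1000):
    """Map a pixel coordinate to a discrete token id in [0, num_bins)."""
    return min(int(coord / image_size * num_bins), num_bins - 1)

def box_to_tokens(box, image_size=640):
    """Encode a bounding box as four coordinate tokens."""
    x1, y1, x2, y2 = box
    return [quantize(v, image_size) for v in (x1, y1, x2, y2)]

print(box_to_tokens((32.0, 64.0, 320.0, 480.0)))  # [50, 100, 500, 750]
```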
- Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks [86.10875837475783]
Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions.
Existing neural models have been shown to lack this basic ability in learning symbolic structures.
We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
arXiv Detail & Related papers (2021-09-30T16:41:19Z)
- Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language.
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck-languages.
arXiv Detail & Related papers (2021-06-13T13:04:46Z)
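The sketch below mimics RASP's two core primitives, select and aggregate, in plain Python, and uses them to build the histogram program mentioned above. It follows the spirit of RASP rather than its syntax.

```python
# RASP-style select/aggregate in plain Python, plus the histogram program.
def select(keys, queries, predicate):
    """Boolean attention pattern: pattern[q][k] = predicate(keys[k], queries[q])."""
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(pattern, values):
    """Uniform average of the selected values at each query position."""
    out = []
    for row in pattern:
        chosen = [v for v, keep in zip(values, row) if keep]
        out.append(sum(chosen) / len(chosen) if chosen else 0.0)
    return out

# Histogram: count each token's occurrences via the standard
# beginning-of-sequence trick -- averaging a one-hot BOS value over
# "matches plus BOS" yields 1 / (count + 1).
tokens = list("hello")
keys = ["<s>"] + tokens
pattern = select(keys, tokens, lambda k, q: k == q or k == "<s>")
frac = aggregate(pattern, [1] + [0] * len(tokens))
counts = [round(1 / f - 1) for f in frac]
print(list(zip(tokens, counts)))  # [('h',1), ('e',1), ('l',2), ('l',2), ('o',1)]
```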
- End-to-End Trainable Multi-Instance Pose Estimation with Transformers [68.93512627479197]
We propose a new end-to-end trainable approach for multi-instance pose estimation by combining a convolutional neural network with a transformer.
Inspired by recent work on end-to-end trainable object detection with transformers, we use a transformer encoder-decoder architecture together with a bipartite matching scheme to directly regress the pose of all individuals in a given image.
Our model, called POse Estimation Transformer (POET), is trained using a novel set-based global loss that consists of a keypoint loss, a keypoint visibility loss, a center loss and a class loss.
arXiv Detail & Related papers (2021-03-22T18:19:22Z)
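The set-based loss described above rests on a bipartite (Hungarian) matching between predictions and ground-truth instances, which makes it permutation-invariant. The sketch below shows that matching step with a simplified keypoint-only cost; POET's full loss combines keypoint, visibility, center, and class terms.

```python
# Bipartite matching between predicted and ground-truth poses.
# The L1 keypoint cost is a simplified assumption, not POET's full loss.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_and_loss(pred_keypoints, gt_keypoints):
    """pred: (N, K, 2), gt: (M, K, 2) arrays of keypoint coordinates."""
    # Pairwise L1 cost between every prediction and every ground truth.
    cost = np.abs(pred_keypoints[:, None] - gt_keypoints[None]).sum(axis=(2, 3))
    rows, cols = linear_sum_assignment(cost)   # optimal one-to-one matching
    return cost[rows, cols].mean(), list(zip(rows, cols))

rng = np.random.default_rng(0)
preds = rng.uniform(size=(4, 17, 2))   # 4 predicted poses, 17 keypoints each
gts = rng.uniform(size=(2, 17, 2))     # 2 ground-truth poses
loss, matching = match_and_loss(preds, gts)
print(loss, matching)  # unmatched predictions would receive a "no object" loss
```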
- Toward Transformer-Based Object Detection [12.704056181392415]
Vision Transformers can be used as a backbone by a common detection task head to produce competitive COCO results.
ViT-FRCNN demonstrates several known properties associated with transformers, including large pretraining capacity and fast fine-tuning performance.
We view ViT-FRCNN as an important stepping stone toward a pure-transformer solution of complex vision tasks such as object detection.
arXiv Detail & Related papers (2020-12-17T22:33:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.