How Do Transformers Learn Variable Binding in Symbolic Programs?
- URL: http://arxiv.org/abs/2505.20896v2
- Date: Fri, 30 May 2025 18:08:50 GMT
- Title: How Do Transformers Learn Variable Binding in Symbolic Programs?
- Authors: Yiwei Wu, Atticus Geiger, Raphaël Millière
- Abstract summary: We train a Transformer to dereference queried variables in symbolic programs. We find that the model learns to exploit the residual stream as an addressable memory space. Our results show how Transformer models can learn to implement systematic variable binding.
- Score: 5.611678524375841
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Variable binding -- the ability to associate variables with values -- is fundamental to symbolic computation and cognition. Although classical architectures typically implement variable binding via addressable memory, it is not well understood how modern neural networks lacking built-in binding operations may acquire this capacity. We investigate this by training a Transformer to dereference queried variables in symbolic programs where variables are assigned either numerical constants or other variables. Each program requires following chains of variable assignments up to four steps deep to find the queried value, and also contains irrelevant chains of assignments acting as distractors. Our analysis reveals a developmental trajectory with three distinct phases during training: (1) random prediction of numerical constants, (2) a shallow heuristic prioritizing early variable assignments, and (3) the emergence of a systematic mechanism for dereferencing assignment chains. Using causal interventions, we find that the model learns to exploit the residual stream as an addressable memory space, with specialized attention heads routing information across token positions. This mechanism allows the model to dynamically track variable bindings across layers, resulting in accurate dereferencing. Our results show how Transformer models can learn to implement systematic variable binding without explicit architectural support, bridging connectionist and symbolic approaches. To facilitate reproducible research, we developed Variable Scope, an interactive web platform for exploring our findings at https://variablescope.org
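The task described in the abstract can be made concrete with a short sketch. The Python snippet below is an illustrative reconstruction, not the authors' released code: the program format, variable names, and chain/distractor counts are assumptions. It generates a toy program with one relevant assignment chain (up to four steps deep) plus distractor chains, and then resolves the queried variable by following assignments until a numerical constant is reached, which is the behaviour the Transformer must learn to implement implicitly.

```python
# Illustrative sketch of the dereferencing task (assumed format, not the
# authors' data-generation code): variables are assigned either numerical
# constants or other variables, and a queried variable must be resolved by
# following its chain of assignments while ignoring distractor chains.
import random


def make_program(chain_len=4, n_distractors=2, seed=None):
    """Build a toy program: one relevant chain ending in a constant, plus
    irrelevant distractor chains, with all assignment lines shuffled."""
    rng = random.Random(seed)
    names = [f"v{i}" for i in range(chain_len + 2 * n_distractors + 4)]
    rng.shuffle(names)
    fresh = iter(names)

    lines = []
    chain = [next(fresh) for _ in range(chain_len)]
    target = rng.randint(0, 9)
    lines.append(f"{chain[-1]} = {target}")        # end of the chain: a constant
    for src, dst in zip(chain[:-1], chain[1:]):    # e.g. v3 = v7, v7 = v1, ...
        lines.append(f"{src} = {dst}")
    query = chain[0]                               # the variable to dereference

    for _ in range(n_distractors):                 # distractor chains to ignore
        a, b = next(fresh), next(fresh)
        lines.append(f"{b} = {rng.randint(0, 9)}")
        lines.append(f"{a} = {b}")

    rng.shuffle(lines)
    return lines, query, target


def dereference(lines, query):
    """Ground-truth solver: follow assignments until a numerical constant."""
    env = dict(line.split(" = ") for line in lines)
    current = query
    while not current.isdigit():
        current = env[current]
    return int(current)


if __name__ == "__main__":
    program, query, target = make_program(seed=42)
    print("\n".join(program))
    value = dereference(program, query)
    print(f"query {query} -> {value} (expected {target})")
    assert value == target
```

A model trained on such examples would presumably receive the serialized program and the query as input and be supervised on the resolved constant; the authors' actual setup and findings can be explored interactively at https://variablescope.org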
Related papers
- Alternatives of Unsupervised Representations of Variables on the Latent Space [0.0]
The article addresses the application of unsupervised machine learning to represent variables on a 2D latent space using a variational autoencoder (beta-VAE).
Five distinct methods have been introduced to represent variables on the latent space.
Twenty-eight approaches to representing variables with beta-VAE have been considered.
arXiv Detail & Related papers (2024-10-26T13:06:35Z)
- Unsupervised Representation Learning from Sparse Transformation Analysis [79.94858534887801]
We propose to learn representations from sequence data by factorizing the transformations of the latent variables into sparse components.
Input data are first encoded as distributions of latent activations and subsequently transformed using a probability flow model.
arXiv Detail & Related papers (2024-10-07T23:53:25Z)
- Algorithmic Capabilities of Random Transformers [49.73113518329544]
We investigate what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized.
We find that these random transformers can perform a wide range of meaningful algorithmic tasks.
Our results indicate that some algorithmic capabilities are present in transformers even before these models are trained.
arXiv Detail & Related papers (2024-10-06T06:04:23Z)
- A Pattern Language for Machine Learning Tasks [0.0]
We formalise the essential data of objective functions as equality constraints on composites of learners.
We develop a flowchart-like graphical mathematics for tasks that allows us to: (1) offer a unified perspective of approaches in machine learning across domains; (2) design and optimise desired behaviours model-agnostically; and (3) import insights from theoretical computer science into practical machine learning.
arXiv Detail & Related papers (2024-07-02T16:50:27Z)
- Scalable variable selection for two-view learning tasks with projection operators [0.0]
We propose a novel variable selection method for two-view settings, or for vector-valued supervised learning problems.
Our framework can handle extremely large-scale selection tasks, where the number of data samples may reach millions.
arXiv Detail & Related papers (2023-07-04T08:22:05Z)
- BISCUIT: Causal Representation Learning from Binary Interactions [36.358968799947924]
BISCUIT is a method for simultaneously learning causal variables and their corresponding binary interaction variables.
On three robotic-inspired datasets, BISCUIT accurately identifies causal variables and can even be scaled to complex, realistic environments for embodied AI.
arXiv Detail & Related papers (2023-06-16T06:10:55Z)
- Do Transformers use variable binding? [14.222494511474103]
Increasing the explainability of deep neural networks (DNNs) requires evaluating whether they implement symbolic computation.
One central symbolic capacity is variable binding: linking an input value to an abstract variable held in system-internal memory.
We provide the first systematic evaluation of the variable binding capacities of the state-of-the-art Transformer networks BERT and RoBERTa.
arXiv Detail & Related papers (2022-02-19T09:56:38Z)
- VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning [84.70916463298109]
VarCLR is a new approach for learning semantic representations of variable names.
VarCLR is an excellent fit for contrastive learning, which aims to minimize the distance between explicitly similar inputs.
We show that VarCLR enables the effective application of sophisticated, general-purpose language models like BERT.
arXiv Detail & Related papers (2021-12-05T18:40:32Z)
- Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks [86.10875837475783]
Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions.
Existing neural models have been shown to lack this basic ability in learning symbolic structures.
We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
arXiv Detail & Related papers (2021-09-30T16:41:19Z)
- Mitigating Generation Shifts for Generalized Zero-Shot Learning [52.98182124310114]
Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information (e.g., attributes) to recognize both seen and unseen samples, where unseen classes are not observable during training.
We propose a novel Generation Shifts Mitigating Flow framework for learning unseen data synthesis efficiently and effectively.
Experimental results demonstrate that GSMFlow achieves state-of-the-art recognition performance in both conventional and generalized zero-shot settings.
arXiv Detail & Related papers (2021-07-07T11:43:59Z)
- Training or Architecture? How to Incorporate Invariance in Neural Networks [14.162739081163444]
We propose a method for provably invariant network architectures with respect to group actions.
In a nutshell, we intend to 'undo' any possible transformation before feeding the data into the actual network.
We analyze properties of such approaches, extend them to equivariant networks, and demonstrate their advantages in terms of robustness as well as computational efficiency in several numerical examples.
arXiv Detail & Related papers (2021-06-18T10:31:00Z)
- Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way that the transformation outcome is predictable by an auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z)
- Visual Neural Decomposition to Explain Multivariate Data Sets [13.117139248511783]
Investigating relationships between variables in multi-dimensional data sets is a common task for data analysts and engineers.
We propose a novel approach to visualize correlations between input variables and a target output variable that scales to hundreds of variables.
arXiv Detail & Related papers (2020-09-11T15:53:37Z)
- RE-MIMO: Recurrent and Permutation Equivariant Neural MIMO Detection [85.44877328116881]
We present a novel neural network for symbol detection in wireless communication systems.
It is motivated by several important considerations in wireless communication systems.
We compare its performance against existing methods, and the results show that our network can efficiently handle a variable number of transmitters.
arXiv Detail & Related papers (2020-06-30T22:43:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.