Can Transformers Learn to Solve Problems Recursively?
- URL: http://arxiv.org/abs/2305.14699v2
- Date: Sun, 25 Jun 2023 18:38:38 GMT
- Title: Can Transformers Learn to Solve Problems Recursively?
- Authors: Shizhuo Dylan Zhang, Curt Tigges, Stella Biderman, Maxim Raginsky,
Talia Ringer
- Abstract summary: This paper examines the behavior of neural networks learning algorithms relevant to programs and formal verification.
By reconstructing these algorithms, we are able to correctly predict 91 percent of failure cases for one of the approximated functions.
- Score: 9.5623664764386
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks have in recent years shown promise for helping software
engineers write programs and even formally verify them. While semantic
information plays a crucial part in these processes, it remains unclear to what
degree popular neural architectures like transformers are capable of modeling
that information. This paper examines the behavior of neural networks learning
algorithms relevant to programs and formal verification proofs through the lens
of mechanistic interpretability, focusing in particular on structural
recursion. Structural recursion is at the heart of tasks on which symbolic
tools currently outperform neural models, like inferring semantic relations
between datatypes and emulating program behavior. We evaluate the ability of
transformer models to learn to emulate the behavior of structurally recursive
functions from input-output examples. Our evaluation includes empirical and
conceptual analyses of the limitations and capabilities of transformer models
in approximating these functions, as well as reconstructions of the "shortcut"
algorithms the model learns. By reconstructing these algorithms, we are able to
correctly predict 91 percent of failure cases for one of the approximated
functions. Our work provides a new foundation for understanding the behavior of
neural networks that fail to solve the very tasks they are trained for.
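Since structural recursion is central to the paper's framing, a concrete example may help: a structurally recursive function is defined by one case per datatype constructor and recurses only on strict subterms of its input. The sketch below is a generic illustration in Python (not the authors' code or their exact evaluation tasks):

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    value: int

@dataclass
class Node:
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]

def count_leaves(t: Tree) -> int:
    # Structural recursion: one case per constructor, and every recursive
    # call is on a strict subterm, so termination is guaranteed.
    if isinstance(t, Leaf):
        return 1
    return count_leaves(t.left) + count_leaves(t.right)

# In the paper's setting, a transformer sees only input-output pairs such as
# (serialized tree, 3) -- never the recursive definition itself.
print(count_leaves(Node(Leaf(1), Node(Leaf(2), Leaf(3)))))  # -> 3
```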
Related papers
- Algorithmic Capabilities of Random Transformers [49.73113518329544]
We investigate what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized.
We find that these random transformers can perform a wide range of meaningful algorithmic tasks.
Our results indicate that some algorithmic capabilities are present in transformers even before these models are trained.
arXiv Detail & Related papers (2024-10-06T06:04:23Z)
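As a rough sketch of that setup (assumed details: toy sizes and a dummy copy objective; not the authors' code), one can freeze a randomly initialized transformer in PyTorch and optimize only the embedding and unembedding layers:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)
embed = nn.Embedding(100, 64)   # trainable input embeddings
unembed = nn.Linear(64, 100)    # trainable output projection

for p in model.parameters():    # transformer weights stay at random init
    p.requires_grad_(False)

opt = torch.optim.Adam(list(embed.parameters()) + list(unembed.parameters()),
                       lr=1e-3)
tokens = torch.randint(0, 100, (8, 16))      # dummy batch of token ids
opt.zero_grad()
logits = unembed(model(embed(tokens)))       # only embeddings get gradients
loss = nn.functional.cross_entropy(logits.view(-1, 100), tokens.view(-1))
loss.backward()
opt.step()
```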
- Autoregressive + Chain of Thought = Recurrent: Recurrence's Role in Language Models' Computability and a Revisit of Recurrent Transformer [29.970200877158764]
We investigate the influence of recurrent structures in neural models on their reasoning abilities and computability.
We shed light on how the CoT approach can mimic recurrent computation and act as a bridge between autoregression and recurrence.
arXiv Detail & Related papers (2024-09-14T00:30:57Z)
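The bridge can be made concrete with a toy sketch (illustrative only, not the paper's construction): a chain-of-thought transcript can carry the hidden state of a recurrence, so an autoregressive model that writes intermediate tokens recovers recurrent computation without recurrent weights.

```python
def step(state: int, x: int) -> int:
    """One step of some recurrence, here a running parity."""
    return state ^ x

def recurrent(xs):
    state = 0
    for x in xs:
        state = step(state, x)
    return state

def chain_of_thought(xs):
    transcript = [0]                 # the "thought" tokens hold the state
    for x in xs:
        transcript.append(step(transcript[-1], x))
    return transcript                # final token equals recurrent(xs)

xs = [1, 0, 1, 1]
assert chain_of_thought(xs)[-1] == recurrent(xs)
```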
- Emergence in non-neural models: grokking modular arithmetic via average gradient outer product [16.911836722312152]
We show that grokking is not specific to neural networks nor to gradient descent-based optimization.
We show that this phenomenon occurs when learning modular arithmetic with Recursive Feature Machines.
Our results demonstrate that emergence can result purely from learning task-relevant features.
arXiv Detail & Related papers (2024-07-29T17:28:58Z)
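For context, the average gradient outer product (AGOP) behind Recursive Feature Machines is simple to state: M = (1/n) Σ_i ∇f(x_i) ∇f(x_i)^T. A minimal sketch with a toy predictor (assumed details, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, w):                       # toy scalar predictor f(x) = tanh(w . x)
    return np.tanh(w @ x)

def grad_f(x, w):                  # gradient of f with respect to the input x
    return (1 - np.tanh(w @ x) ** 2) * w

w = rng.normal(size=5)
X = rng.normal(size=(200, 5))

# AGOP: average of the outer products of input gradients over the sample.
M = sum(np.outer(grad_f(x, w), grad_f(x, w)) for x in X) / len(X)
# In a Recursive Feature Machine, M reweights the kernel's input features
# and the predictor is refit, alternating until the features stabilize.
print(np.round(M, 2))
```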
- Uncovering mesa-optimization algorithms in Transformers [61.06055590704677]
Some autoregressive models can learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.
We show that standard next-token prediction error minimization gives rise to a subsidiary learning algorithm that adjusts the model as new inputs are revealed.
Our findings explain in-context learning as a product of autoregressive loss minimization and inform the design of new optimization-based Transformer layers.
arXiv Detail & Related papers (2023-09-11T22:42:50Z)
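One way to picture such a subsidiary learner (a simplified sketch in the spirit of the gradient-descent-in-context literature, not the paper's exact construction): treat the in-context examples as a training set and run gradient descent on implicit "fast weights".

```python
import numpy as np

rng = np.random.default_rng(1)
w_true = rng.normal(size=3)
X = rng.normal(size=(16, 3))               # in-context examples (x_i, y_i)
y = X @ w_true

w = np.zeros(3)                            # implicit fast weights
lr = 0.1
for _ in range(50):                        # what a forward pass might emulate
    w -= lr * X.T @ (X @ w - y) / len(X)   # gradient step on squared error

x_query = rng.normal(size=3)
print(x_query @ w, x_query @ w_true)       # close after in-context "learning"
```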
- Unsupervised Learning of Invariance Transformations [105.54048699217668]
We develop an algorithmic framework for finding approximate graph automorphisms.
We discuss how this framework can be used to find approximate automorphisms in weighted graphs in general.
arXiv Detail & Related papers (2023-07-24T17:03:28Z)
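For intuition, a permutation pi is an exact automorphism of a graph with adjacency matrix A when A[pi(i), pi(j)] = A[i, j] for all i, j; an approximate automorphism tolerates a small defect. A minimal check (illustrative, not the paper's algorithm):

```python
import numpy as np

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])                # adjacency matrix of a 4-cycle

def automorphism_defect(A, pi):
    """Count the edge relations a candidate permutation fails to preserve."""
    P = np.eye(len(A), dtype=int)[pi]       # permutation matrix built from pi
    # For weighted graphs, replace the mismatch count with a norm
    # of P @ A @ P.T - A.
    return int(np.sum((P @ A @ P.T) != A))  # 0 means an exact automorphism

print(automorphism_defect(A, [1, 2, 3, 0]))  # rotating the cycle -> 0
print(automorphism_defect(A, [1, 0, 2, 3]))  # swapping two vertices -> > 0
```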
- The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks [59.26515696183751]
We show that algorithm discovery in neural networks is sometimes more complex than expected: even simple learning problems can admit a surprising diversity of solutions.
arXiv Detail & Related papers (2023-06-30T17:59:13Z)
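The paper's "Clock" story for modular addition can be sketched in a few lines (simplified; the learned version operates in embedding space): represent each residue a mod p as the angle 2*pi*a/p, compose rotations, and read the sum off the resulting angle.

```python
import numpy as np

p = 7

def embed(a):
    theta = 2 * np.pi * a / p
    return complex(np.cos(theta), np.sin(theta))  # a point on the unit circle

def clock_add(a, b):
    z = embed(a) * embed(b)                 # multiplying rotations adds angles
    angle = np.angle(z) % (2 * np.pi)
    return int(round(angle * p / (2 * np.pi))) % p

assert all(clock_add(a, b) == (a + b) % p for a in range(p) for b in range(p))
print(clock_add(5, 4))  # -> 2
```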
- Break It Down: Evidence for Structural Compositionality in Neural Networks [32.382094867951224]
We show that neural networks can learn compositionality, suggesting that specialized symbolic mechanisms may not be needed.
arXiv Detail & Related papers (2023-01-26T00:53:11Z)
- A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms [64.3064050603721]
We generalize the Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields iterations similar to those of Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
arXiv Detail & Related papers (2022-11-22T16:30:33Z)
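For reference, the classical fourth-order Runge-Kutta step is one of the iterations the superstructure is meant to recover (standard numerics, not the R2N2 code):

```python
import numpy as np

def rk4_step(f, t, y, h):
    """One fourth-order Runge-Kutta step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

# Exponential decay y' = -y: one step from y(0) = 1.
print(rk4_step(lambda t, y: -y, 0.0, np.array([1.0]), 0.1))  # ~ exp(-0.1)
```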
- Gaussian Process Surrogate Models for Neural Networks [6.8304779077042515]
In science and engineering, modeling is a methodology used to understand complex systems whose internal processes are opaque.
We construct a class of surrogate models for neural networks using Gaussian processes.
We demonstrate that our approach captures existing phenomena related to the spectral bias of neural networks, and we show that our surrogate models can be used to solve practical problems.
arXiv Detail & Related papers (2022-08-11T20:17:02Z)
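A minimal sketch of the surrogate idea (assumed toy setup, not the authors' models): probe the opaque network at a handful of inputs, fit a Gaussian process to the responses, and query the GP in the network's place.

```python
import numpy as np

def net(x):                       # stand-in for an opaque trained network
    return np.sin(3 * x) + 0.1 * x

def rbf(a, b, ell=0.3):           # squared-exponential kernel matrix
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

X = np.linspace(-1, 1, 20)        # probe the network at a few inputs
y = net(X)
Xs = np.linspace(-1, 1, 5)        # query points for the surrogate

K = rbf(X, X) + 1e-6 * np.eye(len(X))        # jitter for numerical stability
mean = rbf(Xs, X) @ np.linalg.solve(K, y)    # GP posterior mean
print(np.max(np.abs(mean - net(Xs))))        # surrogate tracks the network
```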
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
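The routing idea can be caricatured in a few lines (heavily simplified sketch; the actual architecture's routing and module parameterization are more elaborate): inputs are sent through a learned soft mixture of function modules rather than a single fixed computation path.

```python
import torch
import torch.nn as nn

class SoftRouter(nn.Module):
    def __init__(self, dim=16, n_modules=4):
        super().__init__()
        self.funcs = nn.ModuleList(nn.Linear(dim, dim)
                                   for _ in range(n_modules))
        self.gate = nn.Linear(dim, n_modules)     # learned routing scores

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)          # (B, M)
        outs = torch.stack([f(x) for f in self.funcs], dim=-1) # (B, D, M)
        return (outs * weights.unsqueeze(-2)).sum(-1)          # weighted mix

x = torch.randn(8, 16)
print(SoftRouter()(x).shape)  # torch.Size([8, 16])
```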
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.