Can Transformers Learn to Solve Problems Recursively?
- URL: http://arxiv.org/abs/2305.14699v2
- Date: Sun, 25 Jun 2023 18:38:38 GMT
- Title: Can Transformers Learn to Solve Problems Recursively?
- Authors: Shizhuo Dylan Zhang, Curt Tigges, Stella Biderman, Maxim Raginsky,
Talia Ringer
- Abstract summary: This paper examines the behavior of neural networks learning algorithms relevant to programs and formal verification.
By reconstructing these algorithms, we are able to correctly predict 91 percent of failure cases for one of the approximated functions.
- Score: 9.5623664764386
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks have in recent years shown promise for helping software
engineers write programs and even formally verify them. While semantic
information plays a crucial part in these processes, it remains unclear to what
degree popular neural architectures like transformers are capable of modeling
that information. This paper examines the behavior of neural networks learning
algorithms relevant to programs and formal verification proofs through the lens
of mechanistic interpretability, focusing in particular on structural
recursion. Structural recursion is at the heart of tasks on which symbolic
tools currently outperform neural models, like inferring semantic relations
between datatypes and emulating program behavior. We evaluate the ability of
transformer models to learn to emulate the behavior of structurally recursive
functions from input-output examples. Our evaluation includes empirical and
conceptual analyses of the limitations and capabilities of transformer models
in approximating these functions, as well as reconstructions of the "shortcut"
algorithms the model learns. By reconstructing these algorithms, we are able to
correctly predict 91 percent of failure cases for one of the approximated
functions. Our work provides a new foundation for understanding the behavior of
neural networks that fail to solve the very tasks they are trained for.
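As a concrete illustration of the kind of task the abstract describes, a minimal sketch follows. The specific function (binary successor, our own toy choice rather than one confirmed to be from the paper) is structurally recursive, and the generated input-output pairs are the only supervision a transformer would see when trained to emulate it.

```python
# Hypothetical illustration: a structurally recursive function of the kind a
# transformer might be trained to emulate from input-output examples alone.
# The task choice here is ours, not necessarily the paper's.

def binary_successor(bits):
    """Add one to a little-endian list of bits, by structural recursion."""
    if not bits:                             # base case: empty numeral -> [1]
        return [1]
    if bits[0] == 0:                         # flip a trailing 0 to 1, done
        return [1] + bits[1:]
    return [0] + binary_successor(bits[1:])  # carry: recurse on the tail

def make_examples(n):
    """Input-output pairs a sequence model would see during training."""
    examples = []
    for i in range(n):
        bits = [int(b) for b in reversed(format(i, "b"))]
        examples.append((bits, binary_successor(bits)))
    return examples

for inp, out in make_examples(4):
    print(inp, "->", out)
```

A model that truly learned the recursion would generalize to numerals longer than any seen in training; a model that learned a "shortcut" approximation typically fails there, which is the kind of failure case the paper's reconstructions predict.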
Related papers
- Unsupervised Learning of Invariance Transformations [105.54048699217668]
We develop an algorithmic framework for finding approximate graph automorphisms.
We discuss how this framework can be used to find approximate automorphisms in weighted graphs in general.
arXiv Detail & Related papers (2023-07-24T17:03:28Z)
- The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks [59.26515696183751]
We show that algorithm discovery in neural networks is sometimes more complex than previously thought: even simple learning problems can admit a surprising diversity of solutions.
arXiv Detail & Related papers (2023-06-30T17:59:13Z)
- Break It Down: Evidence for Structural Compositionality in Neural Networks [32.382094867951224]
We find evidence that neural networks can learn compositional solutions, which may obviate the need for specialized symbolic mechanisms.
arXiv Detail & Related papers (2023-01-26T00:53:11Z)
- A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms [64.3064050603721]
We generalize the Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields similar iterations to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
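For reference, the classical iterations the trained superstructure is reported to recover have simple closed forms. Below is a minimal sketch of one of them, the standard explicit fourth-order Runge-Kutta step for an ODE y' = f(t, y); this is the textbook scheme, not code from the paper.

```python
def rk4_step(f, t, y, h):
    """One explicit fourth-order Runge-Kutta step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Example: integrate y' = y from y(0) = 1 over [0, 1]; exact solution is e^t.
y, t, h = 1.0, 0.0, 0.1
for _ in range(10):
    y = rk4_step(lambda t, y: y, t, y, h)
    t += h
print(y)  # close to e = 2.71828...
```

The point of the R2N2 result is that a recurrent superstructure trained only on input/output data converges to iterations of this general shape, rather than the scheme being hand-coded.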
arXiv Detail & Related papers (2022-11-22T16:30:33Z)
- Gaussian Process Surrogate Models for Neural Networks [6.8304779077042515]
In science and engineering, modeling is a methodology used to understand complex systems whose internal processes are opaque.
We construct a class of surrogate models for neural networks using Gaussian processes.
We demonstrate our approach captures existing phenomena related to the spectral bias of neural networks, and then show that our surrogate models can be used to solve practical problems.
arXiv Detail & Related papers (2022-08-11T20:17:02Z)
- Unveiling Transformers with LEGO: a synthetic reasoning task [23.535488809197787]
We study how the transformer architecture learns to follow a chain of reasoning.
In some data regimes, the trained transformer finds "shortcut" solutions for following the chain of reasoning.
We find that such shortcuts can be prevented with appropriate architecture modifications or careful data preparation.
arXiv Detail & Related papers (2022-06-09T06:30:17Z)
- Can deep neural networks learn process model structure? An assessment framework and analysis [0.2580765958706854]
We propose an evaluation scheme complemented with new fitness, precision, and generalisation metrics.
We apply this framework to several process models with simple control-flow behaviour.
Our results show that, even for such simplistic models, careful tuning of overfitting countermeasures is required.
arXiv Detail & Related papers (2022-02-24T09:44:13Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end to end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Spiking Neural Networks Hardware Implementations and Challenges: a Survey [53.429871539789445]
Spiking neural networks are cognitive algorithms that mimic the operational principles of neurons and synapses.
We present the state of the art of hardware implementations of spiking neural networks.
We discuss the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level.
arXiv Detail & Related papers (2020-05-04T13:24:00Z)
- Teaching Recurrent Neural Networks to Modify Chaotic Memories by Example [14.91507266777207]
We show that a recurrent neural network can learn to modify its representation of complex information using only examples.
We provide a mechanism for how these computations are learned, and demonstrate that a single network can simultaneously learn multiple computations.
arXiv Detail & Related papers (2020-05-03T20:51:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.