Can Transformers Learn to Solve Problems Recursively?
- URL: http://arxiv.org/abs/2305.14699v2
- Date: Sun, 25 Jun 2023 18:38:38 GMT
- Title: Can Transformers Learn to Solve Problems Recursively?
- Authors: Shizhuo Dylan Zhang, Curt Tigges, Stella Biderman, Maxim Raginsky,
Talia Ringer
- Abstract summary: This paper examines the behavior of neural networks learning algorithms relevant to programs and formal verification.
By reconstructing these algorithms, we are able to correctly predict 91 percent of failure cases for one of the approximated functions.
- Score: 9.5623664764386
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks have in recent years shown promise for helping software
engineers write programs and even formally verify them. While semantic
information plays a crucial part in these processes, it remains unclear to what
degree popular neural architectures like transformers are capable of modeling
that information. This paper examines the behavior of neural networks learning
algorithms relevant to programs and formal verification proofs through the lens
of mechanistic interpretability, focusing in particular on structural
recursion. Structural recursion is at the heart of tasks on which symbolic
tools currently outperform neural models, like inferring semantic relations
between datatypes and emulating program behavior. We evaluate the ability of
transformer models to learn to emulate the behavior of structurally recursive
functions from input-output examples. Our evaluation includes empirical and
conceptual analyses of the limitations and capabilities of transformer models
in approximating these functions, as well as reconstructions of the "shortcut"
algorithms the model learns. By reconstructing these algorithms, we are able to
correctly predict 91 percent of failure cases for one of the approximated
functions. Our work provides a new foundation for understanding the behavior of
neural networks that fail to solve the very tasks they are trained for.
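Since structural recursion is central to the paper's framing, a concrete example may help: a structurally recursive function is defined by one case per datatype constructor and recurses only on strict subterms of its input. The sketch below is a generic illustration in Python (not the authors' code or their exact evaluation tasks):

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    value: int

@dataclass
class Node:
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]

def count_leaves(t: Tree) -> int:
    # Structural recursion: one case per constructor, and every recursive
    # call is on a strict subterm, so termination is guaranteed.
    if isinstance(t, Leaf):
        return 1
    return count_leaves(t.left) + count_leaves(t.right)

# In the paper's setting, a transformer sees only input-output pairs such as
# (serialized tree, 3) -- never the recursive definition itself.
print(count_leaves(Node(Leaf(1), Node(Leaf(2), Leaf(3)))))  # -> 3
```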
Related papers
- Algorithmic Capabilities of Random Transformers [49.73113518329544]
We investigate what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized.
We find that these random transformers can perform a wide range of meaningful algorithmic tasks.
Our results indicate that some algorithmic capabilities are present in transformers even before these models are trained.
arXiv Detail & Related papers (2024-10-06T06:04:23Z)
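As a rough sketch of that setup (assumed details: toy sizes and a dummy copy objective; not the authors' code), one can freeze a randomly initialized transformer in PyTorch and optimize only the embedding and unembedding layers:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)
embed = nn.Embedding(100, 64)   # trainable input embeddings
unembed = nn.Linear(64, 100)    # trainable output projection

for p in model.parameters():    # transformer weights stay at random init
    p.requires_grad_(False)

opt = torch.optim.Adam(list(embed.parameters()) + list(unembed.parameters()),
                       lr=1e-3)
tokens = torch.randint(0, 100, (8, 16))      # dummy batch of token ids
opt.zero_grad()
logits = unembed(model(embed(tokens)))       # only embeddings get gradients
loss = nn.functional.cross_entropy(logits.view(-1, 100), tokens.view(-1))
loss.backward()
opt.step()
```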
- Autoregressive + Chain of Thought = Recurrent: Recurrence's Role in Language Models' Computability and a Revisit of Recurrent Transformer [29.970200877158764]
We investigate the influence of recurrent structures in neural models on their reasoning abilities and computability.
We shed light on how the CoT approach can mimic recurrent computation and act as a bridge between autoregression and recurrence.
arXiv Detail & Related papers (2024-09-14T00:30:57Z)
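The bridge can be made concrete with a toy sketch (illustrative only, not the paper's construction): a chain-of-thought transcript can carry the hidden state of a recurrence, so an autoregressive model that writes intermediate tokens recovers recurrent computation without recurrent weights.

```python
def step(state: int, x: int) -> int:
    """One step of some recurrence, here a running parity."""
    return state ^ x

def recurrent(xs):
    state = 0
    for x in xs:
        state = step(state, x)
    return state

def chain_of_thought(xs):
    transcript = [0]                 # the "thought" tokens hold the state
    for x in xs:
        transcript.append(step(transcript[-1], x))
    return transcript                # final token equals recurrent(xs)

xs = [1, 0, 1, 1]
assert chain_of_thought(xs)[-1] == recurrent(xs)
```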
- Emergence in non-neural models: grokking modular arithmetic via average gradient outer product [16.911836722312152]
We show that grokking is not specific to neural networks nor to gradient descent-based optimization.
We show that this phenomenon occurs when learning modular arithmetic with Recursive Feature Machines.
Our results demonstrate that emergence can result purely from learning task-relevant features.
arXiv Detail & Related papers (2024-07-29T17:28:58Z)
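For context, the average gradient outer product (AGOP) behind Recursive Feature Machines is simple to state: M = (1/n) Σ_i ∇f(x_i) ∇f(x_i)^T. A minimal sketch with a toy predictor (assumed details, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, w):                       # toy scalar predictor f(x) = tanh(w . x)
    return np.tanh(w @ x)

def grad_f(x, w):                  # gradient of f with respect to the input x
    return (1 - np.tanh(w @ x) ** 2) * w

w = rng.normal(size=5)
X = rng.normal(size=(200, 5))

# AGOP: average of the outer products of input gradients over the sample.
M = sum(np.outer(grad_f(x, w), grad_f(x, w)) for x in X) / len(X)
# In a Recursive Feature Machine, M reweights the kernel's input features
# and the predictor is refit, alternating until the features stabilize.
print(np.round(M, 2))
```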
- Uncovering mesa-optimization algorithms in Transformers [61.06055590704677]
Some autoregressive models can learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.
We show that standard next-token prediction error minimization gives rise to a subsidiary learning algorithm that adjusts the model as new inputs are revealed.
Our findings explain in-context learning as a product of autoregressive loss minimization and inform the design of new optimization-based Transformer layers.
arXiv Detail & Related papers (2023-09-11T22:42:50Z)
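One way to picture such a subsidiary learner (a simplified sketch in the spirit of the gradient-descent-in-context literature, not the paper's exact construction): treat the in-context examples as a training set and run gradient descent on implicit "fast weights".

```python
import numpy as np

rng = np.random.default_rng(1)
w_true = rng.normal(size=3)
X = rng.normal(size=(16, 3))               # in-context examples (x_i, y_i)
y = X @ w_true

w = np.zeros(3)                            # implicit fast weights
lr = 0.1
for _ in range(50):                        # what a forward pass might emulate
    w -= lr * X.T @ (X @ w - y) / len(X)   # gradient step on squared error

x_query = rng.normal(size=3)
print(x_query @ w, x_query @ w_true)       # close after in-context "learning"
```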
- Unsupervised Learning of Invariance Transformations [105.54048699217668]
We develop an algorithmic framework for finding approximate graph automorphisms.
We discuss how this framework can be used to find approximate automorphisms in weighted graphs in general.
arXiv Detail & Related papers (2023-07-24T17:03:28Z)
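For intuition, a permutation pi is an exact automorphism of a graph with adjacency matrix A when A[pi(i), pi(j)] = A[i, j] for all i, j; an approximate automorphism tolerates a small defect. A minimal check (illustrative, not the paper's algorithm):

```python
import numpy as np

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])                # adjacency matrix of a 4-cycle

def automorphism_defect(A, pi):
    """Count the edge relations a candidate permutation fails to preserve."""
    P = np.eye(len(A), dtype=int)[pi]       # permutation matrix built from pi
    # For weighted graphs, replace the mismatch count with a norm
    # of P @ A @ P.T - A.
    return int(np.sum((P @ A @ P.T) != A))  # 0 means an exact automorphism

print(automorphism_defect(A, [1, 2, 3, 0]))  # rotating the cycle -> 0
print(automorphism_defect(A, [1, 0, 2, 3]))  # swapping two vertices -> > 0
```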
- The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks [59.26515696183751]
We show that algorithm discovery in neural networks is sometimes more complex than expected: even simple learning problems can admit a surprising diversity of solutions.
arXiv Detail & Related papers (2023-06-30T17:59:13Z)
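The paper's "Clock" story for modular addition can be sketched in a few lines (simplified; the learned version operates in embedding space): represent each residue a mod p as the angle 2*pi*a/p, compose rotations, and read the sum off the resulting angle.

```python
import numpy as np

p = 7

def embed(a):
    theta = 2 * np.pi * a / p
    return complex(np.cos(theta), np.sin(theta))  # a point on the unit circle

def clock_add(a, b):
    z = embed(a) * embed(b)                 # multiplying rotations adds angles
    angle = np.angle(z) % (2 * np.pi)
    return int(round(angle * p / (2 * np.pi))) % p

assert all(clock_add(a, b) == (a + b) % p for a in range(p) for b in range(p))
print(clock_add(5, 4))  # -> 2
```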
- Break It Down: Evidence for Structural Compositionality in Neural Networks [32.382094867951224]
We show that neural networks can learn compositionality, suggesting that specialized symbolic mechanisms may not be needed.
arXiv Detail & Related papers (2023-01-26T00:53:11Z)
- A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms [64.3064050603721]
We generalize the Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields iterations similar to those of Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
arXiv Detail & Related papers (2022-11-22T16:30:33Z)
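For reference, the classical fourth-order Runge-Kutta step is one of the iterations the superstructure is meant to recover (standard numerics, not the R2N2 code):

```python
import numpy as np

def rk4_step(f, t, y, h):
    """One fourth-order Runge-Kutta step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

# Exponential decay y' = -y: one step from y(0) = 1.
print(rk4_step(lambda t, y: -y, 0.0, np.array([1.0]), 0.1))  # ~ exp(-0.1)
```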
- Gaussian Process Surrogate Models for Neural Networks [6.8304779077042515]
In science and engineering, modeling is a methodology used to understand complex systems whose internal processes are opaque.
We construct a class of surrogate models for neural networks using Gaussian processes.
We demonstrate that our approach captures existing phenomena related to the spectral bias of neural networks, and we show that our surrogate models can be used to solve practical problems.
arXiv Detail & Related papers (2022-08-11T20:17:02Z)
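A minimal sketch of the surrogate idea (assumed toy setup, not the authors' models): probe the opaque network at a handful of inputs, fit a Gaussian process to the responses, and query the GP in the network's place.

```python
import numpy as np

def net(x):                       # stand-in for an opaque trained network
    return np.sin(3 * x) + 0.1 * x

def rbf(a, b, ell=0.3):           # squared-exponential kernel matrix
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

X = np.linspace(-1, 1, 20)        # probe the network at a few inputs
y = net(X)
Xs = np.linspace(-1, 1, 5)        # query points for the surrogate

K = rbf(X, X) + 1e-6 * np.eye(len(X))        # jitter for numerical stability
mean = rbf(Xs, X) @ np.linalg.solve(K, y)    # GP posterior mean
print(np.max(np.abs(mean - net(Xs))))        # surrogate tracks the network
```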
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
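The routing idea can be caricatured in a few lines (heavily simplified sketch; the actual architecture's routing and module parameterization are more elaborate): inputs are sent through a learned soft mixture of function modules rather than a single fixed computation path.

```python
import torch
import torch.nn as nn

class SoftRouter(nn.Module):
    def __init__(self, dim=16, n_modules=4):
        super().__init__()
        self.funcs = nn.ModuleList(nn.Linear(dim, dim)
                                   for _ in range(n_modules))
        self.gate = nn.Linear(dim, n_modules)     # learned routing scores

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)          # (B, M)
        outs = torch.stack([f(x) for f in self.funcs], dim=-1) # (B, D, M)
        return (outs * weights.unsqueeze(-2)).sum(-1)          # weighted mix

x = torch.randn(8, 16)
print(SoftRouter()(x).shape)  # torch.Size([8, 16])
```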
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.