Transformer-Based Models Are Not Yet Perfect At Learning to Emulate
Structural Recursion
- URL: http://arxiv.org/abs/2401.12947v1
- Date: Tue, 23 Jan 2024 18:07:38 GMT
- Title: Transformer-Based Models Are Not Yet Perfect At Learning to Emulate
Structural Recursion
- Authors: Dylan Zhang, Curt Tigges, Zory Zhang, Stella Biderman, Maxim Raginsky,
Talia Ringer
- Abstract summary: We introduce a general framework that nicely connects the abstract concepts of structural recursion in the programming language domain to sequence modeling problems and learned models' behavior.
With our framework as a powerful conceptual tool, we identify different issues under various set-ups.
- Score: 14.739369424331478
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper investigates the ability of transformer-based models to learn
structural recursion from examples. Recursion is a universal concept in both
natural and formal languages. Structural recursion is central to the
programming language and formal mathematics tasks where symbolic tools
currently excel beyond neural models, such as inferring semantic relations
between datatypes and emulating program behavior. We introduce a general
framework that nicely connects the abstract concepts of structural recursion in
the programming language domain to concrete sequence modeling problems and
learned models' behavior. The framework includes a representation that captures
the general syntax of structural recursion, coupled with two different
frameworks for understanding their semantics -- one that is more
natural from a programming languages perspective and one that helps bridge that
perspective with a mechanistic understanding of the underlying transformer
architecture.
With our framework as a powerful conceptual tool, we identify different
issues under various set-ups. The models trained to emulate recursive
computations cannot fully capture the recursion, but instead fit short-cut
algorithms and thus cannot solve certain edge cases that are under-represented
in the training distribution. In addition, it is difficult for state-of-the-art
large language models (LLMs) to mine recursive rules from in-context
demonstrations. Meanwhile, these LLMs fail in interesting ways when emulating
reduction (step-wise computation) of the recursive function.
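For readers less familiar with the term, here is a minimal Haskell sketch of structural recursion. It is a generic illustration, not the paper's actual encoding or training format: the function is defined by one equation per constructor of an inductive datatype, and every recursive call is made on a strictly smaller subterm.

-- An inductive datatype of Peano naturals.
data Nat = Zero | Succ Nat deriving Show

-- Structurally recursive addition: one equation per constructor of Nat,
-- and the recursive call is made on the immediate subterm n.
add :: Nat -> Nat -> Nat
add Zero     m = m              -- base case
add (Succ n) m = Succ (add n m) -- recursive case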
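For the notion of reduction (step-wise computation) in the last finding, here is a second self-contained sketch, again illustrative rather than the paper's setup: a single outermost reduction step over a small term language, followed by the kind of trace a model would be asked to emulate.

-- A small term language in which numerals and addition live together,
-- so that computation can be unfolded one step at a time.
data Expr = Zero | Succ Expr | Add Expr Expr deriving Show

-- One outermost reduction step, mirroring the two defining equations
-- of structurally recursive addition.
step :: Expr -> Expr
step (Add Zero     e) = e              -- add Zero m     = m
step (Add (Succ n) e) = Succ (Add n e) -- add (Succ n) m = Succ (add n m)
step (Succ e)         = Succ (step e)  -- reduce under a Succ context
step e                = e              -- already a value

-- Example trace, i.e. the step-wise computation a model would emulate:
--   Add (Succ (Succ Zero)) (Succ Zero)
--   ~> Succ (Add (Succ Zero) (Succ Zero))
--   ~> Succ (Succ (Add Zero (Succ Zero)))
--   ~> Succ (Succ (Succ Zero))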
Related papers
- Generative Models as a Complex Systems Science: How can we make sense of
large language model behavior? [75.79305790453654]
Coaxing out desired behavior from pretrained models, while avoiding undesirable ones, has redefined NLP.
We argue for a systematic effort to decompose language model behavior into categories that explain cross-task performance.
arXiv Detail & Related papers (2023-07-31T22:58:41Z)
- A Recursive Bateson-Inspired Model for the Generation of Semantic Formal Concepts from Spatial Sensory Data [77.34726150561087]
This paper presents a new symbolic-only method for the generation of hierarchical concept structures from complex sensory data.
The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept.
The model is able to produce fairly rich yet human-readable conceptual representations without training.
arXiv Detail & Related papers (2023-07-16T15:59:13Z) - Can Transformers Learn to Solve Problems Recursively? [9.5623664764386]
This paper examines the behavior of neural networks when learning algorithms relevant to programs and formal verification.
By reconstructing these algorithms, we are able to correctly predict 91 percent of failure cases for one of the approximated functions.
arXiv Detail & Related papers (2023-05-24T04:08:37Z)
- Physics of Language Models: Part 1, Learning Hierarchical Language Structures [51.68385617116854]
Transformer-based language models are effective but complex, and understanding their inner workings is a significant challenge.
We introduce a family of synthetic CFGs that produce hierarchical rules, capable of generating lengthy sentences.
We demonstrate that generative models like GPT can accurately learn this CFG language and generate sentences based on it.
arXiv Detail & Related papers (2023-05-23T04:28:16Z)
- Autoregressive Structured Prediction with Language Models [73.11519625765301]
We describe an approach to model structures as sequences of actions in an autoregressive manner with PLMs.
Our approach achieves the new state-of-the-art on all the structured prediction tasks we looked at.
arXiv Detail & Related papers (2022-10-26T13:27:26Z)
- Folding over Neural Networks [1.7818230914983044]
This paper shows how structured recursion can be used to represent neural networks in Haskell.
In turn, we promote a coherent implementation of neural networks that delineates between their structure and semantics.
arXiv Detail & Related papers (2022-07-03T18:20:05Z)
- Recursive Reinforcement Learning [4.429642479975602]
Recursion is the fundamental paradigm to finitely describe potentially infinite objects.
We develop RL algorithms capable of computing optimal policies in environments described as a collection of Markov decision processes.
arXiv Detail & Related papers (2022-06-23T00:29:42Z)
- Learning Algebraic Recombination for Compositional Generalization [71.78771157219428]
We propose LeAR, an end-to-end neural model to learn algebraic recombination for compositional generalization.
Key insight is to model the semantic parsing task as a homomorphism between a latent syntactic algebra and a semantic algebra.
Experiments on two realistic and comprehensive compositional generalization benchmarks demonstrate the effectiveness of our model.
arXiv Detail & Related papers (2021-07-14T07:23:46Z)
- R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling [36.61173494449218]
This paper proposes a model based on differentiable CKY style binary trees to emulate the composition process.
We extend the bidirectional language model pre-training objective to this architecture, attempting to predict each word given its left and right abstraction nodes.
To scale up our approach, we also introduce an efficient pruned tree induction algorithm to enable encoding in just a linear number of composition steps.
arXiv Detail & Related papers (2021-07-02T11:00:46Z)
- Text Modular Networks: Learning to Decompose Tasks in the Language of Existing Models [61.480085460269514]
We propose a framework for building interpretable systems that learn to solve complex tasks by decomposing them into simpler ones solvable by existing models.
We use this framework to build ModularQA, a system that can answer multi-hop reasoning questions by decomposing them into sub-questions answerable by a neural factoid single-span QA model and a symbolic calculator.
arXiv Detail & Related papers (2020-09-01T23:45:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.