Universal Length Generalization with Turing Programs
- URL: http://arxiv.org/abs/2407.03310v1
- Date: Wed, 3 Jul 2024 17:53:44 GMT
- Title: Universal Length Generalization with Turing Programs
- Authors: Kaiying Hou, David Brandfonbrener, Sham Kakade, Samy Jelassi, Eran Malach,
- Abstract summary: We propose Turing Programs, a strategy that decomposes an algorithmic task into steps mimicking a Turing Machine.
By using Turing Programs, we obtain robust length generalization on a range of algorithmic tasks.
We then demonstrate that transformers achieve length generalization on random Turing Programs, suggesting that length generalization is possible for any algorithmic task.
- Score: 24.077722969687898
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Length generalization refers to the ability to extrapolate from short training sequences to long test sequences and is a challenge for current large language models. While prior work has proposed some architecture or data format changes to achieve length generalization, these proposals typically apply to a limited set of tasks. Building on prior scratchpad and Chain-of-Thought (CoT) techniques, we propose Turing Programs, a novel CoT strategy that decomposes an algorithmic task into steps mimicking the computation of a Turing Machine. This framework is both universal, as it can accommodate any algorithmic task, and simple, requiring only copying text from the context with small modifications. We show that by using Turing Programs, we obtain robust length generalization on a range of algorithmic tasks: addition, multiplication and in-context SGD. We then demonstrate that transformers achieve length generalization on random Turing Programs, suggesting that length generalization is possible for any algorithmic task. Finally, we theoretically prove that transformers can implement Turing Programs, constructing a simple RASP (Weiss et al.) program that simulates an arbitrary Turing machine.
Related papers
- What Algorithms can Transformers Learn? A Study in Length Generalization [23.970598914609916]
We study the scope of Transformers' abilities in the specific setting of length generalization on algorithmic tasks.
Specifically, we leverage RASP -- a programming language designed for the computational model of a Transformer.
Our work provides a novel perspective on the mechanisms of compositional generalization and the algorithmic capabilities of Transformers.
arXiv Detail & Related papers (2023-10-24T17:43:29Z) - Compositional Program Generation for Few-Shot Systematic Generalization [59.57656559816271]
This study on a neuro-symbolic architecture called the Compositional Program Generator (CPG)
CPG has three key features: textitmodularity, textitcomposition, and textitabstraction, in the form of grammar rules.
It perfect achieves generalization on both the SCAN and COGS benchmarks using just 14 examples for SCAN and 22 examples for COGS.
arXiv Detail & Related papers (2023-09-28T14:33:20Z) - ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis [54.18659323181771]
We characterize several different forms of compositional generalization that are desirable in program synthesis.
We propose ExeDec, a novel decomposition-based strategy that predicts execution subgoals to solve problems step-by-step informed by program execution at each step.
arXiv Detail & Related papers (2023-07-26T01:07:52Z) - Planning with Large Language Models for Code Generation [100.07232672883897]
Planning-Guided Transformer Decoding (PG-TD) uses a planning algorithm to do lookahead search and guide the Transformer to generate better programs.
We empirically evaluate our framework with several large language models as backbones on public coding challenge benchmarks.
arXiv Detail & Related papers (2023-03-09T18:59:47Z) - A Divide-Align-Conquer Strategy for Program Synthesis [8.595181704811889]
We show that compositional segmentation can be applied in the programming by examples setting to divide the search for large programs across multiple smaller program synthesis problems.
A structural alignment of the constituent parts in the input and output leads to pairwise correspondences used to guide the program search.
arXiv Detail & Related papers (2023-01-08T19:10:55Z) - Compositional Generalization and Decomposition in Neural Program
Synthesis [59.356261137313275]
In this paper, we focus on measuring the ability of learned program synthesizers to compositionally generalize.
We first characterize several different axes along which program synthesis methods would be desired to generalize.
We introduce a benchmark suite of tasks to assess these abilities based on two popular existing datasets.
arXiv Detail & Related papers (2022-04-07T22:16:05Z) - Searching for More Efficient Dynamic Programs [61.79535031840558]
We describe a set of program transformations, a simple metric for assessing the efficiency of a transformed program, and a search procedure to improve this metric.
We show that in practice, automated search can find substantial improvements to the initial program.
arXiv Detail & Related papers (2021-09-14T20:52:55Z) - Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language.
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck-languages.
arXiv Detail & Related papers (2021-06-13T13:04:46Z) - Quantum circuit design for universal distribution using a superposition
of classical automata [2.4192504570921622]
This circuit is able to accelerate the inference of algorithmic structures in data for discovering causal generative models.
A classical exhaustive enumeration of all possible programs on the automata is shown for a couple of example cases.
This is the first time, a superposition of classical automata is implemented on the circuit model of quantum computation.
arXiv Detail & Related papers (2020-06-01T14:47:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.