On the Power of Decision Trees in Auto-Regressive Language Modeling
- URL: http://arxiv.org/abs/2409.19150v1
- Date: Fri, 27 Sep 2024 21:25:00 GMT
- Title: On the Power of Decision Trees in Auto-Regressive Language Modeling
- Authors: Yulu Gan, Tomer Galanti, Tomaso Poggio, Eran Malach
- Abstract summary: Auto-regressive Decision Trees (ARDTs) have not yet been explored for language modeling.
This paper delves into both the theoretical and practical applications of ARDTs in this new context.
We show that ARDTs can compute complex functions, such as simulating automata, Turing machines, and sparse circuits.
- Score: 21.97150778373607
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Originally proposed for handling time series data, Auto-regressive Decision Trees (ARDTs) have not yet been explored for language modeling. This paper delves into both the theoretical and practical applications of ARDTs in this new context. We theoretically demonstrate that ARDTs can compute complex functions, such as simulating automata, Turing machines, and sparse circuits, by leveraging "chain-of-thought" computations. Our analysis provides bounds on the size, depth, and computational efficiency of ARDTs, highlighting their surprising computational power. Empirically, we train ARDTs on simple language generation tasks, showing that they can learn to generate coherent and grammatically correct text on par with a smaller Transformer model. Additionally, we show that ARDTs can be used on top of transformer representations to solve complex reasoning tasks. This research reveals the unique computational abilities of ARDTs, aiming to broaden the architectural diversity in language model development.
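The listing itself contains no code; as a rough, hypothetical sketch of the auto-regressive decision-tree idea described in the abstract, the Python snippet below trains a scikit-learn DecisionTreeClassifier as a next-token predictor over a fixed-length context window and then generates text by feeding its own predictions back into the context. The toy corpus, window size, tree depth, and the treatment of token ids as numeric features are illustrative assumptions, not the authors' setup.
```python
# Minimal sketch (not the authors' implementation): a decision tree used as an
# auto-regressive next-token predictor over a fixed-length context window.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

corpus = "the cat sat on the mat . the dog sat on the rug .".split()  # toy data
vocab = sorted(set(corpus))
tok2id = {t: i for i, t in enumerate(vocab)}
ids = [tok2id[t] for t in corpus]

WINDOW = 3  # context length fed to the tree (an arbitrary choice here)
X = np.array([ids[i:i + WINDOW] for i in range(len(ids) - WINDOW)])
y = np.array([ids[i + WINDOW] for i in range(len(ids) - WINDOW)])

# Treating token ids as numeric features is a crude simplification for illustration.
tree = DecisionTreeClassifier(max_depth=8, random_state=0).fit(X, y)

def generate(prompt_tokens, n_steps=8):
    """Auto-regressive decoding: append each predicted token back to the context."""
    ctx = [tok2id[t] for t in prompt_tokens]
    for _ in range(n_steps):
        next_id = int(tree.predict(np.array([ctx[-WINDOW:]]))[0])
        ctx.append(next_id)
    return " ".join(vocab[i] for i in ctx)

print(generate("the cat sat".split()))
```
In the paper, the theoretical constructions additionally rely on intermediate "chain-of-thought" tokens, so the tree may emit scratch-work tokens before the final answer; this sketch only shows the outer auto-regressive loop.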
Related papers
- CALT: A Library for Computer Algebra with Transformer [4.069144210024564]
We introduce Computer Algebra with Transformer (CALT), a user-friendly Python library designed to help non-experts in deep learning train models for symbolic computation tasks.
arXiv Detail & Related papers (2025-06-10T09:06:41Z)
- Tracr-Injection: Distilling Algorithms into Pre-trained Language Models [2.532202013576547]
We propose tracr-injection, a method that allows us to distill algorithms written in RASP directly into a pre-trained language model.
We show how our method creates an interpretable subspace within the model's residual stream, which can be decoded into the variables present in the RASP algorithm.
arXiv Detail & Related papers (2025-05-15T21:43:51Z)
- Enigme: Generative Text Puzzles for Evaluating Reasoning in Language Models [0.0]
Transformer-decoder language models are a core innovation in text-based generative artificial intelligence.
We present enigme, an open-source library for generating text-based puzzles to be used in training and evaluating reasoning skills.
arXiv Detail & Related papers (2025-05-08T03:09:57Z)
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
arXiv Detail & Related papers (2024-11-11T16:33:25Z)
- Learning Linear Attention in Polynomial Time [115.68795790532289]
We provide the first results on learnability of single-layer Transformers with linear attention.
We show that linear attention may be viewed as a linear predictor in a suitably defined RKHS.
We show how to efficiently identify training datasets for which every empirical risk minimizer is equivalent to the linear Transformer (a minimal sketch of the linear-attention computation appears after this list).
arXiv Detail & Related papers (2024-10-14T02:41:01Z)
- Autoregressive + Chain of Thought = Recurrent: Recurrence's Role in Language Models' Computability and a Revisit of Recurrent Transformer [29.970200877158764]
We investigate the influence of recurrent structures in neural models on their reasoning abilities and computability.
We shed light on how the CoT approach can mimic recurrent computation and act as a bridge between autoregression and recurrence.
arXiv Detail & Related papers (2024-09-14T00:30:57Z)
- Large Language Models and the Extended Church-Turing Thesis [0.0]
We investigate the computational power of large language models (LLMs) by the classical means of computability and computational complexity theory.
We show that any fixed (non-adaptive) LLM is computationally equivalent to a, possibly very large, deterministic finite-state transducer.
We discuss the merits of our findings in the broader context of several related disciplines and philosophies.
arXiv Detail & Related papers (2024-09-11T03:09:55Z)
- On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning [87.73401758641089]
Chain-of-thought (CoT) reasoning has improved the performance of modern language models (LMs).
We show that LMs can represent the same family of distributions over strings as probabilistic Turing machines.
arXiv Detail & Related papers (2024-06-20T10:59:02Z)
- Transformers meet Neural Algorithmic Reasoners [16.5785372289558]
We propose a novel approach that combines the Transformer's language understanding with the robustness of graph neural network (GNN)-based neural algorithmic reasoners (NARs).
We evaluate our resulting TransNAR model on CLRS-Text, the text-based version of the CLRS-30 benchmark, and demonstrate significant gains over Transformer-only models for algorithmic reasoning.
arXiv Detail & Related papers (2024-06-13T16:42:06Z)
- TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models [68.65075559137608]
We propose TRIGO, an ATP benchmark that not only requires a model to reduce a trigonometric expression with step-by-step proofs but also evaluates a generative LM's reasoning ability on formulas.
We gather trigonometric expressions and their reduced forms from the web, annotate the simplification process manually, and translate it into the Lean formal language system.
We develop an automatic generator based on Lean-Gym to create dataset splits of varying difficulties and distributions in order to thoroughly analyze the model's generalization ability.
arXiv Detail & Related papers (2023-10-16T08:42:39Z)
- Auto-Regressive Next-Token Predictors are Universal Learners [17.416520406390415]
We show that even simple models such as linear next-token predictors can approximate any function efficiently computed by a Turing machine.
We also show experimentally that simple next-token predictors, such as linear networks and shallow Multi-Layer Perceptrons (MLPs), display non-trivial performance on text generation and arithmetic tasks.
arXiv Detail & Related papers (2023-09-13T14:15:03Z)
- Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language.
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck-languages.
arXiv Detail & Related papers (2021-06-13T13:04:46Z)
- Exploring Software Naturalness through Neural Language Models [56.1315223210742]
The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing.
We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks.
arXiv Detail & Related papers (2020-06-22T21:56:14Z)
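As a side note to the "Learning Linear Attention in Polynomial Time" entry above, the following is a minimal NumPy sketch, written for this listing rather than taken from that paper, of causal linear (softmax-free) attention; the function name and toy shapes are assumptions. Because the running state accumulates the outer products of keys and values, each output is a linear function of the current query, which is the sense in which linear attention can be read as a linear predictor (in the paper, over a suitably defined RKHS).
```python
# Rough illustration (not from the cited paper): causal linear attention,
# i.e. softmax-free attention where outputs are linear in the query.
import numpy as np

def causal_linear_attention(Q, K, V):
    """Q, K, V: arrays of shape (T, d). Returns an array of shape (T, d_v)."""
    T = Q.shape[0]
    out = np.zeros((T, V.shape[1]))
    state = np.zeros((K.shape[1], V.shape[1]))  # running sum of k_s v_s^T
    for t in range(T):
        state += np.outer(K[t], V[t])   # accumulate key-value outer products
        out[t] = Q[t] @ state           # output is linear in Q[t] given the prefix
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 4)) for _ in range(3))
print(causal_linear_attention(Q, K, V).shape)  # (5, 4)
```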