Related papers: On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning

On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning

URL: http://arxiv.org/abs/2406.14197v1
Date: Thu, 20 Jun 2024 10:59:02 GMT
Title: On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
Authors: Franz Nowak, Anej Svete, Alexandra Butoi, Ryan Cotterell,
Abstract summary: Chain-of-thought (CoT) reasoning has improved the performance of modern language models (LMs) We show that LMs can represent the same family of distributions over strings as probabilistic Turing machines.
Score: 87.73401758641089
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The performance of modern language models (LMs) has been improved by chain-of-thought (CoT) reasoning, i.e., the process of generating intermediate results that guide the model towards a final answer. A possible explanation for this improvement is that CoT reasoning extends an LM's computational power, as RNNs and transformers with additional scratch space are known to be Turing complete. Comparing LMs to Turing machines, however, introduces a category error - Turing machines decide language membership, whereas LMs define distributions over strings. To bridge this gap, we formalize CoT reasoning in a probabilistic setting. We present several results on the representational capacity of recurrent and transformer LMs with CoT reasoning, showing that they can represent the same family of distributions over strings as probabilistic Turing machines.

Related papers

Moving Beyond Next-Token Prediction: Transformers are Context-Sensitive Language Generators [0.40792653193642503]
Large Language Models (LLMs) powered by Transformers have demonstrated human-like intelligence capabilities. This paper presents a novel framework for interpreting LLMs as probabilistic left context-sensitive languages (CSLs) generators.
arXiv Detail & Related papers (2025-04-15T04:06:27Z)
Large Language Models and the Extended Church-Turing Thesis [0.0]
We investigate the computational power of large language models (LLMs) by the classical means of computability and computational complexity theory. We show that any fixed (non-adaptive) LLM is computationally equivalent to a, possibly very large, deterministic finite-state transducer. We discuss the merits of our findings in the broader context of several related disciplines and philosophies.
arXiv Detail & Related papers (2024-09-11T03:09:55Z)
What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages [78.1866280652834]
Large language models (LM) are distributions over strings. We investigate the learnability of regular LMs (RLMs) by RNN and Transformer LMs. We find that the complexity of the RLM rank is strong and significant predictors of learnability for both RNNs and Transformers.
arXiv Detail & Related papers (2024-06-06T17:34:24Z)
Transformers Can Represent $n$-gram Language Models [56.06361029539347]
We focus on the relationship between transformer LMs and $n$-gram LMs, a simple and historically relevant class of language models. We show that transformer LMs using the hard or sparse attention mechanisms can exactly represent any $n$-gram LM.
arXiv Detail & Related papers (2024-04-23T12:51:37Z)
On the Representational Capacity of Recurrent Neural Language Models [56.19166912044362]
We show that a rationally weighted RLM with computation time can simulate any deterministic probabilistic Turing machine (PTM) with rationally weighted transitions. We also provide a lower bound by showing that under the restriction to real-time computation, such models can simulate deterministic real-time rational PTMs.
arXiv Detail & Related papers (2023-10-19T17:39:47Z)
Recurrent Neural Language Models as Probabilistic Finite-state Automata [66.23172872811594]
We study what classes of probability distributions RNN LMs can represent. We show that simple RNNs are equivalent to a subclass of probabilistic finite-state automata. These results present a first step towards characterizing the classes of distributions RNN LMs can represent.
arXiv Detail & Related papers (2023-10-08T13:36:05Z)
Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers [50.85524803885483]
This work proposes a formal definition of statistically meaningful (SM) approximation which requires the approximating network to exhibit good statistical learnability. We study SM approximation for two function classes: circuits and Turing machines.
arXiv Detail & Related papers (2021-07-28T04:28:55Z)
On the Linguistic Capacity of Real-Time Counter Automata [1.8072051868187933]
We study the abilities of real-time counter machines as formal grammars. We show that counter languages are closed under complement, union, intersection, and many other common set operations. This work makes general contributions to the theory of formal languages that are of potential interest for understanding recurrent neural networks.
arXiv Detail & Related papers (2020-04-15T03:37:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.