Iteration Head: A Mechanistic Study of Chain-of-Thought
- URL: http://arxiv.org/abs/2406.02128v2
- Date: Mon, 28 Oct 2024 11:14:54 GMT
- Title: Iteration Head: A Mechanistic Study of Chain-of-Thought
- Authors: Vivien Cabannes, Charles Arnal, Wassim Bouaziz, Alice Yang, Francois Charton, Julia Kempe,
- Abstract summary: Chain-of-Thought (CoT) reasoning is known to improve Large Language Models.
This paper shows how CoT reasoning emerges in transformers in a controlled and interpretable setting.
- Score: 6.072247578478243
- License:
- Abstract: Chain-of-Thought (CoT) reasoning is known to improve Large Language Models both empirically and in terms of theoretical approximation power. However, our understanding of the inner workings and conditions of apparition of CoT capabilities remains limited. This paper helps fill this gap by demonstrating how CoT reasoning emerges in transformers in a controlled and interpretable setting. In particular, we observe the appearance of a specialized attention mechanism dedicated to iterative reasoning, which we coined "iteration heads". We track both the emergence and the precise working of these iteration heads down to the attention level, and measure the transferability of the CoT skills to which they give rise between tasks.
Related papers
- Interpreting Affine Recurrence Learning in GPT-style Transformers [54.01174470722201]
In-context learning allows GPT-style transformers to generalize during inference without modifying their weights.
This paper focuses specifically on their ability to learn and predict affine recurrences as an ICL task.
We analyze the model's internal operations using both empirical and theoretical approaches.
arXiv Detail & Related papers (2024-10-22T21:30:01Z) - A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration [41.88275731297211]
We show that, compared to Stepwise ICL, the transformer gains better error correction ability and more accurate predictions if the reasoning from earlier steps is integrated.
We propose an improvement on CoT by incorporating both correct and incorrect reasoning paths in the demonstration.
arXiv Detail & Related papers (2024-10-21T22:07:20Z) - Understanding Reasoning in Chain-of-Thought from the Hopfieldian View [17.18897746431302]
We introduce a novel perspective grounded in the Hopfieldian view of cognition in cognitive neuroscience.
We establish a connection between Chain-of-Thought (CoT) reasoning and key cognitive elements such as stimuli, actions, neural populations, and representation spaces.
We propose the Representation-of-Thought (RoT) framework, which leverages the robustness of low-dimensional representation spaces to enhance the robustness of the reasoning process in CoTs.
arXiv Detail & Related papers (2024-10-04T16:55:30Z) - Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis [82.51626700527837]
Chain-of-shift (CoT) is an efficient method that enables the reasoning ability of large language models by augmenting the query using examples with multiple intermediate steps.
We show that despite the theoretical success of CoT, it fails to provide an accurate generalization when CoT does.
arXiv Detail & Related papers (2024-10-03T03:12:51Z) - A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning [48.51969964676017]
Chain-of-Thought (CoT) holds a significant place in augmenting the reasoning performance for large language models.
We propose a Read-and-Control approach for controlling the accuracy of CoT.
arXiv Detail & Related papers (2024-06-18T04:07:13Z) - Ladder-of-Thought: Using Knowledge as Steps to Elevate Stance Detection [73.31406286956535]
We introduce the Ladder-of-Thought (LoT) for the stance detection task.
LoT directs the small LMs to assimilate high-quality external knowledge, refining the intermediate rationales produced.
Our empirical evaluations underscore LoT's efficacy, marking a 16% improvement over GPT-3.5 and a 10% enhancement compared to GPT-3.5 with CoT on stance detection task.
arXiv Detail & Related papers (2023-08-31T14:31:48Z) - Towards Understanding Chain-of-Thought Prompting: An Empirical Study of
What Matters [82.84696222087396]
Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs)
We show that CoT reasoning is possible even with invalid demonstrations.
arXiv Detail & Related papers (2022-12-20T05:20:54Z) - Guiding Visual Question Answering with Attention Priors [76.21671164766073]
We propose to guide the attention mechanism using explicit linguistic-visual grounding.
This grounding is derived by connecting structured linguistic concepts in the query to their referents among the visual objects.
The resultant algorithm is capable of probing attention-based reasoning models, injecting relevant associative knowledge, and regulating the core reasoning process.
arXiv Detail & Related papers (2022-05-25T09:53:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.