Related papers: Understanding Hidden Computations in Chain-of-Thought Reasoning

Related papers

Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer [0.0]
Chain-of-thought (CoT) reasoning has enabled transformer-based language models to excel at complex mathematics and multi-step planning.<n>In standard decoder-only architectures, these reasoning steps are externalized in natural language, improving interpretability at the cost of efficiency.<n>We investigate whether such reasoning structures emerge in Huginn-3.5B, a depth-recurrent Transformer that reuses layers at inference time without increasing parameter count.
arXiv Detail & Related papers (2025-07-02T23:35:21Z)
A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models [53.18562650350898]
Chain-of-thought (CoT) reasoning enhances performance of large language models.<n>We present the first comprehensive study of CoT faithfulness in large vision-language models.
arXiv Detail & Related papers (2025-05-29T18:55:05Z)
How does Transformer Learn Implicit Reasoning? [41.315116538534106]
We study how implicit multi-hop reasoning emerges by training transformers from scratch in a controlled symbolic environment.<n>We find that training with atomic triples is not necessary but accelerates learning, and that second-hop generalization relies on query-level exposure to specific compositional structures.
arXiv Detail & Related papers (2025-05-29T17:02:49Z)
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [76.15163242945813]
Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence.<n>We introduce a novel generative model that generates tokens on the basis of human-interpretable concepts represented as latent discrete variables.
arXiv Detail & Related papers (2025-03-12T01:21:17Z)
Unveiling Reasoning Thresholds in Language Models: Scaling, Fine-Tuning, and Interpretability through Attention Maps [3.8936716676293917]
This study investigates the in-context learning capabilities of various decoder-only transformer-based language models with different model sizes and training data. We identify a critical parameter threshold (1.6 billion), beyond which reasoning performance improves significantly in tasks such as commonsense reasoning in multiple-choice question answering and deductive reasoning.
arXiv Detail & Related papers (2025-02-21T00:48:32Z)
Transformers Use Causal World Models in Maze-Solving Tasks [49.67445252528868]
We identify World Models in transformers trained on maze-solving tasks. We find that it is easier to activate features than to suppress them. positional encoding schemes appear to influence how World Models are structured within the model's residual stream.
arXiv Detail & Related papers (2024-12-16T15:21:04Z)
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models [32.2976613483151]
We analyze a mechanism used in two LMs to selectively inhibit items in a context in one task. We find that models write into low-rank subspaces of the residual stream to represent features which are then read out by later layers.
arXiv Detail & Related papers (2024-06-13T18:12:01Z)
Calibrating Reasoning in Language Models with Internal Consistency [18.24350001344488]
Large language models (LLMs) have demonstrated impressive capabilities in various reasoning tasks.<n>LLMs often generate text with obvious mistakes and contradictions.<n>In this work, we investigate reasoning in LLMs through the lens of internal representations.
arXiv Detail & Related papers (2024-05-29T02:44:12Z)
Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers. We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models. Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
Fact :Teaching MLLMs with Faithful, Concise and Transferable Rationales [102.54274021830207]
We introduce Fact, a novel paradigm designed to generate multimodal rationales that are faithful, concise, and transferable for teaching MLLMs. We filter rationales that can be transferred to end-to-end paradigms from programming paradigms to guarantee transferability. Our approach also reduces hallucinations owing to its high correlation between images and text.
arXiv Detail & Related papers (2024-04-17T07:20:56Z)
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models [24.817659341654654]
We introduce a framework called Patchscopes and show how it can be used to answer a wide range of questions about an LLM's computation. We show that many prior interpretability methods based on projecting representations into the vocabulary space and intervening on the LLM can be viewed as instances of this framework. Beyond unifying prior inspection techniques, Patchscopes also opens up new possibilities such as using a more capable model to explain the representations of a smaller model, and multihop reasoning error correction.
arXiv Detail & Related papers (2024-01-11T18:33:48Z)
Structured World Representations in Maze-Solving Transformers [3.75591091941815]
This work focuses on the abstractions formed by small transformer models. We find evidence for the consistent emergence of structured internal representations of maze topology and valid paths. We also take steps towards deciphering the circuity of path-following by identifying attention heads.
arXiv Detail & Related papers (2023-12-05T08:24:26Z)
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars [36.780346257061495]
Interpretability methods aim to understand the algorithm implemented by a trained model. We take a critical view of methods that exclusively focus on individual parts of the model.
arXiv Detail & Related papers (2023-12-03T15:34:46Z)
Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings [61.04460792203266]
We introduce VCoT, a novel method that leverages chain-of-thought prompting with vision-language grounding to bridge the logical gaps within sequential data. Our method uses visual guidance to generate synthetic multimodal infillings that add consistent and novel information to reduce the logical gaps for downstream tasks.
arXiv Detail & Related papers (2023-05-03T17:58:29Z)
What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary [68.77983831618685]
We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space. We show that the resulting projections contain rich semantic information, and draw connection between them and sparse retrieval.
arXiv Detail & Related papers (2022-12-20T16:03:25Z)
VisBERT: Hidden-State Visualizations for Transformers [66.86452388524886]
We present VisBERT, a tool for visualizing the contextual token representations within BERT for the task of (multi-hop) Question Answering. VisBERT enables users to get insights about the model's internal state and to explore its inference steps or potential shortcomings.
arXiv Detail & Related papers (2020-11-09T15:37:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.