How Do Latent Reasoning Methods Perform Under Weak and Strong Supervision?
- URL: http://arxiv.org/abs/2602.22441v1
- Date: Wed, 25 Feb 2026 22:00:59 GMT
- Title: How Do Latent Reasoning Methods Perform Under Weak and Strong Supervision?
- Authors: Yingqian Cui, Zhenwei Dai, Bing He, Zhan Shi, Hui Liu, Rui Sun, Zhiji Liu, Yue Xing, Jiliang Tang, Benoit Dumoulin,
- Abstract summary: We conduct a comprehensive analysis of latent reasoning methods to better understand the role and behavior of latent representation in the process. We find that while latent representations can encode multiple possibilities, the reasoning process does not faithfully implement structured search. Our findings reveal a trade-off associated with supervision strength: stronger supervision mitigates shortcut behavior but restricts the ability of latent representations to maintain diverse hypotheses.
- Score: 45.11635323173876
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Latent reasoning has recently been proposed as a paradigm that performs multi-step reasoning by generating steps in the latent space instead of the textual space. This paradigm enables reasoning beyond discrete language tokens by performing multi-step computation in continuous latent spaces. Although numerous studies have focused on improving the performance of latent reasoning, its internal mechanisms have not been fully investigated. In this work, we conduct a comprehensive analysis of latent reasoning methods to better understand the role and behavior of latent representations in the process. We identify two key issues across latent reasoning methods with different levels of supervision. First, we observe pervasive shortcut behavior, where these methods achieve high accuracy without relying on latent reasoning. Second, we examine the hypothesis that latent reasoning supports BFS-like exploration in latent space, and find that while latent representations can encode multiple possibilities, the reasoning process does not faithfully implement structured search but instead exhibits implicit pruning and compression. Finally, our findings reveal a trade-off associated with supervision strength: stronger supervision mitigates shortcut behavior but restricts the ability of latent representations to maintain diverse hypotheses, whereas weaker supervision allows richer latent representations at the cost of increased shortcut behavior.
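The contrast the abstract draws between BFS-like exploration and implicit pruning can be illustrated with a toy sketch. This is not the authors' method or experimental setup: the graph, the function names, and the alphabetical "score" used for pruning are all hypothetical, chosen only to show how a full frontier keeps every hypothesis alive while a pruned one can silently drop the branch that leads to the answer.

```python
from typing import Dict, List, Set

# Hypothetical toy graph (not from the paper): node -> children.
GRAPH: Dict[str, List[str]] = {
    "root": ["a", "b", "c"],
    "a": ["a1"],
    "b": ["b1"],
    "c": ["goal"],
    "a1": [],
    "b1": [],
    "goal": [],
}


def bfs_frontiers(graph: Dict[str, List[str]], start: str, depth: int) -> List[Set[str]]:
    """Structured BFS: the frontier at each depth keeps every reachable node."""
    frontiers = [{start}]
    for _ in range(depth):
        nxt = {child for node in frontiers[-1] for child in graph[node]}
        frontiers.append(nxt)
    return frontiers


def pruned_frontiers(
    graph: Dict[str, List[str]], start: str, depth: int, width: int
) -> List[Set[str]]:
    """Same expansion, but only `width` candidates survive each step.

    Alphabetical order stands in for a learned score (an assumption made
    purely for illustration).
    """
    frontiers = [{start}]
    for _ in range(depth):
        candidates = sorted({child for node in frontiers[-1] for child in graph[node]})
        frontiers.append(set(candidates[:width]))
    return frontiers


full = bfs_frontiers(GRAPH, "root", depth=2)
pruned = pruned_frontiers(GRAPH, "root", depth=2, width=2)

print("goal reachable with full BFS frontier:", "goal" in full[2])
print("goal reachable with pruned frontier:", "goal" in pruned[2])
```

In this toy case the pruned search discards node `c` at depth 1, so `goal` never enters its frontier, whereas the full BFS frontier still contains it at depth 2.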
Related papers
- Imagination Helps Visual Reasoning, But Not Yet in Latent Space [65.80396132375571]
We investigate the validity of latent reasoning using Causal Mediation Analysis. We show that latent tokens encode limited visual information and exhibit high similarity. We propose a straightforward alternative named CapImagine, which teaches the model to explicitly imagine using text.
arXiv Detail & Related papers (2026-02-26T08:56:23Z)
- Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization [9.193078163792427]
Chain-of-Thought (CoT) empowers Large Language Models (LLMs) to tackle complex problems. Recent latent reasoning approaches attempt to optimize efficiency by performing reasoning within continuous hidden states. We introduce PLaT, a framework that reformulates latent reasoning as planning by fundamentally decoupling reasoning from verbalization.
arXiv Detail & Related papers (2026-01-29T07:38:18Z)
- Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process [66.38541693477181]
We propose an unsupervised framework for discovering reasoning vectors, which we define as directions in the activation space that encode distinct reasoning behaviors. By segmenting chain-of-thought traces into sentence-level 'steps', we uncover disentangled features corresponding to interpretable behaviors such as reflection and backtracking. We demonstrate the ability to control response confidence by identifying confidence-related vectors in the SAE decoder space.
arXiv Detail & Related papers (2025-12-30T05:09:11Z)
- Beware of Reasoning Overconfidence: Pitfalls in the Reasoning Process for Multi-solution Tasks [54.31998314008198]
Large Language Models (LLMs) excel in reasoning tasks requiring a single correct answer, but they perform poorly in multi-solution tasks. We attribute this limitation to reasoning overconfidence: a tendency to express undue certainty in an incomplete solution set. We propose the cognitive-rigidity hypothesis, which posits that overconfidence arises when the reasoning process prematurely converges on a narrow set of thought paths.
arXiv Detail & Related papers (2025-12-01T14:35:06Z)
- ActivationReasoning: Logical Reasoning in Latent Activation Spaces [43.17973499652433]
Large language models (LLMs) excel at generating fluent text, but their internal reasoning remains opaque and difficult to control. We introduce ActivationReasoning (AR), a framework that embeds explicit logical reasoning into the latent space of LLMs. AR scales robustly with reasoning complexity, generalizes to abstract and context-sensitive tasks, and transfers across model backbones.
arXiv Detail & Related papers (2025-10-21T00:21:04Z)
- Latent Reasoning in LLMs as a Vocabulary-Space Superposition [80.01651003144282]
Large language models (LLMs) demonstrate strong reasoning abilities with chain-of-thought prompting, but explicit reasoning introduces substantial computational overhead. Recent work on latent reasoning reduces this cost by reasoning in latent space without explicit supervision, but performance drops significantly. To address this, we restrict the latent space to the column space of the LLM vocabulary, treating latent reasoning as a superposition over vocabulary probabilities. Once latent reasoning concludes, it collapses into an eigenstate of explicit reasoning to yield the final answer. Latent-SFT sets a new state of the art on GSM8k, matching explicit
arXiv Detail & Related papers (2025-10-17T10:51:20Z)
- A Survey on Latent Reasoning [100.54120559169735]
Large Language Models (LLMs) have demonstrated impressive reasoning capabilities. CoT reasoning that verbalizes intermediate steps limits the model's expressive bandwidth. Latent reasoning tackles this bottleneck by performing multi-step inference entirely in the model's continuous hidden state.
arXiv Detail & Related papers (2025-07-08T17:29:07Z)