Latent Reasoning in LLMs as a Vocabulary-Space Superposition
- URL: http://arxiv.org/abs/2510.15522v1
- Date: Fri, 17 Oct 2025 10:51:20 GMT
- Title: Latent Reasoning in LLMs as a Vocabulary-Space Superposition
- Authors: Jingcheng Deng, Liang Pang, Zihao Wei, Shichen Xu, Zenghao Duan, Kun Xu, Yang Song, Huawei Shen, Xueqi Cheng
- Abstract summary: Large language models (LLMs) demonstrate strong reasoning abilities with chain-of-thought prompting, but explicit reasoning introduces substantial computational overhead. Recent work on latent reasoning reduces this cost by reasoning in latent space without explicit supervision, but performance drops significantly. To address this, we restrict the latent space to the column space of the LLM vocabulary, treating latent reasoning as a superposition over vocabulary probabilities. Once latent reasoning concludes, it collapses into an eigenstate of explicit reasoning to yield the final answer. Latent-SFT sets a new state of the art on GSM8k, matching explicit
- Score: 80.01651003144282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) demonstrate strong reasoning abilities with chain-of-thought prompting, but explicit reasoning introduces substantial computational overhead. Recent work on latent reasoning reduces this cost by reasoning in latent space without explicit supervision, but performance drops significantly. Our preliminary experiments suggest that this degradation stems from the unstructured latent space, which makes fitting latent tokens difficult. To address this, we restrict the latent space to the column space of the LLM vocabulary, treating latent reasoning as a superposition over vocabulary probabilities. Once latent reasoning concludes, it collapses into an eigenstate of explicit reasoning to yield the final answer. Based on this idea, we propose Latent-SFT, a two-stage learning framework. In the first stage, we design two specialized attention masks to guide the Latent Token Encoder in generating latent tokens, allowing the LLM to produce the correct answer conditioned on them. In the second stage, the Latent Token Encoder is discarded, and the LLM is directly trained to generate these latent tokens autonomously for latent reasoning, optimized with KL and CE losses. Latent-SFT sets a new state of the art on GSM8k, matching explicit SFT performance while cutting reasoning chains by up to 4 times and outperforming prior latent methods. On Math500 and AIME24, lexical probability-based latent reasoning also clearly surpasses hidden-state-based approaches. Our metrics of effective compression rate and effective global parallelism further show that latent reasoning is both the compression of a single path and the superposition of multiple paths.
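The abstract's central idea, a latent token that is a superposition over vocabulary probabilities and later collapses to an explicit token, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`latent_token`, `collapse`), the toy sizes, and the random embedding matrix `E` are all assumptions made for illustration, and the Latent Token Encoder, the attention masks, and the KL/CE training objectives are not modeled here.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - logits.max()
    exps = np.exp(shifted)
    return exps / exps.sum()

def latent_token(logits, embedding):
    """Build a hypothetical latent token as a probability-weighted
    mixture of token embeddings.

    logits:    shape (vocab_size,)
    embedding: shape (vocab_size, hidden_dim), one row per vocabulary token
    Returns (probs, mixed), where mixed lies in the span of the embedding rows.
    """
    probs = softmax(logits)
    mixed = probs @ embedding
    return probs, mixed

def collapse(probs):
    """Collapse the superposition to one explicit token id (greedy choice,
    assumed here purely for illustration)."""
    return int(np.argmax(probs))

# Toy example with made-up sizes and random values.
rng = np.random.default_rng(0)
vocab_size, hidden_dim = 8, 4
E = rng.normal(size=(vocab_size, hidden_dim))
logits = rng.normal(size=vocab_size)

probs, z = latent_token(logits, E)
token_id = collapse(probs)
```

When the distribution puts almost all of its mass on one token, the mixed vector is close to that token's embedding, which is one way to read the "collapse" into explicit reasoning; a flatter distribution keeps several candidate tokens in play at once.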
Related papers
- LaSER: Internalizing Explicit Reasoning into Latent Space for Dense Retrieval [74.72139580745511]
LaSER is a novel self-distillation framework that internalizes explicit reasoning into the latent space of retrievers. Our method successfully combines the reasoning depth of explicit CoT pipelines with the inference efficiency of standard dense retrievers.
arXiv Detail & Related papers (2026-03-02T04:11:18Z)
- CoLT: Reasoning with Chain of Latent Tool Calls [31.228763375347608]
Chain-of-Thought (CoT) is a critical technique in enhancing the reasoning ability of Large Language Models (LLMs). We propose CoLT, a novel framework that implements latent reasoning as "tool calls".
arXiv Detail & Related papers (2026-02-04T06:12:53Z)
- Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization [9.193078163792427]
Chain-of-Thought (CoT) empowers Large Language Models (LLMs) to tackle complex problems. Recent latent reasoning approaches attempt to optimize efficiency by performing reasoning within continuous hidden states. We introduce PLaT, a framework that reformulates latent reasoning as planning by fundamentally decoupling reasoning from verbalization.
arXiv Detail & Related papers (2026-01-29T07:38:18Z)
- Latent-Space Contrastive Reinforcement Learning for Stable and Efficient LLM Reasoning [16.244366307890832]
We propose DeepLatent Reasoning (DLR), a latent-space bidirectional contrastive reinforcement learning framework. This framework shifts the trial-and-error cost from expensive token-level full sequence generation to the continuous latent manifold. Experiments demonstrate that DLR achieves more stable training convergence, supports longer-horizon reasoning chains, and facilitates the sustainable accumulation of reasoning capabilities.
arXiv Detail & Related papers (2026-01-24T03:18:22Z)
- LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning [30.62691333490551]
Large Language Models (LLMs) demonstrate their reasoning ability through chain-of-thought generation. We propose LaDiR, a novel reasoning framework that unifies the expressiveness of continuous latent representation. LaDiR consistently improves accuracy, diversity, and interpretability over existing autoregressive, diffusion-based, and latent reasoning methods.
arXiv Detail & Related papers (2025-10-06T08:15:03Z)
- A Survey on Latent Reasoning [100.54120559169735]
Large Language Models (LLMs) have demonstrated impressive reasoning capabilities. CoT reasoning that verbalizes intermediate steps limits the model's expressive bandwidth. Latent reasoning tackles this bottleneck by performing multi-step inference entirely in the model's continuous hidden state.
arXiv Detail & Related papers (2025-07-08T17:29:07Z)
- Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space [62.54887038032942]
We introduce Soft Thinking, a training-free method that emulates human-like "soft" reasoning by generating soft, abstract concept tokens. These concept tokens are created by the probability-weighted mixture of token embeddings, which form the continuous concept space. In essence, each generated concept token encapsulates multiple meanings from related discrete tokens, implicitly exploring various reasoning paths to converge.
arXiv Detail & Related papers (2025-05-21T17:29:15Z)
- SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs [48.28847964704554]
Chain-of-Thought (CoT) reasoning enables Large Language Models (LLMs) to solve complex reasoning tasks. We propose a novel approach for continuous-space reasoning that does not require modifying the LLM.
arXiv Detail & Related papers (2025-02-17T18:52:29Z)
- Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning [53.57895922042783]
Large Language Models (LLMs) excel at reasoning and planning when trained on chain-of-thought (CoT) data. We propose a hybrid representation of the reasoning process, where we partially abstract away the initial reasoning steps using latent discrete tokens.
arXiv Detail & Related papers (2025-02-05T15:33:00Z)
- Training Large Language Models to Reason in a Continuous Latent Space [84.5618790930725]
We introduce a new paradigm Coconut (Chain of Continuous Thought) to explore the potential of large language models (LLMs) reasoning in an unrestricted latent space. Experiments show that Coconut can effectively augment the LLM on several reasoning tasks. These findings demonstrate the promise of latent reasoning and offer valuable insights for future research.
arXiv Detail & Related papers (2024-12-09T18:55:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.