Related papers: Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization

URL: http://arxiv.org/abs/2601.21358v2
Date: Wed, 04 Feb 2026 16:14:48 GMT
Title: Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization
Authors: Jiecong Wang, Hao Peng, Chunyang Liu
Abstract summary: Chain-of-Thought (CoT) empowers Large Language Models (LLMs) to tackle complex problems. Recent latent reasoning approaches attempt to optimize efficiency by performing reasoning within continuous hidden states. We introduce PLaT, a framework that reformulates latent reasoning as planning by fundamentally decoupling reasoning from verbalization.
Score: 9.193078163792427
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Chain-of-Thought (CoT) empowers Large Language Models (LLMs) to tackle complex problems, but remains constrained by computational cost and reasoning path collapse when grounded in discrete token spaces. Recent latent reasoning approaches attempt to optimize efficiency by performing reasoning within continuous hidden states. However, these methods typically operate as opaque end-to-end mappings from explicit reasoning steps to latent states, and often require a pre-defined number of latent steps during inference. In this work, we introduce PLaT (Planning with Latent Thoughts), a framework that reformulates latent reasoning as planning by fundamentally decoupling reasoning from verbalization. We model reasoning as a deterministic trajectory of latent planning states, while a separate Decoder grounds these thoughts into text when necessary. This decoupling allows the model to dynamically determine when to terminate reasoning rather than relying on fixed hyperparameters. Empirical results on mathematical benchmarks reveal a distinct trade-off: while PLaT achieves lower greedy accuracy than baselines, it demonstrates superior scalability in terms of reasoning diversity. This indicates that PLaT learns a robust, broader solution space, offering a transparent and scalable foundation for inference-time search. Our code can be found at https://github.com/yunsaijc/PLaT.

Related papers

CoLT: Reasoning with Chain of Latent Tool Calls [31.228763375347608]
Chain-of-Thought (CoT) is a critical technique in enhancing the reasoning ability of Large Language Models (LLMs). We propose CoLT, a novel framework that implements latent reasoning as "tool calls".
arXiv Detail & Related papers (2026-02-04T06:12:53Z)
ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought [49.203970812338916]
Explicit reasoning chains introduce substantial computational redundancy. Recent latent reasoning methods attempt to mitigate this by compressing reasoning processes into latent space. We propose Rendered CoT-Guided variational Latent Reasoning (ReGuLaR).
arXiv Detail & Related papers (2026-01-30T17:08:06Z)
SpiralThinker: Latent Reasoning through an Iterative Process with Text-Latent Interleaving [4.732347368043908]
SpiralThinker is a unified framework that performs iterative updates over latent representations. A progressive alignment objective combined with structured annotations maintains coherence between latent and textual reasoning.
arXiv Detail & Related papers (2025-11-12T05:05:42Z)
Think Consistently, Reason Efficiently: Energy-Based Calibration for Implicit Chain-of-Thought [33.267497114389734]
Large Language Models (LLMs) have demonstrated strong reasoning capabilities through Chain-of-Thought (CoT) prompting. CoT methods rely on discrete token-level reasoning processes prone to error propagation and limited by vocabulary. We propose EBM-CoT, an Energy-Based Chain-of-Thought framework that refines latent thought representations through an energy-based model.
arXiv Detail & Related papers (2025-11-10T14:10:58Z)
Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization [5.674809920704963]
Latent Thought Policy Optimization (LTPO) enhances LLM reasoning entirely at test time. Experiments show that LTPO not only matches or surpasses strong baselines on standard tasks but also demonstrates remarkable robustness where others fail. Most notably, on highly challenging AIME benchmarks where existing latent reasoning baselines collapse to near-zero accuracy, LTPO delivers substantial improvements.
arXiv Detail & Related papers (2025-10-05T12:50:39Z)
Stop Spinning Wheels: Mitigating LLM Overthinking via Mining Patterns for Early Reasoning Exit [114.83867400179354]
Overthinking can degrade overall performance of large language models. We categorize reasoning into three stages: insufficient exploration stage, compensatory reasoning stage, and reasoning convergence stage. We develop a lightweight thresholding strategy based on rules to improve reasoning accuracy.
arXiv Detail & Related papers (2025-08-25T03:17:17Z)
A Survey on Latent Reasoning [100.54120559169735]
Large Language Models (LLMs) have demonstrated impressive reasoning capabilities. CoT reasoning that verbalizes intermediate steps limits the model's expressive bandwidth. Latent reasoning tackles this bottleneck by performing multi-step inference entirely in the model's continuous hidden state.
arXiv Detail & Related papers (2025-07-08T17:29:07Z)
Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer [0.8738725605667471]
Chain-of-thought (CoT) reasoning has enabled transformer-based language models to excel at complex mathematics and multi-step planning. In standard decoder-only architectures, these reasoning steps are externalized in natural language, improving interpretability at the cost of efficiency. We investigate whether such reasoning structures emerge in Huginn-3.5B, a depth-recurrent Transformer that reuses layers at inference time without increasing parameter count.
arXiv Detail & Related papers (2025-07-02T23:35:21Z)
Fractured Chain-of-Thought Reasoning [61.647243580650446]
We introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling. We show that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget.
arXiv Detail & Related papers (2025-05-19T11:30:41Z)
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching [64.74765550805024]
Chain-of-Thought prompting elicits step-by-step problem solving, but often at the cost of excessive verbosity in intermediate outputs. We propose Sketch-of-Thought (SoT), a prompting framework that integrates cognitively inspired reasoning paradigms with linguistic constraints. SoT achieves token reductions of up to 84% with minimal accuracy loss across 18 reasoning datasets.
arXiv Detail & Related papers (2025-03-07T06:57:17Z)
Training Large Language Models to Reason in a Continuous Latent Space [71.0274000348354]
We introduce a new paradigm called Coconut (Chain of Continuous Thought) to explore the potential of reasoning beyond language. Instead of decoding this state into words, we feed it back to the model as the next input embedding directly in the continuous space. This latent reasoning paradigm enables an advanced reasoning pattern, where continuous thoughts can encode multiple alternative next steps.
arXiv Detail & Related papers (2024-12-09T18:55:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.