Related papers: Parallel Test-Time Scaling for Latent Reasoning Models

Parallel Test-Time Scaling for Latent Reasoning Models

URL: http://arxiv.org/abs/2510.07745v1
Date: Thu, 09 Oct 2025 03:33:00 GMT
Title: Parallel Test-Time Scaling for Latent Reasoning Models
Authors: Runyang You, Yongqi Li, Meng Liu, Wenjie Wang, Liqiang Nie, Wenjie Li,
Abstract summary: Parallel test-time scaling (TTS) is a pivotal approach for enhancing large language models (LLMs)<n>Recent advances in latent reasoning, where intermediate reasoning unfolds in continuous vector spaces, offer a more efficient alternative to explicit Chain-of-Thought.<n>This work enables parallel TTS for latent reasoning models by addressing the above issues.
Score: 58.428340345068214
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Parallel test-time scaling (TTS) is a pivotal approach for enhancing large language models (LLMs), typically by sampling multiple token-based chains-of-thought in parallel and aggregating outcomes through voting or search. Recent advances in latent reasoning, where intermediate reasoning unfolds in continuous vector spaces, offer a more efficient alternative to explicit Chain-of-Thought, yet whether such latent models can similarly benefit from parallel TTS remains open, mainly due to the absence of sampling mechanisms in continuous space, and the lack of probabilistic signals for advanced trajectory aggregation. \ This work enables parallel TTS for latent reasoning models by addressing the above issues. For sampling, we introduce two uncertainty-inspired stochastic strategies: Monte Carlo Dropout and Additive Gaussian Noise. For aggregation, we design a Latent Reward Model (LatentRM) trained with step-wise contrastive objective to score and guide latent reasoning. Extensive experiments and visualization analyses show that both sampling strategies scale effectively with compute and exhibit distinct exploration dynamics, while LatentRM enables effective trajectory selection. Together, our explorations open a new direction for scalable inference in continuous spaces. Code released at https://github.com/YRYangang/LatentTTS.

Related papers

GTS: Inference-Time Scaling of Latent Reasoning with a Learnable Gaussian Thought Sampler [54.10960908347221]
We model latent thought exploration as conditional sampling from learnable densities and instantiate this idea as a Gaussian Thought Sampler (GTS)<n>GTS predicts context-dependent perturbation distributions over continuous reasoning states and is trained with GRPO-style policy optimization while keeping the backbone frozen.
arXiv Detail & Related papers (2026-02-15T09:57:47Z)
UniT: Unified Multimodal Chain-of-Thought Test-time Scaling [85.590774707406]
Unified models can handle both multimodal understanding and generation within a single architecture, yet they typically operate in a single pass without iteratively refining their outputs.<n>We introduce UniT, a framework for multimodal test-time scaling that enables a single unified model to reason, verify, and refine across multiple rounds.
arXiv Detail & Related papers (2026-02-12T18:59:49Z)
Efficient Inference for Coupled Hidden Markov Models in Continuous Time and Discrete Space [13.537748779869615]
We introduce Latent Interacting Particle Systems, a model class parameterizing the generator of each Markov chain in the system.<n>We demonstrate the effectiveness of our approach on a challenging posterior inference task for a latent SIRS model on a graph, and on a neural model for wildfire spread dynamics trained on real data.
arXiv Detail & Related papers (2025-10-14T18:42:12Z)
CTRLS: Chain-of-Thought Reasoning via Latent State-Transition [57.51370433303236]
Chain-of-thought (CoT) reasoning enables large language models to break down complex problems into interpretable intermediate steps.<n>We introduce groundingS, a framework that formulates CoT reasoning as a Markov decision process (MDP) with latent state transitions.<n>We show improvements in reasoning accuracy, diversity, and exploration efficiency across benchmark reasoning tasks.
arXiv Detail & Related papers (2025-07-10T21:32:18Z)
Continuous Chain of Thought Enables Parallel Exploration and Reasoning [39.37806940098749]
Chain-of-thought with continuously-valued tokens (CoT2) is motivated by logical reasoning tasks that inherently require search capabilities.<n>We show how CoT2 facilitates the model to track multiple discrete traces in parallel.<n>We also provide a CoT2-based one-layer transformer that solves the "subset sum problem" given a sufficient embedding dimension.
arXiv Detail & Related papers (2025-05-29T16:58:28Z)
Hybrid Latent Reasoning via Reinforcement Learning [51.06635386903026]
We explore latent reasoning by leveraging the capabilities of large language models (LLMs) via reinforcement learning (RL)<n>We introduce hybrid reasoning policy optimization (HRPO), an RL-based hybrid latent reasoning approach that integrates prior hidden states into sampled tokens with a learnable gating mechanism.<n>HRPO-trained LLMs remain interpretable and exhibit intriguing behaviors like cross-lingual patterns and shorter completion lengths.
arXiv Detail & Related papers (2025-05-24T01:26:16Z)
Fractured Chain-of-Thought Reasoning [61.647243580650446]
We introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling.<n>We show that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget.
arXiv Detail & Related papers (2025-05-19T11:30:41Z)
To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning [37.179289850042764]
Backtracking naturally scales test-time compute by enabling sequential, linearized exploration via long chain-of-thought (CoT) generation.<n>In this paper, we systematically compare these two approaches on two challenging reasoning tasks: CountDown and Sudoku.<n>Surprisingly, we find that sequential search underperforms parallel sampling on CountDown but outperforms it on Sudoku, suggesting that backtracking is not universally beneficial.
arXiv Detail & Related papers (2025-04-09T17:12:49Z)
Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation [40.861314212279474]
We study inference-time compute by viewing chain-of-thought (CoT) generation as a metastable Markov process.<n>We prove that implementing a search protocol that rewards sparse edges improves CoT by decreasing the expected number of steps to reach different clusters.<n>We also show that the information gained by search can be utilized to obtain a better reasoning model.
arXiv Detail & Related papers (2025-02-02T18:19:14Z)
Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs [63.36637269634553]
We introduce a novel approach where LLMs are fine-tuned to generate a sequence of Diverse Chains of Thought (DCoT) within a single inference step.<n>We show that fine-tuning on DCoT improves performance over the CoT baseline across model families and scales.<n>Our work is also significant because both quantitative analyses and manual evaluations reveal the observed gains stem from the models' ability to refine an initial reasoning chain.
arXiv Detail & Related papers (2024-07-03T15:01:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.