Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing
- URL: http://arxiv.org/abs/2602.03845v2
- Date: Tue, 10 Feb 2026 21:56:16 GMT
- Title: Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing
- Authors: Tong Zheng, Chengsong Huang, Runpeng Dai, Yun He, Rui Liu, Xin Ni, Huiwen Bao, Kaishen Wang, Hongtu Zhu, Jiaxin Huang, Furong Huang, Heng Huang
- Abstract summary: Parallel-Probe is a training-free controller designed to optimize online parallel thinking. It reduces sequential tokens by up to $\textbf{35.8}$% and total token cost by over $\textbf{25.8}$% while maintaining competitive accuracy.
- Score: 76.48164395646019
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parallel thinking has emerged as a promising paradigm for reasoning, yet it imposes significant computational burdens. Existing efficiency methods primarily rely on local, per-trajectory signals and lack principled mechanisms to exploit global dynamics across parallel branches. We introduce 2D probing, an interface that exposes the width-depth dynamics of parallel thinking by periodically eliciting intermediate answers from all branches. Our analysis reveals three key insights: non-monotonic scaling across width-depth allocations, heterogeneous reasoning branch lengths, and early stabilization of global consensus. Guided by these insights, we introduce $\textbf{Parallel-Probe}$, a training-free controller designed to optimize online parallel thinking. Parallel-Probe employs consensus-based early stopping to regulate reasoning depth and deviation-based branch pruning to dynamically adjust width. Extensive experiments across three benchmarks and multiple models demonstrate that Parallel-Probe establishes a superior Pareto frontier for test-time scaling. Compared to standard majority voting, it reduces sequential tokens by up to $\textbf{35.8}$% and total token cost by over $\textbf{25.8}$% while maintaining competitive accuracy.
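The abstract's two control mechanisms can be illustrated with a minimal toy sketch. This is not the paper's implementation: `probe` (elicits an intermediate answer from a branch), `step` (advances a branch by one reasoning segment), and all thresholds are hypothetical stand-ins for the actual model interface.

```python
from collections import Counter

def parallel_probe_controller(branches, probe, step,
                              consensus_rounds=3, dev_threshold=0.5,
                              max_steps=20):
    """Toy sketch of a 2D-probing controller (hypothetical API).

    branches: list of branch states (the width dimension)
    step(b):  advances branch b by one reasoning segment (the depth dimension)
    probe(b): elicits an intermediate answer from branch b
    """
    stable = 0
    last_majority = None
    for _ in range(max_steps):
        # Depth: advance every surviving branch by one segment.
        branches = [step(b) for b in branches]
        # Probe: periodically elicit intermediate answers across all branches.
        answers = [probe(b) for b in branches]
        majority, count = Counter(answers).most_common(1)[0]
        # Width: deviation-based pruning -- drop branches whose answers
        # deviate from a sufficiently dominant majority.
        if count / len(branches) >= dev_threshold:
            branches = [b for b, a in zip(branches, answers) if a == majority]
        # Depth: consensus-based early stopping -- halt once the global
        # majority answer has stayed stable across several probes.
        stable = stable + 1 if majority == last_majority else 1
        last_majority = majority
        if stable >= consensus_rounds:
            break
    return last_majority
```

In this sketch both controls read only the probed answers, matching the paper's framing of 2D probing as an interface over width-depth dynamics rather than a new training objective.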
Related papers
- Parallel Latent Reasoning for Sequential Recommendation [23.624137982116867]
We propose PLR, a novel framework for exploring multiple diverse reasoning trajectories simultaneously. PLR constructs parallel reasoning streams through learnable trigger tokens in continuous latent space. Experiments on three real-world datasets demonstrate that PLR substantially outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2026-01-06T16:25:48Z) - Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning [68.9332598692234]
We introduce Native Parallel Reasoner (NPR), a teacher-free framework that enables Large Language Models (LLMs) to self-evolve genuine parallel reasoning capabilities. NPR transforms the model from sequential emulation to native parallel cognition through three key innovations.
arXiv Detail & Related papers (2025-12-08T11:39:43Z) - ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models [99.6720868215076]
We introduce ThreadWeaver, a framework for adaptive parallel reasoning. ThreadWeaver achieves accuracy on par with popular sequential reasoning models of comparable size. We show that ThreadWeaver delivers up to a 1.53x average speedup in token latency.
arXiv Detail & Related papers (2025-11-24T18:55:59Z) - Polybasic Speculative Decoding Through a Theoretical Perspective [68.71678077009386]
Inference latency is a critical bottleneck in the large-scale deployment of Large Language Models. We introduce a novel polybasic speculative decoding framework, underpinned by a comprehensive theoretical analysis. We show that our approach yields speedup ratios ranging from $3.31\times$ to $4.01\times$ for LLaMA2-Chat 7B, up to $3.87\times$ for LLaMA3-8B, up to $4.43\times$ for Vicuna-7B, and up to $3.85\times$ for Qwen2-7B.
arXiv Detail & Related papers (2025-10-30T14:20:24Z) - DeepPrune: Parallel Scaling without Inter-trace Redundancy [53.62015294143274]
Over 80% of parallel reasoning traces yield identical final answers, representing substantial wasted computation. We propose DeepPrune, a novel framework that enables efficient parallel scaling through dynamic pruning. Our work establishes a new standard for efficient parallel reasoning, making high-performance reasoning more efficient.
arXiv Detail & Related papers (2025-10-09T17:24:54Z) - Rethinking Thinking Tokens: LLMs as Improvement Operators [80.12087211785949]
Reasoning training incentivizes LLMs to produce long chains of thought (long CoT), which allows them to explore solution strategies with self-checking. This results in higher accuracy, but inflates context length, token/compute cost, and answer latency. We ask: can current models leverage their metacognition to provide other combinations on this Pareto frontier? We identify an interesting inference family, Parallel-Distill-Refine (PDR), which performs the following: (i) generate diverse drafts in parallel; (ii) distill them into a bounded, textual workspace; and (iii) refine conditioned on this workspace.
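Steps (i)-(iii) of PDR form a simple loop, sketched below. The `generate` and `distill` callables are hypothetical stand-ins for model calls, not the paper's actual prompts or APIs.

```python
def parallel_distill_refine(generate, distill, prompt, width=4, rounds=2):
    """Toy sketch of the Parallel-Distill-Refine (PDR) loop summarized above.

    generate(prompt) -> draft   (hypothetical model call)
    distill(drafts)  -> bounded textual workspace (hypothetical)
    """
    workspace = ""
    for _ in range(rounds):
        # (i) generate diverse drafts "in parallel"
        # (done sequentially here for clarity).
        drafts = [generate(prompt + "\n" + workspace) for _ in range(width)]
        # (ii) distill the drafts into a bounded textual workspace,
        # so context length stays fixed regardless of draft count.
        workspace = distill(drafts)
    # (iii) refine: a final generation conditioned only on the workspace.
    return generate(prompt + "\n" + workspace)
```

The key design point is that the workspace, not the full set of drafts, is what carries information between rounds, which bounds context growth.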
arXiv Detail & Related papers (2025-10-01T17:08:59Z) - Parallel-R1: Towards Parallel Thinking via Reinforcement Learning [65.68667585027232]
Parallel thinking is a novel approach for enhancing the reasoning capabilities of large language models. We propose $\textbf{Parallel-R1}$, the first reinforcement learning framework that enables parallel thinking behaviors. Our framework employs a progressive curriculum that explicitly addresses the cold-start problem in training parallel thinking.
arXiv Detail & Related papers (2025-09-09T17:59:35Z) - Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism [20.3565068078231]
We propose a novel framework, $\textbf{SpecBranch}$, to unlock branch parallelism in speculative decoding. We show that SpecBranch achieves $\textbf{1.8}\times$ to $\textbf{4.5}\times$ speedups over auto-regressive decoding and reduces rollback tokens by $\textbf{50}$% for poorly aligned models.
arXiv Detail & Related papers (2025-05-16T07:45:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.