Breaking the Overscaling Curse: Thinking Parallelism Before Parallel Thinking
- URL: http://arxiv.org/abs/2601.21619v1
- Date: Thu, 29 Jan 2026 12:22:45 GMT
- Title: Breaking the Overscaling Curse: Thinking Parallelism Before Parallel Thinking
- Authors: Yiming Wang, Zhuosheng Zhang, Rui Wang
- Abstract summary: In system-level evaluations, a global parallelism level N is allocated to all samples, typically set large to maximize overall dataset accuracy. Some samples can achieve comparable performance with a smaller N' < N, causing budget redundancy. We formalize and quantify the overscaling curse, showing its universality and severity in practice, and analyze its trigger mechanism.
- Score: 14.561556728044918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parallel thinking enhances LLM reasoning by multi-path sampling and aggregation. In system-level evaluations, a global parallelism level N is allocated to all samples, typically set large to maximize overall dataset accuracy. However, due to sample heterogeneity, some samples can achieve comparable performance with a smaller N' < N, causing budget redundancy. This incompatibility between system-level efficacy and sample-level efficiency constitutes the overscaling curse. In this paper, we formalize and quantify the overscaling curse, showing its universality and severity in practice, and analyze its trigger mechanism. We then propose a lightweight method, T2, to break the overscaling curse, which utilizes latent representations to estimate the optimal parallelism level for each sample before decoding. Experiments show that T2 significantly reduces cost while maintaining comparable performance, enabling more efficient parallel thinking.
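The sketch below is a minimal, self-contained illustration (not the paper's code) of the redundancy the abstract describes: every sample is billed the global budget N, although easy samples reach a correct majority-vote answer at a much smaller N'. The helper names (`sample_answer`, `minimal_sufficient_n`) and the per-path accuracies are illustrative assumptions.

```python
import random
from collections import Counter

# Illustrative only: simulates the "overscaling curse" under majority-vote
# aggregation. Every sample pays the global budget N, although easy samples
# (high per-path accuracy) saturate at a much smaller N'.

def sample_answer(p_correct: float) -> str:
    # Hypothetical stand-in for decoding one reasoning path.
    return "right" if random.random() < p_correct else "wrong"

def majority_vote(answers: list[str]) -> str:
    return Counter(answers).most_common(1)[0][0]

def minimal_sufficient_n(p_correct: float, n_global: int, trials: int = 200) -> int:
    # Smallest N' whose majority vote is correct in >= 95% of trials;
    # falls back to the full budget for hard samples.
    for n in range(1, n_global + 1):
        correct = sum(
            majority_vote([sample_answer(p_correct) for _ in range(n)]) == "right"
            for _ in range(trials)
        )
        if correct / trials >= 0.95:
            return n
    return n_global

if __name__ == "__main__":
    random.seed(0)
    N = 64  # global parallelism level, set large for dataset-level accuracy
    for p in (0.9, 0.7, 0.55):  # heterogeneous samples: easy, medium, hard
        n_prime = minimal_sufficient_n(p, N)
        print(f"p_correct={p}: N'={n_prime} of N={N} -> {N - n_prime} redundant paths")
```

Per the abstract, T2 replaces the brute-force search in `minimal_sufficient_n` with a lightweight predictor over each sample's latent representation, so N' is estimated before any path is decoded.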
Related papers
- Closing the Approximation Gap of Partial AUC Optimization: A Tale of Two Formulations [121.39938773554523]
The Area Under the ROC Curve (AUC) is a pivotal evaluation metric in real-world scenarios with both class imbalance and decision constraints. We present two simple instance-wise minimax reformulations to close the approximation gap of partial AUC (PAUC) optimization. The resulting algorithms enjoy a linear per-iteration computational complexity w.r.t. the sample size and a convergence rate of $O(\epsilon^{-2/3})$ for typical one-way and two-way PAUCs.
arXiv Detail & Related papers (2025-12-01T02:52:33Z)
- DeepPrune: Parallel Scaling without Inter-trace Redundancy [53.62015294143274]
Over 80% of parallel reasoning traces yield identical final answers, representing substantial wasted computation. We propose DeepPrune, a novel framework that enables efficient parallel scaling through dynamic pruning. Our work establishes a new standard for efficient parallel reasoning. (A toy sketch of this answer-level redundancy appears after this entry.)
arXiv Detail & Related papers (2025-10-09T17:24:54Z)
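As a rough illustration of the inter-trace redundancy DeepPrune targets, the sketch below deduplicates finished traces by final answer. This is a post-hoc illustration under assumed data, not the paper's dynamic (during-decoding) pruning method; the function and variable names are hypothetical.

```python
from collections import defaultdict

# If most parallel traces end in the same final answer, only one
# representative per distinct answer is needed downstream.

def prune_by_final_answer(traces):
    """Keep one representative trace per distinct final answer.

    `traces` is a list of (reasoning_text, final_answer) pairs;
    the names are illustrative, not the paper's API.
    """
    buckets = defaultdict(list)
    for text, answer in traces:
        buckets[answer].append(text)
    kept = [(texts[0], ans) for ans, texts in buckets.items()]
    redundancy = 1 - len(kept) / len(traces)
    return kept, redundancy

traces = [("path a", "42"), ("path b", "42"), ("path c", "42"),
          ("path d", "7"), ("path e", "42")]
kept, redundancy = prune_by_final_answer(traces)
print(kept)                                          # one trace per answer
print(f"{redundancy:.0%} of traces were redundant")  # 60% here
```

Note that collapsing to one representative discards vote counts; an aggregator that still needs majority voting would keep per-answer counts rather than dropping them.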
- EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving [64.15371139980802]
Large Language Models (LLMs) have recently advanced the field of Automated Theorem Proving (ATP). We show that different test-time scaling strategies for ATP models introduce significant computational overhead for inference. We propose two complementary methods that can be integrated into a unified EconRL pipeline for amplified benefits.
arXiv Detail & Related papers (2025-09-16T03:00:13Z)
- Two-dimensional Parallel Tempering for Constrained Optimization [0.3068068202044424]
We introduce a two-dimensional extension of the powerful parallel tempering algorithm (PT). The resulting two-dimensional parallel tempering algorithm (2D-PT) improves mixing in heavily constrained replicas. The method applies broadly to constrained Ising problems and can be deployed on existing Ising machines.
arXiv Detail & Related papers (2025-05-24T20:41:45Z)
- Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling [90.86991492288487]
Evaluating a constraint on every token can be prohibitively expensive. Locally constrained decoding (LCD) can distort the global distribution over strings, sampling tokens based only on local information. We show that our approach is superior to state-of-the-art baselines. (A naive rejection-sampling baseline is sketched after this entry.)
arXiv Detail & Related papers (2025-04-07T18:30:18Z)
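The sketch below shows the naive rejection-sampling idea behind this line of work: draw a token from the unconstrained distribution and resample on violation, so the constraint is only checked on sampled tokens rather than the full vocabulary. It is not the paper's adaptive weighted scheme (which also corrects the induced distribution); the toy vocabulary and all names are assumptions.

```python
import random

# Toy next-token distribution; a real model would supply these probabilities.
VOCAB = {"0": 0.1, "1": 0.2, "2": 0.3, "x": 0.4}

def violates(prefix: str, token: str) -> bool:
    # Toy constraint: the output must consist of digits only.
    # `prefix` is unused here but shows the usual checker signature.
    return not token.isdigit()

def rejection_sample_token(prefix: str, max_tries: int = 100) -> str:
    # Sample-then-check: the constraint is evaluated only on drawn tokens,
    # never on the whole vocabulary.
    tokens, probs = zip(*VOCAB.items())
    for _ in range(max_tries):
        tok = random.choices(tokens, weights=probs, k=1)[0]
        if not violates(prefix, tok):
            return tok
    raise RuntimeError("constraint too tight for naive rejection sampling")

random.seed(1)
print("".join(rejection_sample_token("") for _ in range(5)))
```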
- Seesaw: High-throughput LLM Inference via Model Re-sharding [8.840996987380484]
We present Seesaw, an inference engine optimized for throughput-oriented tasks. The key idea behind Seesaw is dynamic model re-sharding, a technique that facilitates the dynamic reconfiguration of parallelization strategies.
arXiv Detail & Related papers (2025-03-09T04:14:06Z)
- Parallel Simulation for Log-concave Sampling and Score-based Diffusion Models [55.07411490538404]
We propose a novel parallel sampling method that improves the adaptive complexity dependence on the dimension $d$. Our approach builds on parallel simulation techniques from scientific computing.
arXiv Detail & Related papers (2024-12-10T11:50:46Z)
- Sample-Efficient "Clustering and Conquer" Procedures for Parallel Large-Scale Ranking and Selection [3.913403111891027]
We modify the commonly used "divide and conquer" framework in parallel computing by adding a correlation-based clustering step. This seemingly simple modification yields an $\mathcal{O}(p)$ reduction in sample complexity for a widely used class of sample-optimal ranking and selection (R&S) procedures. In large-scale AI applications such as neural architecture search, our methods demonstrate superior performance.
arXiv Detail & Related papers (2024-02-03T15:56:03Z)
- Faster federated optimization under second-order similarity [36.66443669044588]
Federated learning (FL) is a subfield of machine learning where multiple clients try to collaboratively learn a model over a network.
We consider finite-sum federated optimization under a second-order function similarity condition and strong convexity.
We propose two new algorithms: SVRP and Catalyzed SVRP.
arXiv Detail & Related papers (2022-09-06T07:09:21Z)
- A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman divergence minimization.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z)