PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning
- URL: http://arxiv.org/abs/2601.05593v1
- Date: Fri, 09 Jan 2026 07:24:43 GMT
- Title: PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning
- Authors: Jingcheng Hu, Yinmin Zhang, Shijie Shang, Xiaobo Yang, Yue Peng, Zhewei Huang, Hebin Zhou, Xin Wu, Jie Cheng, Fanqi Wan, Xiangwen Kong, Chengyuan Yao, Kaiwen Yan, Ailin Huang, Hongyu Zhou, Qi Han, Zheng Ge, Daxin Jiang, Xiangyu Zhang, Heung-Yeung Shum
- Abstract summary: PaCoRe is a training-and-inference framework designed to overcome a central limitation of contemporary language models. PaCoRe departs from the traditional sequential paradigm by driving TTC through massive parallel exploration coordinated via a message-passing architecture in multiple rounds. Trained end-to-end with large-scale, outcome-based reinforcement learning, the model masters the synthesis abilities required by PaCoRe and scales to multi-million-token effective TTC without exceeding context limits.
- Score: 62.55836470496192
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Parallel Coordinated Reasoning (PaCoRe), a training-and-inference framework designed to overcome a central limitation of contemporary language models: their inability to scale test-time compute (TTC) far beyond sequential reasoning under a fixed context window. PaCoRe departs from the traditional sequential paradigm by driving TTC through massive parallel exploration coordinated via a message-passing architecture in multiple rounds. Each round launches many parallel reasoning trajectories, compacts their findings into context-bounded messages, and synthesizes these messages to guide the next round and ultimately produce the final answer. Trained end-to-end with large-scale, outcome-based reinforcement learning, the model masters the synthesis abilities required by PaCoRe and scales to multi-million-token effective TTC without exceeding context limits. The approach yields strong improvements across diverse domains, and notably pushes reasoning beyond frontier systems in mathematics: an 8B model reaches 94.5% on HMMT 2025, surpassing GPT-5's 93.2% by scaling effective TTC to roughly two million tokens. We open-source model checkpoints, training data, and the full inference pipeline to accelerate follow-up work.
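The round structure described in the abstract maps naturally onto a short control loop. The sketch below is a minimal illustration of that loop, with a placeholder `generate` in place of a real model call; the function names, budgets, and round/width defaults are illustrative assumptions, not taken from the authors' released pipeline.

```python
# Minimal sketch of PaCoRe-style round-based coordination.
# `generate` and `compact` are placeholders; budgets and defaults
# are illustrative assumptions, not the released pipeline.

def generate(prompt: str, seed: int = 0) -> str:
    """Stand-in for one reasoning trajectory from the model."""
    return f"[trajectory seed={seed} over {len(prompt)}-char context]"

def compact(trajectory: str, budget_chars: int = 2000) -> str:
    """Compress a trajectory into a context-bounded message."""
    return trajectory[:budget_chars]

def pacore(question: str, rounds: int = 3, width: int = 8) -> str:
    messages: list[str] = []
    for _ in range(rounds):
        context = "\n".join([question, *messages])
        # Launch `width` trajectories; in practice these run in parallel.
        trajectories = [generate(context, seed=i) for i in range(width)]
        # Findings are compacted so the next round's context stays bounded
        # even as effective TTC grows with rounds * width.
        messages = [compact(t) for t in trajectories]
    # Final synthesis over the last round's messages yields the answer.
    return generate("\n".join([question, *messages]))

print(pacore("Example problem statement"))
```

Effective TTC scales roughly as rounds × width × tokens per trajectory, while any single call sees only the question plus the compacted messages, which is how multi-million-token budgets can fit inside a fixed context window.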
Related papers
- D-COT: Disciplined Chain-of-Thought Learning for Efficient Reasoning in Small Language Models [0.0]
Disciplined Chain-of-Thought (D-CoT) is a framework that enforces a structured reasoning process using control tags. D-CoT suppresses reasoning drift and simultaneously achieves token reduction and performance improvement.
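The "control tags" in this summary suggest a template-plus-validation pattern. Below is a hedged sketch of what tag-disciplined prompting could look like; the tag names and the validation regex are invented for illustration and are not the paper's actual tag set.

```python
import re

# Hypothetical control tags; D-CoT defines its own tag set, so these
# names are placeholders for illustration only.
TEMPLATE = (
    "Answer using exactly this structure:\n"
    "<plan>outline the approach</plan>\n"
    "<step>one reasoning step</step> (repeat as needed)\n"
    "<check>verify the step</check>\n"
    "<answer>final answer only</answer>"
)

def build_prompt(question: str) -> str:
    return f"{TEMPLATE}\n\nQuestion: {question}"

def is_disciplined(output: str) -> bool:
    """Reject outputs that drift outside the required tag sequence."""
    pattern = (r"<plan>.*?</plan>\s*"
               r"(?:<step>.*?</step>\s*<check>.*?</check>\s*)+"
               r"<answer>.*?</answer>")
    return re.fullmatch(pattern, output.strip(), flags=re.DOTALL) is not None

demo = "<plan>p</plan><step>s</step><check>c</check><answer>42</answer>"
print(is_disciplined(demo))  # True
```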
arXiv Detail & Related papers (2026-02-25T11:08:38Z)
- ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models [99.6720868215076]
We introduce ThreadWeaver, a framework for adaptive parallel reasoning. ThreadWeaver achieves accuracy on par with popular sequential reasoning models of comparable size. We show that ThreadWeaver delivers up to 1.53x average speedup in token latency.
arXiv Detail & Related papers (2025-11-24T18:55:59Z)
- Rethinking Thinking Tokens: LLMs as Improvement Operators [80.12087211785949]
Reasoning training incentivizes LLMs to produce long chains of thought (long CoT), which allows them to explore solution strategies with self-checking. This results in higher accuracy but inflates context length, token/compute cost, and answer latency. We ask: can current models leverage their metacognition to provide other combinations on this Pareto frontier? We identify an interesting inference family, Parallel-Distill-Refine (PDR), which performs the following: (i) generate diverse drafts in parallel; (ii) distill them into a bounded, textual workspace; and (iii) refine conditioned on this workspace.
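The three PDR stages quoted above translate directly into a short loop. The sketch below assumes a generic `llm` placeholder and an arbitrary character budget for the workspace; both are illustrative, not the paper's exact procedure.

```python
# Sketch of Parallel-Distill-Refine (PDR) as described in the summary.
# `llm` is a placeholder for a real model call; the workspace budget
# is an illustrative assumption.

def llm(prompt: str, seed: int = 0) -> str:
    return f"[draft seed={seed}]"

def pdr(question: str, n_drafts: int = 4, rounds: int = 2,
        workspace_budget: int = 2000) -> str:
    workspace = ""
    for _ in range(rounds):
        # (i) generate diverse drafts in parallel (sequential here for brevity)
        drafts = [llm(f"{question}\nNotes:\n{workspace}", seed=i)
                  for i in range(n_drafts)]
        # (ii) distill drafts into a bounded textual workspace
        workspace = llm("Distill these drafts into shared notes:\n"
                        + "\n---\n".join(drafts))[:workspace_budget]
    # (iii) refine conditioned on the workspace
    return llm(f"{question}\nUsing these notes, answer:\n{workspace}")

print(pdr("Example problem statement"))
```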
arXiv Detail & Related papers (2025-10-01T17:08:59Z)
- Parallel-R1: Towards Parallel Thinking via Reinforcement Learning [65.68667585027232]
Parallel thinking is a novel approach for enhancing the reasoning capabilities of large language models. We propose Parallel-R1, the first reinforcement learning framework that enables parallel thinking behaviors. Our framework employs a progressive curriculum that explicitly addresses the cold-start problem in training parallel thinking.
arXiv Detail & Related papers (2025-09-09T17:59:35Z)
- DeltaLLM: A Training-Free Framework Exploiting Temporal Sparsity for Efficient Edge LLM Inference [19.987309147268586]
We present DeltaLLM, a training-free framework that exploits temporal sparsity in attention patterns to enable efficient LLM inference on resource-constrained edge devices. We evaluate our framework on the edge-device-friendly BitNet-b1.58-2B-4T and Llama3.2-1B-Instruct models across diverse language tasks.
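The summary says only that DeltaLLM exploits temporal sparsity in attention; the sketch below shows one generic way such a delta gate can work: reuse a cached attention output when the input barely changed between steps. The relative-norm test and threshold `tau` are assumptions, not the paper's exact policy.

```python
import numpy as np

def delta_gated(x_t, cache, attend, tau=0.05):
    """Reuse cached attention output when the input changed little.

    Generic temporal-sparsity gate; `tau` and the relative-norm test
    are illustrative assumptions, not DeltaLLM's exact rule.
    """
    if cache is not None:
        x_prev, out_prev = cache
        drift = np.linalg.norm(x_t - x_prev) / (np.linalg.norm(x_prev) + 1e-8)
        if drift < tau:
            return out_prev, cache  # skip recompute on this step
    out = attend(x_t)               # full attention only when needed
    return out, (x_t, out)

# Toy usage: a stand-in `attend` and slowly drifting activations.
rng = np.random.default_rng(0)
attend = lambda x: 2.0 * x
x, cache = rng.normal(size=8), None
for _ in range(3):
    x = x + rng.normal(scale=0.001, size=8)
    out, cache = delta_gated(x, cache, attend)
```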
arXiv Detail & Related papers (2025-07-25T18:23:18Z)
- Medha: Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations [8.725115559974999]
We show that existing work on long-context inference is largely based on techniques from long-context training. This leads to inefficient resource utilization, server fragmentation, and head-of-line (HOL) blocking. We present Medha, an end-to-end system for efficient long-context inference.
arXiv Detail & Related papers (2024-09-25T18:21:05Z)
- Pre-training for Speech Translation: CTC Meets Optimal Transport [29.807861658249923]
We show that the connectionist temporal classification (CTC) loss can reduce the modality gap by design.
We propose a novel pre-training method combining CTC and optimal transport to further reduce this gap.
Our method pre-trains a Siamese-like model composed of two encoders, one for acoustic inputs and the other for textual inputs, such that they produce representations that are close to each other in the Wasserstein space.
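Wasserstein-space alignment between two encoders is commonly made differentiable via entropic optimal transport (Sinkhorn iterations). The sketch below computes such a distance between two embedding sequences of different lengths; the cost normalization, `eps`, and iteration count are generic choices, and the paper's exact formulation (and its combination with CTC) may differ.

```python
import numpy as np

def sinkhorn_distance(A, B, eps=0.1, iters=100):
    """Entropic-OT (Sinkhorn) distance between two embedding sequences.

    Generic sketch of a Wasserstein-style alignment term; hyperparameters
    and normalization are illustrative, not the paper's exact recipe.
    """
    n, m = A.shape[0], B.shape[0]
    cost = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    cost = cost / cost.max()                # normalize for numerical stability
    K = np.exp(-cost / eps)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    v = np.ones(m)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    plan = u[:, None] * K * v[None, :]      # soft alignment between sequences
    return float((plan * cost).sum())

# Toy acoustic vs. textual encoder outputs of different lengths.
rng = np.random.default_rng(0)
print(sinkhorn_distance(rng.normal(size=(12, 16)), rng.normal(size=(7, 16))))
```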
arXiv Detail & Related papers (2023-01-27T14:03:09Z)
- Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups.
We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective.
Our model also achieves strong results in in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
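The claim that different pre-training objectives "can be cast as one another" is often made concrete by parameterizing span corruption with a mean span length and a corruption rate, then interpolating in that space. The sketch below is a simplified illustration of that parameterization, not the paper's exact mixture of objectives.

```python
import random

def span_corrupt(tokens, mean_span=3.0, corrupt_rate=0.15, seed=0):
    """Span corruption parameterized by (mean_span, corrupt_rate).

    Different settings approximate different objectives (short spans ->
    T5-style denoising; very long spans -> prefix-LM-like). A simplified
    sketch, not the paper's exact mixture-of-objectives recipe.
    """
    rng = random.Random(seed)
    inputs, targets, i, sid = [], [], 0, 0
    while i < len(tokens):
        if rng.random() < corrupt_rate / mean_span:
            span = max(1, round(rng.gauss(mean_span, 1.0)))
            inputs.append(f"<extra_id_{sid}>")           # sentinel in the input
            targets.append((f"<extra_id_{sid}>", tokens[i:i + span]))
            sid += 1
            i += span
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets

print(span_corrupt("the quick brown fox jumps over the lazy dog".split()))
```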
arXiv Detail & Related papers (2022-05-10T19:32:20Z)
- Training Recommender Systems at Scale: Communication-Efficient Model and Data Parallelism [56.78673028601739]
We propose a compression framework called Dynamic Communication Thresholding (DCT) for communication-efficient hybrid training.
DCT reduces communication by at least 100× and 20× during DP and MP, respectively.
It improves end-to-end training time for a state-of-the-art industrial recommender model by 37%, without any loss in performance.
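Communication compression of this kind is commonly implemented as threshold/top-k sparsification with error feedback. The sketch below shows that generic pattern; DCT's actual dynamic threshold and where it applies in the DP/MP traffic follow the paper, so treat the details here as assumptions.

```python
import numpy as np

def threshold_compress(grad, keep_ratio=0.01, residual=None):
    """Generic threshold sparsification with error feedback.

    Keeping 1% of entries corresponds to ~100x less traffic; the exact
    dynamic thresholding rule and its DP/MP placement are DCT-specific
    and not reproduced here.
    """
    if residual is not None:
        grad = grad + residual              # re-add previously dropped mass
    k = max(1, int(keep_ratio * grad.size))
    thresh = np.partition(np.abs(grad).ravel(), -k)[-k]
    mask = np.abs(grad) >= thresh
    sparse = np.where(mask, grad, 0.0)      # entries actually communicated
    return sparse, grad - sparse            # (payload, new residual)

g = np.random.default_rng(0).normal(size=10_000)
payload, residual = threshold_compress(g)
print(int((payload != 0).sum()), "of", g.size, "entries sent")
```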
arXiv Detail & Related papers (2020-10-18T01:44:42Z)