Related papers: Batch Prompting Suppresses Overthinking Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models

Batch Prompting Suppresses Overthinking Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models

URL: http://arxiv.org/abs/2511.04108v1
Date: Thu, 06 Nov 2025 06:47:39 GMT
Title: Batch Prompting Suppresses Overthinking Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models
Authors: Wenmo Qiu, Saurabh Srivastava,
Abstract summary: We show that it regularizes model behavior during multi-step reasoning for Large Reasoning Models (LRMs)<n>We conduct a comprehensive study across 13 diverse benchmarks and observe that improves accuracy while substantially reducing reasoning token usage.<n>Surprisingly, we also observe emergent collective effects in batched inference: models often generalize patterns from earlier examples to solve harder ones.
Score: 5.408799241182959
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent work has explored batch prompting as a strategy to amortize inference cost in large language models (LLMs). In this paper, we show that batching offers an additional, underappreciated benefit: it regularizes model behavior during multi-step reasoning for Large Reasoning Models (LRMs). We conduct a comprehensive study across 13 diverse benchmarks and observe that batching improves accuracy while substantially reducing reasoning token usage, often by 3x-5x. Through detailed behavioral analysis, we find that batching suppresses overthinking, reduces hedging language (e.g., repetitive self-corrections), and encourages more decisive answers. Surprisingly, we also observe emergent collective effects in batched inference: models often generalize patterns from earlier examples to solve harder ones in the same batch. These findings position batching not just as a throughput optimization, but as a powerful inference-time regularizer for more efficient and reliable LLM reasoning.

Related papers

Balancing Faithfulness and Performance in Reasoning via Multi-Listener Soft Execution [79.98699884805636]
Reasoning Execution by Multiple Listeners (REMUL) is a multi-party reinforcement learning approach.<n>REMUL builds on the hypothesis that reasoning traces which other parties can follow will be more faithful.<n>Speakers are rewarded for producing reasoning that is clear to listeners.
arXiv Detail & Related papers (2026-02-18T02:55:55Z)
To Think or Not To Think, That is The Question for Large Reasoning Models in Theory of Mind Tasks [56.11584171938381]
Theory of Mind (ToM) assesses whether models can infer hidden mental states such as beliefs, desires, and intentions.<n>Recent progress in Large Reasoning Models (LRMs) has boosted step-by-step inference in mathematics and coding.<n>We present a systematic study of nine advanced Large Language Models (LLMs) comparing reasoning models with non-reasoning models.
arXiv Detail & Related papers (2026-02-11T08:16:13Z)
Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization [56.59356959631999]
Gated Perception-Reasoning Optimization (GPRO) is a meta-reasoning controller that dynamically routes computation among three decision paths.<n>GPRO substantially improves both accuracy and efficiency, outperforming recent slow-thinking methods.
arXiv Detail & Related papers (2026-01-07T23:05:17Z)
Anti-Length Shift: Dynamic Outlier Truncation for Training Efficient Reasoning Models [29.56923793047279]
We introduce Dynamic Outlier Truncation (DOT), a training-time intervention that selectively suppresses redundant tokens.<n>DOT targets only the extreme tail of response lengths within fully correct rollout groups while preserving long-horizon reasoning capabilities.<n>Our method reduces inference token usage by 78% while simultaneously increasing accuracy compared to the initial policy.
arXiv Detail & Related papers (2026-01-07T14:31:07Z)
Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking [50.97239453902612]
Large Reasoning Models (LRMs) have achieved impressive performance on challenging tasks, yet their deep reasoning often incurs substantial computational costs.<n>Inspired by Evidence Accumulation Models, we find that LRMs have accumulated sufficient information early in reasoning, making further reasoning steps redundant.<n>We propose Just-Enough Thinking (JET), which trains models to proactively terminate unnecessary reasoning.
arXiv Detail & Related papers (2025-09-27T16:25:06Z)
DRQA: Dynamic Reasoning Quota Allocation for Controlling Overthinking in Reasoning Large Language Models [28.90035967715762]
Reasoning large language models (RLLMs) have recently demonstrated remarkable capabilities by performing structured and multi-step reasoning.<n>We propose Dynamic Reasoning Quota Allocation (DRQA), a novel method that transfers the benefits of resource competition from batch processing to single-question inference.
arXiv Detail & Related papers (2025-08-25T08:47:36Z)
Stands to Reason: Investigating the Effect of Reasoning on Idiomaticity Detection [2.8330244018167945]
We examine how reasoning capabilities in Large Language Models affect idiomaticity detection performance.<n>We find the effect of reasoning to be smaller and more varied than expected.<n>For smaller models, producing chain-of-thought (CoT) reasoning increases performance from Math-tuned intermediate models, but not to the levels of the base models.
arXiv Detail & Related papers (2025-08-18T21:17:09Z)
AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking [38.8730008545358]
Large language models (LLMs) often lack robustness in their reasoning.<n>Our approach focuses on "abstracting" reasoning problems.<n>We find that this abstraction process is better acquired through reinforcement learning (RL) than just supervised fine-tuning.
arXiv Detail & Related papers (2025-06-09T13:34:50Z)
Does Thinking More always Help? Mirage of Test-Time Scaling in Reasoning Models [130.5487886246353]
Extending thinking traces using prompts like "Wait" or "Let me rethink" can improve performance.<n>This raises a natural question: Does thinking more at test-time truly lead to better reasoning?<n>We show a consistent pattern of initial performance improvements from additional thinking followed by a decline, due to "overthinking"
arXiv Detail & Related papers (2025-06-04T17:55:09Z)
PixelThink: Towards Efficient Chain-of-Pixel Reasoning [70.32510083790069]
PixelThink is a simple yet effective scheme that integrates externally estimated task difficulty and internally measured model uncertainty.<n>It learns to compress reasoning length in accordance with scene complexity and predictive confidence.<n> Experimental results demonstrate that the proposed approach improves both reasoning efficiency and overall segmentation performance.
arXiv Detail & Related papers (2025-05-29T17:55:49Z)
The Price of a Second Thought: On the Evaluation of Reasoning Efficiency in Large Language Models [54.88805865447848]
We show that instruct models achieve higher efficiency overall, and problem difficulty affects efficiency.<n>We propose COTHINK, a simple two-stage pipeline: an instruct model drafts a brief outline, and a thinking model expands it.<n>On GSM8K, MATH500, and AIME24, COTHINK cuts token usage by 21.1% while keeping accuracy on four thinking models, and remains competitive with strong efficiency baselines.
arXiv Detail & Related papers (2025-05-28T06:24:45Z)
Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods [39.89239733570008]
This work conducts a comprehensive analysis of inference-time scaling methods for both reasoning and non-reasoning models.<n>We find that non-reasoning models, even with an extremely high inference budget, still fall substantially behind reasoning models.<n>For reasoning models, majority voting proves to be a robust inference strategy, generally competitive or outperforming other more sophisticated ITC methods.
arXiv Detail & Related papers (2025-04-18T19:32:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.