Gold-Switch: Training-Free Superposition of Slow- and Fast- Thinking LLMs
- URL: http://arxiv.org/abs/2510.06750v1
- Date: Wed, 08 Oct 2025 08:17:57 GMT
- Title: Gold-Switch: Training-Free Superposition of Slow- and Fast- Thinking LLMs
- Authors: Jaeseong Lee, Dayoung Kwon, Seung-won Hwang
- Abstract summary: Large Reasoning Models (LRMs) excel in structured tasks by emulating deliberate human reasoning but often suffer from overthinking. We propose a superposed deployment strategy with a lightweight, training-free regulation that optimizes inference by switching one model on and off.
- Score: 36.84838904299283
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Reasoning Models (LRMs) excel in structured tasks by emulating deliberate human reasoning but often suffer from overthinking, degrading performance and wasting resources. One possible baseline is to deploy both an LLM and an LRM, then route each input by predicting whether it requires reasoning and may cause overthinking. However, deploying multiple models can be costly or impractical. We propose a superposed deployment strategy with a lightweight, training-free regulation to optimize inference by switching one model on and off. Instead of routing, we selectively unlearn from the LRM at inference, scaling down computation while preserving reasoning. By analyzing the cumulative energy of singular values, we identify optimal low-rank projections to adjust reasoning just right.
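The rank-selection idea in the abstract (pick the smallest rank whose singular values retain a target fraction of the cumulative energy, then project onto that subspace) can be sketched as follows. This is a minimal illustration of cumulative-energy truncation on a single weight matrix, not the paper's exact procedure; the threshold `energy_threshold` and the helper `low_rank_projection` are illustrative assumptions, not names from the paper.

```python
import numpy as np

def low_rank_projection(W, energy_threshold=0.90):
    """Rank-r approximation of W, with r chosen as the smallest rank whose
    singular values retain `energy_threshold` of the cumulative energy."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Cumulative energy: normalized running sum of squared singular values.
    energy = np.cumsum(S ** 2) / np.sum(S ** 2)
    # Smallest index where the retained energy crosses the threshold.
    r = int(np.searchsorted(energy, energy_threshold)) + 1
    W_r = U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :]
    return W_r, r

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
W_r, r = low_rank_projection(W, energy_threshold=0.90)
```

Because the discarded energy equals the squared Frobenius error of the truncation, the reconstruction error of `W_r` is bounded by 10% of `||W||_F^2` under this threshold.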
Related papers
- Reasoning Pattern Alignment Merging for Adaptive Reasoning [48.347817456299104]
Reasoning Pattern Alignment Merging (RPAM) is a layer-wise model merging framework based on feature alignment to facilitate query-adaptive reasoning. Experiments on seven widely used reasoning benchmarks show that RPAM substantially reduces inference cost while maintaining strong performance.
arXiv Detail & Related papers (2026-01-07T01:36:39Z) - Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads [104.9566359759396]
We propose a lightweight alternative for step-level reasoning verification based on data-driven uncertainty scores. Our findings suggest that the internal states of LLMs encode their uncertainty and can serve as reliable signals for reasoning verification.
arXiv Detail & Related papers (2025-11-09T03:38:29Z) - OptimalThinkingBench: Evaluating Over and Underthinking in LLMs [61.90251858867122]
Thinking LLMs solve complex tasks at the expense of increased compute and overthink on simpler problems. Non-thinking LLMs are faster and cheaper but underthink on harder reasoning problems. We introduce OptimalThinkingBench, a unified benchmark that jointly evaluates overthinking and underthinking in LLMs.
arXiv Detail & Related papers (2025-08-18T17:53:10Z) - Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models [67.87579664988199]
TON is a two-stage training strategy for vision-language models (VLMs). It introduces a think-or-not format that serves as a cold start for selective reasoning. TON can reduce the completion length by up to 90% compared to vanilla GRPO.
arXiv Detail & Related papers (2025-05-22T16:13:29Z) - Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning [55.33984461046492]
Policy-based methods currently dominate reinforcement learning pipelines for large language model (LLM) reasoning. We introduce Trajectory Bellman Residual Minimization (TBRM), an algorithm that naturally adapts this value-based idea to LLMs. We prove convergence to the near-optimal KL-regularized policy from arbitrary off-policy data via an improved change-of-trajectory-measure analysis.
arXiv Detail & Related papers (2025-05-21T09:41:53Z) - Thinkless: LLM Learns When to Think [57.857534644932194]
Reasoning Language Models, capable of extended chain-of-thought reasoning, have demonstrated remarkable performance on tasks requiring complex logical inference. We propose Thinkless, a learnable framework that empowers an LLM to adaptively select between short-form and long-form reasoning. On several benchmarks such as Minerva Algebra, MATH-500, and GSM8K, Thinkless is able to reduce the usage of long-chain thinking by 50% to 90%.
arXiv Detail & Related papers (2025-05-19T17:24:16Z) - Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL [36.40577746211243]
Large reasoning models (LRMs) are proficient at generating explicit, step-by-step reasoning sequences before producing final answers. To address the over-thinking problem, we explore how to equip LRMs with adaptive thinking capabilities. We propose AutoThink, a multi-stage reinforcement learning framework that progressively optimizes reasoning policies.
arXiv Detail & Related papers (2025-05-16T04:01:57Z) - Scalable Chain of Thoughts via Elastic Reasoning [61.75753924952059]
Elastic Reasoning is a novel framework for scalable chain of thoughts. It separates reasoning into two phases, thinking and solution, with independently allocated budgets. Our approach produces more concise and efficient reasoning even in unconstrained settings.
arXiv Detail & Related papers (2025-05-08T15:01:06Z) - Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models [32.49420948390984]
Large reasoning models (LRMs) often suffer from an "overthinking" problem, where the model generates excessively redundant reasoning steps with limited performance gains. We propose a simple yet efficient pipeline, Method, to enable LRMs to bypass unnecessary intermediate steps, thereby significantly reducing computational costs.
arXiv Detail & Related papers (2025-04-18T11:07:19Z) - SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs [48.28847964704554]
Chain-of-Thought (CoT) reasoning enables Large Language Models (LLMs) to solve complex reasoning tasks. We propose a novel approach for continuous-space reasoning that does not require modifying the LLM.
arXiv Detail & Related papers (2025-02-17T18:52:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.