Base Models Know How to Reason, Thinking Models Learn When
- URL: http://arxiv.org/abs/2510.07364v3
- Date: Wed, 22 Oct 2025 16:02:22 GMT
- Title: Base Models Know How to Reason, Thinking Models Learn When
- Authors: Constantin Venhoff, Iván Arcuschin, Philip Torr, Arthur Conmy, Neel Nanda
- Abstract summary: Despite consistent performance gains, it remains unclear to what extent thinking models learn entirely new reasoning capabilities. We propose a hybrid model where we activate reasoning mechanisms in base models at the right time to elicit thinking-model-level reasoning chains.
- Score: 29.101045366810652
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Why do thinking language models like DeepSeek R1 outperform their base counterparts? Despite consistent performance gains, it remains unclear to what extent thinking models learn entirely new reasoning capabilities or repurpose pre-existing base model ones. In this work, we propose a hybrid model where we activate reasoning mechanisms in base models at the right time to elicit thinking-model-level reasoning chains, implying that thinking models exploit already existing capabilities. To ground our analysis, we introduce an unsupervised, bottom-up approach for uncovering human-interpretable reasoning behaviors in thinking models. This approach provides an unbiased method to discover reasoning behaviors without imposing manual or LLM-derived assumptions. Across three base and four thinking models, using GSM8K and MATH500, our hybrid model recovers up to 91% of the performance gap to thinking models without any weight updates while steering only 12% of tokens. Concretely, our empirical setup provides a simple, causal way to test the effectiveness of existing reasoning mechanisms in base models by invoking them directly and measuring the resulting task performance. More broadly, these results reframe our understanding of how thinking models are trained: pre-training is when models acquire most of their reasoning mechanisms, and post-training teaches when to deploy those mechanisms, enabling efficient use of inference-time compute.
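The hybrid setup described in the abstract amounts to activation-level steering of a base model at selected token positions. The sketch below illustrates the general idea under a PyTorch/transformers setup; the model (gpt2 as a stand-in), layer index, steering direction, and strength are illustrative placeholders rather than values from the paper, and the gating here steers every generated token instead of the paper's behavior-triggered ~12%.

```python
# Minimal sketch of activation steering at chosen positions (not the
# paper's exact method): add a "reasoning behavior" direction to the
# residual stream of a base model during generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # illustrative stand-in for a base model
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

LAYER = 6                                  # placeholder layer choice
ALPHA = 4.0                                # placeholder strength
direction = torch.randn(model.config.n_embd)
direction = direction / direction.norm()   # placeholder unit direction

def steer_hook(module, inputs, output):
    hidden = output[0]  # (batch, seq, d_model) residual stream
    # Toy gating: steer the last position of each forward pass. The
    # paper instead decides *when* to invoke a mechanism (~12% of
    # tokens); a real gate would condition on the predicted behavior.
    hidden[:, -1, :] = hidden[:, -1, :] + ALPHA * direction
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steer_hook)
prompt = "Q: What is 17 * 24? Let's think step by step."
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=40, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()
```

The forward hook is the key design choice: it modifies hidden states without any weight updates, matching the abstract's claim that the gains come from invoking existing mechanisms rather than retraining.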
Related papers
- EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs [9.412828452977553]
Existing approaches reinforce successful reasoning paths, incurring a substantial calibration cost. This failure has been characterized as a form of model collapse in alignment. We propose EpiCaR as a training objective that jointly optimizes reasoning performance and calibration.
arXiv Detail & Related papers (2026-01-11T06:21:13Z) - Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models [49.598776427454176]
Large Reasoning Models (LRMs) have become a research hotspot due to their outstanding performance on complex tasks. However, with the widespread application of these models, the problem of overthinking has emerged. Various efficient reasoning methods have been proposed that aim to reduce the length of reasoning paths without compromising model performance or reasoning capability.
arXiv Detail & Related papers (2025-08-04T06:54:31Z) - Large Reasoning Models are not thinking straight: on the unreliability of thinking trajectories [0.0]
Large Language Models (LLMs) trained via Reinforcement Learning (RL) have recently achieved impressive results on reasoning benchmarks. Yet, growing evidence shows that these models often generate longer but ineffective chains of thought (CoTs). We present new evidence of overthinking, where models disregard correct solutions even when explicitly provided, instead continuing to generate unnecessary reasoning steps.
arXiv Detail & Related papers (2025-07-01T12:14:22Z) - Incentivizing LLMs to Self-Verify Their Answers [22.387551134333084]
We propose a framework that incentivizes Large Language Models to self-verify their own answers. We train our self-verification models based on Qwen2.5-Math-7B and DeepSeek-R1-Distill-Qwen-1.5B. Experiments on multiple mathematical reasoning benchmarks show that our models can not only improve post-training performance but also enable effective test-time scaling.
arXiv Detail & Related papers (2025-06-02T06:54:29Z) - Can Large Reasoning Models Self-Train? [51.0277533541394]
We use majority voting as a simple self-feedback mechanism to study whether self-training can be sustained within reinforcement learning (a minimal sketch of this voting scheme appears after this list). We find that this basic approach improves not only the model's reasoning performance, but also its capability of generating better quality feedback for the next RL iteration. Yet our analysis also reveals a critical limitation of such a self-training paradigm: prolonged RL with self-reward leads to reward hacking, resulting in sudden and complete performance collapse.
arXiv Detail & Related papers (2025-05-27T17:16:00Z) - Self-supervised Analogical Learning using Language Models [59.64260218737556]
We propose SAL, a self-supervised analogical learning framework. SAL mimics the human analogy process and trains models to explicitly transfer high-quality symbolic solutions. We show that the resulting models outperform base language models on a wide range of reasoning benchmarks.
arXiv Detail & Related papers (2025-02-03T02:31:26Z) - Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision [120.40788744292739]
We propose a two-player paradigm that separates the roles of reasoning and critique models.
We first propose AutoMathCritique, an automated and scalable framework for collecting critique data.
We demonstrate that the critique models consistently improve the actor's performance on difficult queries at test-time.
arXiv Detail & Related papers (2024-11-25T17:11:54Z) - Prototypical Self-Explainable Models Without Re-training [5.837536154627278]
Self-explainable models (SEMs) are trained directly to provide explanations alongside their predictions.
Current SEMs require complex architectures and heavily regularized loss functions, thus necessitating specific and costly training.
We propose a simple yet efficient universal method called KMEx, which can convert any existing pre-trained model into a prototypical SEM.
arXiv Detail & Related papers (2023-12-13T01:15:00Z) - Minimal Value-Equivalent Partial Models for Scalable and Robust Planning
in Lifelong Reinforcement Learning [56.50123642237106]
Common practice in model-based reinforcement learning is to learn models that model every aspect of the agent's environment.
We argue that such models are not particularly well-suited for performing scalable and robust planning in lifelong reinforcement learning scenarios.
We propose new kinds of models that only model the relevant aspects of the environment, which we call "minimal value-equivalent partial models".
arXiv Detail & Related papers (2023-01-24T16:40:01Z) - Explain, Edit, and Understand: Rethinking User Study Design for
Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z) - Thief, Beware of What Get You There: Towards Understanding Model
Extraction Attack [13.28881502612207]
In some scenarios, AI models are trained proprietarily, where neither pre-trained models nor sufficient in-distribution data is publicly available.
We find that the effectiveness of existing techniques is significantly affected by the absence of pre-trained models.
We formulate model extraction attacks into an adaptive framework that captures these factors with deep reinforcement learning.
arXiv Detail & Related papers (2021-04-13T03:46:59Z) - Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow [14.422129911404472]
Bellman aims to fill this gap and introduces the first thoroughly designed and tested model-based RL toolbox.
Our modular approach makes it possible to combine a wide range of environment models with generic model-based agent classes that recover state-of-the-art algorithms.
arXiv Detail & Related papers (2021-03-26T11:32:27Z)
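As referenced in the "Can Large Reasoning Models Self-Train?" entry above, here is a minimal sketch of majority voting as a self-reward signal. This is one plausible reading of that abstract, not the authors' code; the 0/1 agreement reward and the string-matching of final answers are assumptions.

```python
# Minimal sketch (assumed, not the authors' implementation): sample
# several answers to the same question, take the most common one as a
# pseudo-label, and reward samples that agree with it. In the paper's
# setting, these rewards would drive the next RL update.
from collections import Counter

def majority_vote_rewards(answers: list[str]) -> tuple[str, list[float]]:
    """Return the majority answer and a 0/1 agreement reward per sample."""
    majority, _ = Counter(answers).most_common(1)[0]
    return majority, [1.0 if a == majority else 0.0 for a in answers]

# Example: five sampled final answers to one math problem.
samples = ["42", "42", "7", "42", "13"]
label, rewards = majority_vote_rewards(samples)
print(label, rewards)  # -> 42 [1.0, 1.0, 0.0, 1.0, 0.0]
```

Note how such a scheme can reward consensus rather than correctness, which is consistent with that entry's observation that prolonged self-reward invites reward hacking.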
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.