Steering LLM Reasoning Through Bias-Only Adaptation
- URL: http://arxiv.org/abs/2505.18706v1
- Date: Sat, 24 May 2025 13:55:38 GMT
- Title: Steering LLM Reasoning Through Bias-Only Adaptation
- Authors: Viacheslav Sinii, Alexey Gorbatovski, Artem Cherepanov, Boris Shaposhnikov, Nikita Balagansky, Daniil Gavrilov
- Abstract summary: Reinforcement-learning finetuning does not create new capabilities but strengthens reasoning patterns already latent in the pretrained network. We test this claim by training steering vectors: layer-wise biases that additively amplify selected hidden features. Experiments on four base models across the GSM8K and MATH benchmarks show that steering vectors recover, and in several cases exceed, the accuracy of fully-tuned counterparts.
- Score: 4.486093197820339
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work on reasoning-oriented language models, exemplified by o1-like systems, suggests that reinforcement-learning (RL) finetuning does not create new capabilities but instead strengthens reasoning patterns already latent in the pretrained network. We test this claim by training steering vectors: layer-wise biases that additively amplify selected hidden features while leaving all original weights unchanged. Experiments on four base models across the GSM8K and MATH benchmarks show that steering vectors recover, and in several cases exceed, the accuracy of fully-tuned counterparts. This result supports the view that the required reasoning skills pre-exist in the base model. Further, logit-lens analysis reveals that the trained vectors consistently boost token groups linked to structured languages and logical connectors, providing an interpretable account that aligns with the demands of quantitative reasoning tasks.
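The abstract describes the mechanism concretely enough to sketch: freeze every pretrained weight, train only one additive bias vector per decoder block injected into the residual stream, and afterwards read the trained vectors back through the unembedding matrix (logit lens) to see which tokens they boost. The sketch below is illustrative only: it assumes a Hugging Face causal LM with a Llama/Qwen-style module layout (`model.model.layers`, `model.model.norm`, `model.lm_head`); the model name, hook placement, and optimizer are assumptions, not the authors' exact recipe, and the RL training objective itself is omitted.

```python
# Minimal sketch of bias-only "steering vector" adaptation, assuming a
# Hugging Face causal LM with a Llama/Qwen-style layout. All pretrained
# weights stay frozen; only one additive bias per decoder block is trained.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-1.5B"  # placeholder; not necessarily a model used in the paper
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)
model.requires_grad_(False)  # leave all original weights unchanged

blocks = model.model.layers  # decoder blocks (path is architecture-dependent)
steering = torch.nn.ParameterList(
    [torch.nn.Parameter(torch.zeros(model.config.hidden_size)) for _ in blocks]
)

def make_hook(vec):
    def hook(module, inputs, output):
        # Decoder blocks return a tuple whose first element is the hidden states;
        # add the layer-wise bias to the residual stream, pass the rest through.
        return (output[0] + vec,) + tuple(output[1:])
    return hook

for block, vec in zip(blocks, steering):
    block.register_forward_hook(make_hook(vec))

# Only the steering vectors receive gradients; the paper's training objective
# is not reproduced here.
optimizer = torch.optim.AdamW(steering.parameters(), lr=1e-3)

# Logit-lens style readout: project a trained vector through the final norm
# and unembedding to inspect which tokens it pushes up.
with torch.no_grad():
    logits = model.lm_head(model.model.norm(steering[-1]))
    print(tok.convert_ids_to_tokens(logits.topk(10).indices.tolist()))
```

Because the base weights never change, removing the hooks (or zeroing the vectors) recovers the original model exactly.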
Related papers
- Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute [57.16286134405821]
We propose Fractional Reasoning, a framework that enables continuous control over reasoning intensity at inference time. Our method operates by extracting the latent steering vector associated with deeper reasoning and reapplying it with a tunable scaling factor. Experiments on GSM8K, MATH500, and GPQA demonstrate that Fractional Reasoning consistently improves performance across diverse reasoning tasks and models. A rough sketch of the scaled re-application step appears below.
arXiv Detail & Related papers (2025-06-18T21:15:59Z)
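The mechanism described in the Fractional Reasoning entry, re-applying an extracted steering vector with a tunable scale, can be sketched in the same hook style; the layer index, the scale, and how the vector `v` was extracted are illustrative assumptions, not the paper's exact procedure.

```python
# Rough sketch of scaled steering-vector re-application (names are illustrative).
import torch

def apply_scaled_steering(block: torch.nn.Module, v: torch.Tensor, alpha: float):
    """Add alpha * v to the block's output hidden states via a forward hook."""
    def hook(module, inputs, output):
        return (output[0] + alpha * v,) + tuple(output[1:])
    return block.register_forward_hook(hook)

# Hypothetical usage: a larger alpha pushes generation toward "deeper reasoning".
# handle = apply_scaled_steering(model.model.layers[20], v, alpha=0.8)
# ...generate...
# handle.remove()  # restores the unmodified model
```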
- Beyond Templates: Dynamic Adaptation of Reasoning Demonstrations via Feasibility-Aware Exploration [15.711365331854614]
We introduce Dynamic Adaptation of Reasoning Trajectories (DART), a novel data adaptation framework. Instead of uniformly imitating expert steps, DART employs a selective imitation strategy guided by step-wise adaptability estimation. We validate DART across multiple reasoning benchmarks and model scales, demonstrating that it significantly improves generalization and data efficiency.
arXiv Detail & Related papers (2025-05-27T04:08:11Z)
- Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models [27.142703756752997]
We introduce MathIF, a benchmark for evaluating instruction-following in mathematical reasoning tasks. Our empirical analysis reveals a consistent tension between scaling up reasoning capacity and maintaining controllability. We show that even simple interventions can partially recover obedience, though at the cost of reasoning performance.
arXiv Detail & Related papers (2025-05-20T18:18:01Z)
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? [67.30809748319486]
Reinforcement Learning with Verifiable Rewards (RLVR) has recently demonstrated notable success in enhancing the reasoning performance of large language models (LLMs). This study critically examines the current state of RLVR. We find that the current training setup does not elicit fundamentally new reasoning patterns.
arXiv Detail & Related papers (2025-04-18T17:59:56Z)
- SEAL: Steerable Reasoning Calibration of Large Language Models for Free [58.190800043449336]
Large Language Models (LLMs) have demonstrated compelling capabilities for complex reasoning tasks via the extended chain-of-thought (CoT) reasoning mechanism. Recent studies reveal substantial redundancy in the CoT reasoning traces, which negatively impacts model performance. We introduce SEAL, a training-free approach that seamlessly calibrates the CoT process, improving accuracy while demonstrating significant efficiency gains.
arXiv Detail & Related papers (2025-04-07T02:42:07Z)
- Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 [53.894789613838654]
We introduce SEED-Bench-R1, a benchmark designed to evaluate post-training methods for MLLMs in video understanding. It includes intricate real-world videos and complex everyday planning tasks in the format of multiple-choice questions. Using Qwen2-VL-Instruct-7B as a base model, we compare RL with supervised fine-tuning (SFT). Our detailed analysis reveals that RL enhances visual perception but often produces less coherent reasoning chains.
arXiv Detail & Related papers (2025-03-31T17:55:23Z)
- OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement [91.88062410741833]
This study investigates whether similar reasoning capabilities can be successfully integrated into large vision-language models (LVLMs). We consider an approach that iteratively leverages supervised fine-tuning (SFT) on lightweight training data and Reinforcement Learning (RL) to further improve model generalization. OpenVLThinker, an LVLM exhibiting consistently improved reasoning performance on challenging benchmarks such as MathVista, MathVerse, and MathVision, demonstrates the potential of our strategy for robust vision-language reasoning.
arXiv Detail & Related papers (2025-03-21T17:52:43Z)
- The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models [69.798277882245]
We introduce Unsupervised Prefix Fine-Tuning (UPFT) to enhance large language models' reasoning efficiency. UPFT removes the need for labeled data or exhaustive sampling. Experiments show that UPFT matches the performance of supervised methods.
arXiv Detail & Related papers (2025-03-04T18:56:03Z)
- Unveiling Reasoning Thresholds in Language Models: Scaling, Fine-Tuning, and Interpretability through Attention Maps [3.8936716676293917]
This study investigates the in-context learning capabilities of various decoder-only transformer-based language models with different model sizes and training data. We identify a critical parameter threshold (1.6 billion), beyond which reasoning performance improves significantly in tasks such as commonsense reasoning in multiple-choice question answering and deductive reasoning.
arXiv Detail & Related papers (2025-02-21T00:48:32Z)
- Improve Vision Language Model Chain-of-thought Reasoning [86.83335752119741]
Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness.
We show that training VLMs on short answers does not generalize well to reasoning tasks that require more detailed responses.
arXiv Detail & Related papers (2024-10-21T17:00:06Z)
- Distributional Associations vs In-Context Reasoning: A Study of Feed-forward and Attention Layers [49.80959223722325]
We study the distinction between feed-forward and attention layers in large language models. We find that feed-forward layers tend to learn simple distributional associations such as bigrams, while attention layers focus on in-context reasoning.
arXiv Detail & Related papers (2024-06-05T08:51:08Z)