What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis
- URL: http://arxiv.org/abs/2602.12395v1
- Date: Thu, 12 Feb 2026 20:44:27 GMT
- Title: What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis
- Authors: Xirui Li, Ming Li, Tianyi Zhou,
- Abstract summary: Reinforcement learning with computation rewards has become a standard post-training stage for boosting visual reasoning in vision-language models.<n>End-to-end benchmark gains conflate multiple factors, making it difficult to attribute improvements to specific skills.<n>We propose a Frankenstein-style analysis framework including: (i) functional localization via causal probing; (ii) update characterization via parameter comparison; and (iii) transferability test via model merging.
- Score: 23.904569857346605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) with verifiable rewards has become a standard post-training stage for boosting visual reasoning in vision-language models, yet it remains unclear what capabilities RL actually improves compared with supervised fine-tuning as cold-start initialization (IN). End-to-end benchmark gains conflate multiple factors, making it difficult to attribute improvements to specific skills. To bridge the gap, we propose a Frankenstein-style analysis framework including: (i) functional localization via causal probing; (ii) update characterization via parameter comparison; and (iii) transferability test via model merging. Instead, RL induces a consistent inference-time shift primarily in mid-to-late layers, and these mid-to-late refinements are both transferable (via merging) and necessary (via freezing) for RL gains. Overall, our results suggest that RL's reliable contribution in visual reasoning is not a uniform enhancement of visual perception, but a systematic refinement of mid-to-late transformer computation that improves vision-to-reasoning alignment and reasoning performance, highlighting the limitations of benchmark-only evaluation for understanding multimodal reasoning improvements.
Related papers
- Learning Dynamics in RL Post-Training for Language Models [2.538209532048867]
We analyze the learning dynamics of RL post-training from a perspective that has been studied in supervised learning but remains underexplored in RL.<n>We show that limited variability in feature representations can cause RL updates to systematically increase model confidence.<n>Motivated by these insights, we propose classifier-first reinforcement learning (CF-RL), a simple two-stage training strategy.
arXiv Detail & Related papers (2026-01-08T07:32:15Z) - Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models [71.9060068259379]
We propose cascaded domain-wise reinforcement learning to build general-purpose reasoning models.<n>Our 14B model, after RL, outperforms its SFT teacher, DeepSeek-R1-0528, on LiveCodeBench v5/v6 Pro and silver-medal performance in the 2025 International Olympiad in Informatics (IOI)
arXiv Detail & Related papers (2025-12-15T18:02:35Z) - Coupled Variational Reinforcement Learning for Language Model General Reasoning [83.82392089177841]
We propose textitbCoupled bVari bReinforcement bLearning (CoVRL) to bridge variational inference and reinforcement learning.<n>CoVRL improves performance by 12.4% over the base model and achieves an additional 2.3% improvement over strong state-of-the-art verifier-free RL baselines.
arXiv Detail & Related papers (2025-12-14T07:03:51Z) - Tailored Primitive Initialization is the Secret Key to Reinforcement Learning [61.29280885291581]
Reinforcement learning (RL) has emerged as a powerful paradigm for enhancing the reasoning capabilities of large language models (LLMs)<n>We argue that initializing LLMs with diverse, high-quality reasoning primitives is essential for achieving stable and sample-efficient RL training.<n>We propose Tailor, a finetuning pipeline that automatically discovers and curates novel reasoning primitives.
arXiv Detail & Related papers (2025-11-16T03:12:40Z) - ConfClip: Confidence-Weighted and Clipped Reward for Reinforcement Learning in LLMs [32.13266235550995]
Reinforcement learning (RL) has become a standard paradigm for refining large language models (LLMs)<n>Inspired by observations from human learning, we introduce a RL technique that integrates verifiable outcomes with the model's own confidence estimates.
arXiv Detail & Related papers (2025-09-22T13:00:35Z) - Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning [93.00629872970364]
Reinforcement learning (RL) has become the dominant paradigm for improving the performance of language models on complex reasoning tasks.<n>We introduce SPARKLE, a fine-grained analytic framework to dissect the effects of RL across three key dimensions.<n>We study whether difficult problems -- those yielding no RL signals and mixed-quality reasoning traces -- can still be effectively used for training.
arXiv Detail & Related papers (2025-06-05T07:53:59Z) - Reasoning Models Hallucinate More: Factuality-Aware Reinforcement Learning for Large Reasoning Models [83.24079543652253]
Large language models (LLMs) have significantly advanced in reasoning tasks through reinforcement learning (RL) optimization.<n>However, reasoning-oriented RL fine-tuning significantly increases the prevalence of hallucinations.<n>We propose Factuality-aware Step-wise Policy Optimization (FSPO), an innovative RL fine-tuning algorithm incorporating explicit factuality verification.
arXiv Detail & Related papers (2025-05-30T14:23:32Z) - Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 [53.894789613838654]
We introduce SEED-Bench-R1, a benchmark designed to evaluate post-training methods for MLLMs in video understanding.<n>It includes intricate real-world videos and complex everyday planning tasks in the format of multiple-choice questions.<n>Using Qwen2-VL-Instruct-7B as a base model, we compare RL with supervised fine-tuning (SFT)<n>Our detailed analysis reveals that RL enhances visual perception but often produces less coherent reasoning chains.
arXiv Detail & Related papers (2025-03-31T17:55:23Z) - OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles [91.88062410741833]
We introduce OpenVLThinker, one of the first open-source large vision-language models (LVLMs) to exhibit sophisticated chain-of-thought reasoning.<n>We show that OpenVLThinker-7B consistently advances performance across six benchmarks demanding mathematical and general reasoning.
arXiv Detail & Related papers (2025-03-21T17:52:43Z) - Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning [26.835266813794316]
We first propose CLS-RL for MLLM image classification, using verifiable rewards for fine-tuning.<n>We then rethink and question whether explicit thinking in RFT is always necessary.<n>No-Thinking-RL explores RFT without thinking by introducing a simple equality accuracy reward.
arXiv Detail & Related papers (2025-03-20T14:37:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.