Decomposing Elements of Problem Solving: What "Math" Does RL Teach?
- URL: http://arxiv.org/abs/2505.22756v1
- Date: Wed, 28 May 2025 18:18:49 GMT
- Title: Decomposing Elements of Problem Solving: What "Math" Does RL Teach?
- Authors: Tian Qin, Core Francisco Park, Mujin Kwun, Aaron Walsman, Eran Malach, Nikhil Anand, Hidenori Tanaka, David Alvarez-Melis
- Abstract summary: We decompose problem solving into fundamental capabilities: Plan, Execute, and Verify. We show that RL-trained models struggle with fundamentally new problems, hitting a 'coverage wall' due to insufficient planning skills. Our findings provide insights into the role of RL in enhancing LLM reasoning, expose key limitations, and suggest a path toward overcoming these barriers.
- Score: 22.517954679764244
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Mathematical reasoning tasks have become prominent benchmarks for assessing the reasoning capabilities of LLMs, especially with reinforcement learning (RL) methods such as GRPO showing significant performance gains. However, accuracy metrics alone do not support fine-grained assessment of capabilities and fail to reveal which problem-solving skills have been internalized. To better understand these capabilities, we propose to decompose problem solving into fundamental capabilities: Plan (mapping questions to sequences of steps), Execute (correctly performing solution steps), and Verify (identifying the correctness of a solution). Empirically, we find that GRPO mainly enhances the execution skill, improving execution robustness on problems the model already knows how to solve, a phenomenon we call temperature distillation. More importantly, we show that RL-trained models struggle with fundamentally new problems, hitting a 'coverage wall' due to insufficient planning skills. To explore RL's impact more deeply, we construct a minimal, synthetic solution-tree navigation task as an analogy for mathematical problem-solving. This controlled setup replicates our empirical findings, confirming that RL primarily boosts execution robustness. Importantly, in this setting, we identify conditions under which RL can potentially overcome the coverage wall through improved exploration and generalization to new solution paths. Our findings provide insights into the role of RL in enhancing LLM reasoning, expose key limitations, and suggest a path toward overcoming these barriers. Code is available at https://github.com/cfpark00/RL-Wall.
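For readers unfamiliar with GRPO, the RL method referenced in the abstract, the following minimal sketch shows its core idea of group-relative advantage normalization: several responses are sampled per prompt and each response's reward is compared against its own group. This is an illustrative sketch only, not the authors' training code; the function name, binary reward setup, and example values are assumptions for this example.

```python
# Minimal sketch of GRPO-style group-relative advantages (illustrative only,
# not the paper's implementation). For each prompt, a group of responses is
# sampled; each response's reward is normalized by the group mean and
# standard deviation, so completions that beat their siblings get positive
# advantage and the rest get negative advantage.
from typing import List


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Compute group-relative advantages for one prompt's sampled group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


# Example: 4 rollouts for the same math problem, reward 1.0 if the final
# answer is correct and 0.0 otherwise.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # correct rollouts get positive advantage
```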
Related papers
- Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning [82.43575191712726]
We introduce a fine-grained analytic framework to dissect the impact of reinforcement learning on reasoning. Our framework specifically investigates key elements that have been hypothesized to benefit from RL training.
arXiv Detail & Related papers (2025-06-05T07:53:59Z) - Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem [53.3188041952701]
We show that Critique Fine-Tuning (CFT) on only one problem can effectively unleash the reasoning potential of LLMs. With just 5 GPU hours of training, Qwen-Math-7B-CFT shows an average improvement of 15% on six math benchmarks and 16% on three logic reasoning benchmarks. The results are comparable to, or even surpass, those from RL with 20x less compute.
arXiv Detail & Related papers (2025-06-03T18:35:52Z) - Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs [12.087316618902433]
Reasoning large language models (LLMs) excel at complex tasks. Existing approaches allocate an equal number of rollouts to all questions during reinforcement learning (RL). We propose a mechanism for dynamically allocating rollout budgets based on the difficulty of the problems (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2025-05-24T07:28:29Z) - AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning [50.02117478165099]
We show that large-scale reinforcement learning can significantly enhance the reasoning capabilities of strong, small- and mid-sized models. We propose a simple yet effective approach: first training on math-only prompts, then on code-only prompts.
arXiv Detail & Related papers (2025-05-22T08:50:47Z) - Improving RL Exploration for LLM Reasoning through Retrospective Replay [45.00643118030677]
We propose a novel algorithm named Retrospective Replay-based Reinforcement Learning (RRL), which introduces a dynamic replay mechanism throughout the training process. RRL enables the model to revisit promising states identified in the early stages, thereby improving its efficiency and effectiveness in exploration.
arXiv Detail & Related papers (2025-04-19T17:40:04Z) - R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
R1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models. Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start. Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z) - Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint [104.53687944498155]
Reinforcement learning (RL) has been widely used in training large language models (LLMs).
We propose a new RL method named RLMEC that incorporates a generative model as the reward model.
Based on the generative reward model, we design a token-level RL objective for training and an imitation-based regularization for stabilizing the RL process.
arXiv Detail & Related papers (2024-01-11T17:58:41Z) - A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
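As referenced in the rollout-allocation summary above, the sketch below illustrates difficulty-aware rollout budgeting. It is an assumption-laden illustration rather than the cited paper's algorithm: the function `allocate_rollouts` and the use of recent pass rate as a difficulty proxy are made up for this example.

```python
# Minimal sketch (own assumption, not the cited paper's method) of splitting
# an RL rollout budget so that harder questions receive more samples.
# Difficulty is proxied here by the model's recent pass rate on each question.
from typing import Dict


def allocate_rollouts(pass_rates: Dict[str, float],
                      total_budget: int,
                      min_per_question: int = 1) -> Dict[str, int]:
    """Give low-pass-rate (harder) questions a larger share of the rollout budget."""
    # Weight each question by how often the model currently fails it.
    weights = {q: 1.0 - p for q, p in pass_rates.items()}
    total_w = sum(weights.values()) or 1.0
    return {q: max(min_per_question, round(total_budget * w / total_w))
            for q, w in weights.items()}


# Example: three questions with pass rates 0.9 (easy), 0.5, and 0.1 (hard).
print(allocate_rollouts({"q1": 0.9, "q2": 0.5, "q3": 0.1}, total_budget=30))
```

The design choice here is simply that exploration effort is concentrated where the policy still fails, which is the high-level idea the summary describes; any real system would also need to re-estimate pass rates as training progresses.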