Scheduling Your LLM Reinforcement Learning with Reasoning Trees
- URL: http://arxiv.org/abs/2510.24832v1
- Date: Tue, 28 Oct 2025 17:52:07 GMT
- Title: Scheduling Your LLM Reinforcement Learning with Reasoning Trees
- Authors: Hong Wang, Zhezheng Hao, Jian Luo, Chenxing Wei, Yao Shu, Lei Liu, Qiang Lin, Hande Dong, Jiawei Chen,
- Abstract summary: We introduce Reasoning Score (r-score), which measures the query's learning difficulty based on the structure of its reasoning tree.<n>Based on the r-score, we propose the Reasoning Tree Schedule (Re-Schedule), a scheduling algorithm that constructs a curriculum progressing from structurally simple (high r-score) to complex (low r-score) queries.
- Score: 18.720191133993715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using Reinforcement Learning with Verifiable Rewards (RLVR) to optimize Large Language Models (LLMs) can be conceptualized as progressively editing a query's `Reasoning Tree'. This process involves exploring nodes (tokens) and dynamically modifying the model's policy at each node. When combined with data scheduling, this process yields further gains in data efficiency and accuracy. However, existing RLVR data scheduling methods typically rely on path-based metrics to rank queries, overlooking the reasoning tree structures of these queries. In this paper, we introduce a novel metric, namely Reasoning Score (r-score), which measures the query's learning difficulty based on the structure of its reasoning tree. Based on the r-score, we propose the Reasoning Tree Schedule (Re-Schedule), a scheduling algorithm that constructs a curriculum progressing from structurally simple (high r-score) to complex (low r-score) queries. Experiments on six math-reasoning benchmarks show that Re-Schedule significantly improves average accuracy, achieving gains of up to 3.2%. These strong results validate our approach and demonstrate that a structural understanding of the reasoning tree provides a more powerful and principled foundation for RLVR data scheduling.
Related papers
- TreePS-RAG: Tree-based Process Supervision for Reinforcement Learning in Agentic RAG [71.06073770344732]
Agentic retrieval-augmented generation (RAG) formulates question answering as a multi-step interaction between reasoning and information retrieval.<n>We present TreePS-RAG, an online, tree-based RL framework for agentic RAG that enables step-wise credit assignment while retaining outcome-only rewards.
arXiv Detail & Related papers (2026-01-11T14:07:30Z) - Is Implicit Knowledge Enough for LLMs? A RAG Approach for Tree-based Structures [0.5352699766206808]
Large Language Models (LLMs) are adept at generating responses based on information within their context.<n>Retrieval-Augmented Generation (RAG) retrieves relevant documents to augment the model's in-context learning.<n>We propose a novel method to linearize knowledge from tree-like structures by generating implicit, aggregated summaries at each hierarchical level.
arXiv Detail & Related papers (2025-10-12T20:52:43Z) - ReTreever: Tree-based Coarse-to-Fine Representations for Retrieval [64.44265315244579]
We propose a tree-based method for organizing and representing reference documents at various granular levels.<n>Our method, called ReTreever, jointly learns a routing function per internal node of a binary tree such that query and reference documents are assigned to similar tree branches.<n>Our evaluations show that ReTreever generally preserves full representation accuracy.
arXiv Detail & Related papers (2025-02-11T21:35:13Z) - SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models [88.29990536278167]
We introduce SPaR, a self-play framework integrating tree-search self-refinement to yield valid and comparable preference pairs.<n>Our experiments show that a LLaMA3-8B model, trained over three iterations guided by SPaR, surpasses GPT-4-Turbo on the IFEval benchmark without losing general capabilities.
arXiv Detail & Related papers (2024-12-16T09:47:43Z) - BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving [11.596474985695679]
We release the StructuredOR dataset, annotated with comprehensive labels that capture the complete mathematical modeling process.<n>We propose BPP-Search, an algorithm that integrates reinforcement learning into a tree-of-thought structure.
arXiv Detail & Related papers (2024-11-26T13:05:53Z) - Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose a novel framework that utilizes large language models (LLMs) to identify effective feature generation rules.
We use decision trees to convey this reasoning information, as they can be easily represented in natural language.
OCTree consistently enhances the performance of various prediction models across diverse benchmarks.
arXiv Detail & Related papers (2024-06-12T08:31:34Z) - RLET: A Reinforcement Learning Based Approach for Explainable QA with
Entailment Trees [47.745218107037786]
We propose RLET, a Reinforcement Learning based Entailment Tree generation framework.
RLET iteratively performs single step reasoning with sentence selection and deduction generation modules.
Experiments on three settings of the EntailmentBank dataset demonstrate the strength of using RL framework.
arXiv Detail & Related papers (2022-10-31T06:45:05Z) - Entailment Tree Explanations via Iterative Retrieval-Generation Reasoner [56.08919422452905]
We propose an architecture called Iterative Retrieval-Generation Reasoner (IRGR)
Our model is able to explain a given hypothesis by systematically generating a step-by-step explanation from textual premises.
We outperform existing benchmarks on premise retrieval and entailment tree generation, with around 300% gain in overall correctness.
arXiv Detail & Related papers (2022-05-18T21:52:11Z) - The RLR-Tree: A Reinforcement Learning Based R-Tree for Spatial Data [33.26284196513858]
Learned indices have been proposed to replace classic index structures like B-Tree with machine learning (ML) models.
We propose a fundamentally different way of using ML techniques to improve on the query performance of the classic R-Tree without the need of changing its structure or query processing algorithms.
arXiv Detail & Related papers (2021-03-08T04:29:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.