Route-and-Reason: Scaling Large Language Model Reasoning with Reinforced Model Router
- URL: http://arxiv.org/abs/2506.05901v1
- Date: Fri, 06 Jun 2025 09:18:56 GMT
- Title: Route-and-Reason: Scaling Large Language Model Reasoning with Reinforced Model Router
- Authors: Chenyang Shao, Xinyang Liu, Yutang Lin, Fengli Xu, Yong Li
- Abstract summary: Multi-step reasoning has proven essential for enhancing the problem-solving capabilities of Large Language Models. Yet, many reasoning steps are relatively simple and can be handled by more efficient smaller-scale language models. We propose R2-Reasoner, a novel framework that enables collaborative reasoning across heterogeneous LLMs.
- Score: 9.580226379350737
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-step reasoning has proven essential for enhancing the problem-solving capabilities of Large Language Models (LLMs) by decomposing complex tasks into intermediate steps, either explicitly or implicitly. Extending the reasoning chain at test time through deeper thought processes or broader exploration can further improve performance, but often incurs substantial costs due to the explosion in token usage. Yet, many reasoning steps are relatively simple and can be handled by more efficient smaller-scale language models (SLMs). This motivates hybrid approaches that allocate subtasks across models of varying capacities. However, realizing such collaboration requires accurate task decomposition and difficulty-aware subtask allocation, which is challenging. To address this, we propose R2-Reasoner, a novel framework that enables collaborative reasoning across heterogeneous LLMs by dynamically routing sub-tasks based on estimated complexity. At the core of our framework is a Reinforced Model Router, composed of a task decomposer and a subtask allocator. The task decomposer segments complex input queries into logically ordered subtasks, while the subtask allocator assigns each subtask to the most appropriate model, ranging from lightweight SLMs to powerful LLMs, balancing accuracy and efficiency. To train this router, we introduce a staged pipeline that combines supervised fine-tuning on task-specific datasets with the Group Relative Policy Optimization (GRPO) algorithm, enabling self-supervised refinement through iterative reinforcement learning. Extensive experiments across four challenging benchmarks demonstrate that R2-Reasoner reduces API costs by 86.85% while maintaining or surpassing baseline accuracy. Our framework paves the way for more cost-effective and adaptive LLM reasoning. The code is open-source at https://anonymous.4open.science/r/R2_Reasoner.
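To make the routing idea concrete, here is a minimal Python sketch, not the authors' released code, of the two-stage Reinforced Model Router: a decomposer segments the query into ordered subtasks, and an allocator routes each subtask to the cheapest model expected to handle it. The model names, difficulty heuristic, thresholds, and `call_model` stub are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Subtask:
    text: str
    difficulty: float  # estimated complexity in [0, 1]

def decompose(query: str) -> list[Subtask]:
    # Stand-in for the learned task decomposer: split on sentence
    # boundaries and use a crude length-based difficulty proxy. In the
    # paper this role is played by a fine-tuned model, not a heuristic.
    steps = [s.strip() for s in query.split(".") if s.strip()]
    return [Subtask(s, min(len(s) / 200.0, 1.0)) for s in steps]

# Hypothetical model pool, ordered from cheap SLM to expensive LLM,
# each paired with the highest difficulty it is trusted to handle.
MODEL_POOL = [("slm-1b", 0.3), ("llm-7b", 0.7), ("llm-70b", 1.0)]

def allocate(sub: Subtask) -> str:
    # Stand-in for the learned subtask allocator: route each subtask to
    # the cheapest model whose difficulty ceiling covers it.
    for model, ceiling in MODEL_POOL:
        if sub.difficulty <= ceiling:
            return model
    return MODEL_POOL[-1][0]

def call_model(model: str, prompt: str) -> str:
    # Placeholder for a real API call to the chosen model.
    return f"[{model} answer to: {prompt[-40:]}]"

def r2_reason(query: str) -> str:
    # Thread intermediate answers through the ordered subtask chain.
    context, answer = query, ""
    for sub in decompose(query):
        model = allocate(sub)
        answer = call_model(model, f"{context}\nSubtask: {sub.text}")
        context += f"\n{sub.text} -> {answer}"
    return answer

print(r2_reason("Compute the total cost of 3 items at $4 each. Compare it with a $10 budget."))
```

The staged training recipe pairs supervised fine-tuning with GRPO. The core of GRPO is a group-relative advantage: each sampled rollout's reward is normalized against the mean and standard deviation of its own group, removing the need for a learned value function. A minimal sketch of that computation, assuming scalar per-rollout rewards:

```python
import statistics

def grpo_advantages(rewards: list[float]) -> list[float]:
    # Group-relative advantage: score each sampled rollout against the
    # mean and standard deviation of its own group of rollouts.
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against zero spread
    return [(r - mu) / sigma for r in rewards]

print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # [1.0, -1.0, -1.0, 1.0]
```

In the paper's pipeline, advantages like these would weight policy-gradient updates to the decomposer and allocator; they are shown here in isolation.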
Related papers
- RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory [57.449129198822476]
RCR-Router is a role-aware context routing framework for multi-agent large language model (LLM) systems. It dynamically selects semantically relevant memory subsets for each agent based on its role and task stage. A lightweight scoring policy guides memory selection, and agent outputs are integrated into a shared memory store.
arXiv Detail & Related papers (2025-08-06T21:59:34Z) - RemoteReasoner: Towards Unifying Geospatial Reasoning Workflow [19.502882116487005]
Remote sensing imagery presents vast, inherently unstructured spatial data. We propose RemoteReasoner, a flexible and robust workflow for remote sensing reasoning tasks. Preliminary experiments demonstrated that RemoteReasoner achieves remarkable performance across multi-granularity reasoning tasks.
arXiv Detail & Related papers (2025-07-25T13:58:11Z) - EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language Models [65.48902212293903]
We present the Extremely Complex Instruction Following Benchmark (EIFBENCH) for evaluating large language models (LLMs). EIFBENCH includes multi-task scenarios that enable comprehensive assessment across diverse task types concurrently. We also propose the Segment Policy Optimization (SegPO) algorithm to enhance the LLM's ability to accurately fulfill multi-task workflows.
arXiv Detail & Related papers (2025-06-10T02:39:55Z) - Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning [52.32193550674408]
We aim to improve the reasoning capabilities of language models via reinforcement learning (RL). We propose to schedule tasks from easy to hard (E2H), allowing LLMs to build reasoning skills gradually. E2H Reasoner significantly improves the reasoning ability of small LLMs (1.5B to 3B).
arXiv Detail & Related papers (2025-06-07T02:41:54Z) - Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques [14.892995952768352]
Language Models (LMs) have excelled at tasks like text generation, summarization, and question answering. Their inference remains computationally expensive and energy intensive in settings with limited hardware, power, or bandwidth. Recent approaches have introduced multi-LLM intelligent model selection strategies that dynamically allocate computational resources based on query complexity.
arXiv Detail & Related papers (2025-06-06T23:13:08Z) - LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models [21.888139819188105]
LLaVA-CMoE is a continual learning framework for large vision-language models. A Probe-Guided Knowledge Extension mechanism determines when and where new experts should be added, and a Probabilistic Task Locator assigns each task a dedicated, lightweight router.
arXiv Detail & Related papers (2025-03-27T07:36:11Z) - Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging [17.038807261969033]
Long-to-Short (L2S) reasoning aims to balance reasoning depth with practical efficiency. Model merging offers a cost-effective and robust alternative by integrating the quick-thinking capabilities of System 1 models with the methodical reasoning of System 2 models. Our experiments reveal that model merging can reduce average response length by up to 55% while preserving or even improving baseline performance.
arXiv Detail & Related papers (2025-03-26T15:34:37Z) - Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks. However, they still struggle with problems requiring multi-step decision-making and environmental feedback. We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z) - Division-of-Thoughts: Harnessing Hybrid Language Model Synergy for Efficient On-Device Agents [5.566936703366701]
Division-of-Thoughts (DoT) is a collaborative reasoning framework leveraging the synergy between local and cloud-based language models. DoT reduces average reasoning time and API costs by 66.12% and 83.57%, respectively, while achieving reasoning accuracy comparable to the best baseline methods.
arXiv Detail & Related papers (2025-02-06T02:40:25Z) - Supervised Chain of Thought [5.389461633686935]
Chain of Thought (CoT) prompting offers a promising approach to solving complex reasoning tasks.
The one-prompt-for-all approach poses significant challenges for models to generate the correct reasoning steps.
We show how task-specific supervision is essential for navigating the prompt space accurately and achieving optimal performance.
arXiv Detail & Related papers (2024-10-18T06:25:27Z) - Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models [16.16372459671255]
Large Language Models (LLMs) typically generate outputs token by token using a fixed compute budget.
We propose a novel framework that integrates smaller auxiliary modules within each Feed-Forward Network layer of the LLM.
We show that trained routers operate differently from oracles and often yield suboptimal solutions.
arXiv Detail & Related papers (2024-10-01T16:10:21Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning [94.76546523689113]
We introduce CodePlan, a framework that generates and follows code-form plans: pseudocode that outlines high-level, structured reasoning processes.
CodePlan effectively captures the rich semantics and control flows inherent to sophisticated reasoning tasks.
It achieves a 25.1% relative improvement compared with directly generating responses.
arXiv Detail & Related papers (2024-09-19T04:13:58Z) - Decomposed Prompting: A Modular Approach for Solving Complex Tasks [55.42850359286304]
We propose Decomposed Prompting to solve complex tasks by decomposing them (via prompting) into simpler sub-tasks.
This modular structure allows each prompt to be optimized for its specific sub-task.
We show that the flexibility and modularity of Decomposed Prompting allows it to outperform prior work on few-shot prompting.
arXiv Detail & Related papers (2022-10-05T17:28:20Z) - Controllable Dynamic Multi-Task Architectures [92.74372912009127]
We propose a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints.
We propose a disentangled training of two hypernetworks, by exploiting task affinity and a novel branching regularized loss, to take input preferences and accordingly predict tree-structured models with adapted weights.
arXiv Detail & Related papers (2022-03-28T17:56:40Z)