RADAR: Reasoning-Ability and Difficulty-Aware Routing for Reasoning LLMs
- URL: http://arxiv.org/abs/2509.25426v2
- Date: Wed, 01 Oct 2025 00:34:10 GMT
- Title: RADAR: Reasoning-Ability and Difficulty-Aware Routing for Reasoning LLMs
- Authors: Nigel Fernandez, Branislav Kveton, Ryan A. Rossi, Andrew S. Lan, Zichao Wang
- Abstract summary: We present RADAR (Reasoning-Ability and Difficulty-Aware Routing), a lightweight, interpretable, and scalable routing framework. Inspired by psychometrics, RADAR learns an item response model from model responses with different budgets to different queries. We conduct extensive experiments on 8 widely used reasoning benchmarks, demonstrating the superior performance of RADAR compared to state-of-the-art routing methods.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reasoning language models have demonstrated remarkable performance on many challenging tasks in math, science, and coding. Choosing the right reasoning model for practical deployment involves a performance-cost tradeoff at two key levels: model size and reasoning budget, where larger models and higher reasoning budgets lead to better performance but at increased cost and latency. In this work, we tackle this tradeoff from the angle of model configuration routing for different queries, and present RADAR (Reasoning-Ability and Difficulty-Aware Routing), a lightweight, interpretable, and scalable routing framework. Inspired by psychometrics, RADAR learns an item response model from model responses with different budgets to different queries, with interpretable parameters including query difficulties and model-budget abilities. RADAR then routes queries with higher difficulty to model-budget pairs with higher ability, and vice versa. We conduct extensive experiments on 8 widely used challenging reasoning benchmarks, demonstrating the superior performance of RADAR compared to state-of-the-art model routing methods. RADAR also exhibits query generalization capabilities, showing strong performance on out-of-distribution queries in all benchmarks. RADAR is also scalable and can efficiently integrate additional models by dynamically selecting a small set of evaluation queries to estimate their abilities.
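A minimal sketch of the psychometrics-inspired idea, assuming a Rasch-style (1PL) item response model in which each model-budget pair has a scalar ability and each query a scalar difficulty. The routing rule, names, and numbers below are illustrative, not RADAR's actual implementation.

```python
import math

def p_solve(ability: float, difficulty: float) -> float:
    """Rasch (1PL) item response model: probability that a model-budget
    pair with the given ability answers a query of the given difficulty."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def route(query_difficulty, pairs, target=0.9):
    """Pick the cheapest (model, budget) pair whose predicted success
    probability meets the target; fall back to the most able pair.
    `pairs` is a list of hypothetical (name, ability, cost) tuples."""
    viable = [(name, ability, cost) for name, ability, cost in pairs
              if p_solve(ability, query_difficulty) >= target]
    if viable:
        return min(viable, key=lambda t: t[2])[0]   # cheapest viable pair
    return max(pairs, key=lambda t: t[1])[0]        # most able fallback

pairs = [("small/low", 0.5, 1.0), ("small/high", 1.2, 2.0),
         ("large/low", 1.8, 4.0), ("large/high", 2.6, 8.0)]
print(route(query_difficulty=2.0, pairs=pairs))  # harder query -> abler pair
```

Easy queries are served by the cheap pairs that already clear the target probability, which is where the cost savings come from.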
Related papers
- Trade-offs in Ensembling, Merging and Routing Among Parameter-Efficient Experts
Large language models (LLMs) fine-tuned with lightweight adapters achieve strong performance across diverse tasks. Fusing independently trained models with different strengths has shown promise for multi-task learning through three main strategies. We empirically evaluate their trade-offs, addressing two key questions: what are the advantages of going beyond uniform ensembling or merging, and does the flexibility of routing justify its complexity?
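A toy contrast of the three fusion strategies on linear "experts", to make the cost profiles concrete; the gating score is a made-up placeholder, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
experts = [rng.normal(size=(4, 4)) for _ in range(3)]  # toy adapter weights
x = rng.normal(size=4)

# Ensembling: run every expert, average the outputs (highest inference cost).
ensemble_out = np.mean([W @ x for W in experts], axis=0)

# Merging: average the weights once, run a single fused model (cheapest).
merged_out = np.mean(experts, axis=0) @ x

# Routing: pick one expert per input, here via a toy quadratic gating score.
scores = [float(x @ W @ x) for W in experts]
routed_out = experts[int(np.argmax(scores))] @ x
```

For linear experts, ensembling and merging coincide exactly; the trade-offs the paper studies arise because real adapters are nonlinear, so the three strategies genuinely diverge.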
arXiv Detail & Related papers (2026-03-03T21:44:11Z)
- Models Under SCOPE: Scalable and Controllable Routing via Pre-hoc Reasoning
We propose SCOPE, a routing framework that goes beyond model selection by predicting each candidate model's cost and performance. SCOPE makes reasoning-based predictions by retrieving how models behave on similar problems, rather than relying on fixed model names. It can boost accuracy by up to 25.7% when performance is the priority, or cut costs by up to 95.1% when efficiency matters most.
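A hedged sketch of the retrieval idea: predict a model's behavior on a new query by averaging its recorded accuracy and cost on the most similar past problems. All array names and the selection rule are assumptions for illustration, not SCOPE's actual components.

```python
import numpy as np

def predict_by_retrieval(query_emb, bank_embs, bank_acc, bank_cost, k=5):
    """Average each model's recorded behavior on the k past problems most
    similar (by cosine) to the query. bank_acc / bank_cost are
    (num_problems, num_models) arrays of historical records."""
    sims = bank_embs @ query_emb / (
        np.linalg.norm(bank_embs, axis=1) * np.linalg.norm(query_emb) + 1e-9)
    nn = np.argsort(-sims)[:k]            # indices of the nearest problems
    return bank_acc[nn].mean(axis=0), bank_cost[nn].mean(axis=0)

def choose(acc, cost, prefer="performance", min_acc=0.7):
    """Toy controllable selection: chase accuracy, or go cheap under a floor."""
    if prefer == "performance":
        return int(np.argmax(acc))
    ok = np.where(acc >= min_acc)[0]
    return int(ok[np.argmin(cost[ok])]) if ok.size else int(np.argmax(acc))
```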
arXiv Detail & Related papers (2026-01-29T21:09:36Z)
- RouteMoA: Dynamic Routing without Pre-Inference Boosts Efficient Mixture-of-Agents
RouteMoA is an efficient mixture-of-agents framework with dynamic routing. It employs a lightweight scorer to perform initial screening by predicting coarse-grained performance from the query. It refines these scores through lightweight self- and cross-assessment based on existing model outputs, providing posterior correction without additional inference.
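A rough two-stage sketch of that screen-then-correct pattern; `prior_scorer` and `assess` are hypothetical callables standing in for RouteMoA's scorer and its self-/cross-assessment, and the blending weight is invented.

```python
def route_moa(query, models, prior_scorer, assess, alpha=0.5, k=2):
    """Stage 1: predict coarse performance from the query alone (no
    inference) and shortlist the top-k models. Stage 2: blend those priors
    with self-/cross-assessment scores computed from outputs the
    shortlisted models have already produced (posterior correction)."""
    prior = {m: prior_scorer(query, m) for m in models}
    shortlist = sorted(models, key=lambda m: -prior[m])[:k]
    posterior = {m: alpha * prior[m] + (1 - alpha) * assess(query, m)
                 for m in shortlist}
    return max(posterior, key=posterior.get)
```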
arXiv Detail & Related papers (2026-01-26T04:22:22Z)
- ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning
ARM-Thinker is an Agentic multimodal Reward Model that autonomously invokes external tools to ground judgments in verifiable evidence. We train ARM-Thinker with multi-stage reinforcement learning, jointly optimizing tool-calling decisions and judgment accuracy. Our results demonstrate that agentic capabilities significantly enhance both the accuracy and interpretability of reward models.
arXiv Detail & Related papers (2025-12-04T18:59:52Z)
- Confidence-Guided Stepwise Model Routing for Cost-Efficient Reasoning
We propose Confidence-Guided Stepwise Model Routing for Cost-Efficient Reasoning (STEER). STEER is a domain-agnostic framework that performs fine-grained, step-level routing between smaller and larger language models. Our results establish model-internal confidence as a robust, domain-agnostic signal for model routing.
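A minimal sketch of step-level confidence routing, assuming confidence is read off as the mean token log-probability; `small_llm` and `large_llm` are hypothetical callables, and the threshold is invented rather than STEER's calibrated value.

```python
def solve_stepwise(problem, small_llm, large_llm,
                   threshold=-0.5, max_steps=20):
    """Draft each reasoning step with the small model; escalate only that
    step to the large model when the small model's own confidence (mean
    token log-prob) drops below the threshold. Each callable returns
    (step_text, mean_token_logprob, done_flag)."""
    steps = []
    for _ in range(max_steps):
        step, conf, done = small_llm(problem, steps)
        if conf < threshold:                      # low confidence: escalate
            step, conf, done = large_llm(problem, steps)
        steps.append(step)
        if done:
            break
    return steps
```

The point of routing per step rather than per query is that only the genuinely hard steps pay the large-model price.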
arXiv Detail & Related papers (2025-11-09T02:33:08Z)
- xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning
We present xRouter, a tool-calling-based routing system in which a learned router can either answer directly or invoke one or more external models. Our implementation encompasses the full reinforcement learning framework, including reward and cost accounting. Across diverse benchmarks, xRouter achieves strong cost-performance trade-offs.
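A toy version of the cost-accounting reward such a router could be trained against; the prices, penalty weight, and model names are all illustrative, not xRouter's actual accounting.

```python
def router_reward(correct: bool, tokens_used: dict, price_per_1k: dict,
                  lam: float = 0.1) -> float:
    """Cost-aware RL reward sketch: pay for every model the router chose
    to invoke, and earn the accuracy term only if the final answer is
    right. A router that answers directly incurs no invocation cost."""
    cost = sum(tokens_used[m] / 1000.0 * price_per_1k[m] for m in tokens_used)
    return (1.0 if correct else 0.0) - lam * cost

# One invoked model, 800 tokens at a made-up $0.06 per 1k tokens:
print(router_reward(True, {"large-model": 800}, {"large-model": 0.06}))
```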
arXiv Detail & Related papers (2025-10-09T16:52:01Z)
- SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation
We propose a Synergistic Diffusion-Autoregression paradigm that unifies the training efficiency of autoregressive models with the parallel inference capability of diffusion. SDAR performs a lightweight paradigm conversion that transforms a well-trained autoregressive (AR) model into a blockwise diffusion model through brief, data-efficient adaptation. Building on this insight, SDAR achieves efficient AR-to-diffusion conversion with minimal cost, preserving AR-level performance while enabling parallel generation.
arXiv Detail & Related papers (2025-10-07T17:29:28Z)
- Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding
Reinforcement learning with verifiable rewards (RLVR) has achieved remarkable success in enhancing the reasoning capabilities of large language models (LLMs). Existing RLVR methods often suffer from exploration inefficiency due to mismatches between the training data's difficulty and the model's capability. We propose SEELE, a novel supervision-aided RLVR framework that dynamically adjusts problem difficulty to stay within the high-efficiency region.
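A toy controller in the spirit of capability-adaptive hint scaffolding: reveal more of a reference solution when rollouts fail too often, less when they succeed too easily. The target rate and step size are invented, not SEELE's schedule.

```python
def adjust_hint_ratio(hint_ratio: float, success_rate: float,
                      target: float = 0.5, step: float = 0.1) -> float:
    """Keep effective difficulty near the high-efficiency region: too few
    successes means the problem is too hard, so expose more hint; too
    many means it is too easy, so expose less."""
    if success_rate < target:
        hint_ratio = min(1.0, hint_ratio + step)   # make the problem easier
    elif success_rate > target:
        hint_ratio = max(0.0, hint_ratio - step)   # make the problem harder
    return hint_ratio

print(adjust_hint_ratio(0.3, success_rate=0.1))  # failing often -> 0.4
```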
arXiv Detail & Related papers (2025-09-08T17:36:21Z)
- Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following
Reasoning models excel at complex problem solving but exhibit a concerning trade-off between reasoning capabilities and instruction-following abilities. We propose a self-supervised RL framework that leverages reasoning models' own internal signals to improve instruction-following capabilities without external supervision.
arXiv Detail & Related papers (2025-08-04T07:48:59Z)
- Route to Reason: Adaptive Routing for LLM and Reasoning Strategy Selection
Route-To-Reason (RTR) is a novel unified routing framework that dynamically allocates both LMs and reasoning strategies according to task difficulty under budget constraints. RTR learns compressed representations of both expert models and reasoning strategies, enabling their joint and adaptive selection at inference time.
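A sketch of the joint selection step, assuming a learned predictor has already filled accuracy and cost tables over (model, strategy) pairs; the arrays and fallback rule are illustrative, not RTR's learned representations.

```python
import numpy as np

def route_to_reason(pred_acc, pred_cost, budget):
    """Pick the (model, strategy) pair with the highest predicted accuracy
    among pairs whose predicted cost fits the budget; if nothing fits,
    fall back to the cheapest pair. Both inputs are
    (num_models, num_strategies) arrays from a hypothetical predictor."""
    feasible = pred_cost <= budget
    if not feasible.any():
        idx = np.argmin(pred_cost)                # nothing fits: go cheapest
    else:
        masked = np.where(feasible, pred_acc, -np.inf)
        idx = np.argmax(masked)
    return np.unravel_index(idx, pred_acc.shape)  # (model_idx, strategy_idx)
```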
arXiv Detail & Related papers (2025-05-26T02:53:17Z)
- Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy
The Performance Law for SR models aims to theoretically investigate and model the relationship between model performance and data quality. We propose Approximate Entropy (ApEn) to assess data quality, presenting a more nuanced approach compared to traditional data quantity metrics.
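Approximate Entropy itself is a standard, well-defined statistic, so it can be shown directly; only the framing comment is an assumption, since the paper applies ApEn to recommendation interaction sequences rather than the demo series below.

```python
import numpy as np

def approximate_entropy(u, m=2, r=None):
    """Approximate Entropy (ApEn): lower values indicate a more regular,
    predictable series. ApEn(m, r) = Phi(m) - Phi(m + 1), where Phi(m) is
    the mean log fraction of length-m windows within tolerance r of each
    window (Chebyshev distance, self-matches included)."""
    u = np.asarray(u, dtype=float)
    if r is None:
        r = 0.2 * np.std(u)                        # common tolerance choice
    def phi(m):
        n = len(u) - m + 1
        x = np.array([u[i:i + m] for i in range(n)])        # embed windows
        d = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=2)
        c = (d <= r).sum(axis=1) / n
        return np.mean(np.log(c))
    return phi(m) - phi(m + 1)

print(approximate_entropy(np.sin(0.5 * np.arange(200))))  # regular -> low
```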
arXiv Detail & Related papers (2024-11-30T10:56:30Z)
- DoRA: Enhancing Parameter-Efficient Fine-Tuning with Dynamic Rank Distribution
Low-Rank Adaptation (LoRA) relies on a bypass framework that ignores the differential parameter budget requirements across weight matrices.
DoRA decomposes high-rank LoRA layers into structured single-rank components, allowing for dynamic pruning of the parameter budget.
Experimental results demonstrate that DoRA can achieve competitive performance compared with LoRA and full model fine-tuning.
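A toy illustration of the decomposition idea: a rank-r update viewed as a gated sum of rank-1 components that can be pruned individually. The gates and the magnitude-based pruning criterion are assumptions for illustration, not DoRA's exact mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)
d, rank = 16, 8
A = rng.normal(size=(d, rank))     # toy LoRA down-projection factors
B = rng.normal(size=(rank, d))     # toy LoRA up-projection factors
g = rng.normal(size=rank)          # one gate per single-rank component

# View the rank-r update as a gated sum of rank-1 components ...
delta_w = sum(g[k] * np.outer(A[:, k], B[k, :]) for k in range(rank))

# ... so the parameter budget can be reallocated dynamically: keep only
# the components whose gates matter most, dropping the rest.
keep = np.argsort(-np.abs(g))[: rank // 2]
pruned_w = sum(g[k] * np.outer(A[:, k], B[k, :]) for k in keep)
```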
arXiv Detail & Related papers (2024-05-27T17:02:27Z)
- Deep Reinforcement Learning from Hierarchical Preference Design
This paper shows that, by exploiting certain structures, one can ease the reward design process. We propose HERON, a hierarchical reward modeling framework for two scenarios: (I) the feedback signals naturally present a hierarchy; (II) the reward is sparse, but less important surrogate feedback is available to help policy learning.
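A rough sketch of how a hierarchy of feedback signals could induce preferences: compare trajectories on the most important signal first and fall through to weaker signals only on near-ties. The tie tolerance and interface are invented, not HERON's actual construction.

```python
def hierarchical_preference(traj_a, traj_b, signals, tol=1e-6):
    """Lexicographic preference over an ordered list of scoring functions,
    most important first. Returns 1 if traj_a is preferred, -1 if traj_b
    is, and 0 on a tie at every level of the hierarchy."""
    for score in signals:
        sa, sb = score(traj_a), score(traj_b)
        if abs(sa - sb) > tol:
            return 1 if sa > sb else -1   # preference decided at this level
    return 0
```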
arXiv Detail & Related papers (2023-09-06T00:44:29Z)