Fugu-MT Paper Translation (Summary): RADAR: Reasoning-Ability and Difficulty-Aware Routing for Reasoning LLMs

Paper Summary: RADAR: Reasoning-Ability and Difficulty-Aware Routing for Reasoning LLMs

arxiv url: http://arxiv.org/abs/2509.25426v2
Date: Wed, 01 Oct 2025 00:34:10 GMT
Status: Translation complete
Last updated in system: 2025-10-02 12:11:26.800587
Title: RADAR: Reasoning-Ability and Difficulty-Aware Routing for Reasoning LLMs
Authors: Nigel Fernandez, Branislav Kveton, Ryan A. Rossi, Andrew S. Lan, Zichao Wang
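Abstract summary: This paper proposes RADAR (Reasoning-Ability and Difficulty-Aware Routing), a lightweight, interpretable, and scalable routing framework. Inspired by psychometrics, RADAR learns an item response model from the responses of models with different reasoning budgets to different queries. Extensive experiments on 8 widely used reasoning benchmarks demonstrate RADAR's superior performance compared with state-of-the-art routing methods.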
Reference score (attention score computed in-house): 51.88834210085435
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Reasoning language models have demonstrated remarkable performance on many challenging tasks in math, science, and coding. Choosing the right reasoning model for practical deployment involves a performance and cost tradeoff at two key levels: model size and reasoning budget, where larger models and higher reasoning budget lead to better performance but with increased cost and latency. In this work, we tackle this tradeoff from the angle of model configuration routing for different queries, and present RADAR (Reasoning-Ability and Difficulty-Aware Routing), a lightweight, interpretable, and scalable routing framework. Inspired by psychometrics, RADAR learns an item response model from model responses with different budgets to different queries, with interpretable parameters including query difficulties and model-budget abilities. RADAR then routes queries with higher difficulty to model-budget pairs with higher ability, and vice versa. We conduct extensive experiments on 8 widely used challenging reasoning benchmarks, demonstrating the superior performance of RADAR compared to state-of-the-art model routing methods. RADAR also exhibits query generalization capabilities, showing strong performance on out-of-distribution queries in all benchmarks. RADAR is also scalable and can efficiently integrate additional models by dynamically selecting a small set of evaluation queries to estimate their abilities.
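The abstract describes RADAR fitting an item response model in which each query has a difficulty and each model-budget pair has an ability, then sending harder queries to more capable pairs. The Python sketch below illustrates that idea under stated assumptions: it uses a one-parameter logistic (Rasch) model, P(correct) = sigmoid(theta - b), fitted by full-batch gradient ascent with a small L2 penalty, plus a hypothetical routing rule that picks the cheapest pair whose predicted success probability reaches a threshold `tau`. The parameterization, the functions `fit_irt` and `route`, the pair names, the costs, and the toy response data are all illustrative assumptions, not the paper's implementation.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_irt(
    responses: Dict[Tuple[str, str], int],
    lr: float = 0.1,
    l2: float = 0.01,
    epochs: int = 5000,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Fit an assumed 1PL model P(correct) = sigmoid(theta[m] - b[q]).

    `responses` maps (model_budget_pair, query) -> 1 if correct else 0.
    The small L2 penalty keeps estimates finite and fixes the additive
    shift that the likelihood alone cannot identify.
    """
    models = sorted({m for m, _ in responses})
    queries = sorted({q for _, q in responses})
    theta = {m: 0.0 for m in models}
    b = {q: 0.0 for q in queries}
    for _ in range(epochs):
        # Full-batch gradient of the penalized log-likelihood.
        g_theta = {m: -l2 * theta[m] for m in models}
        g_b = {q: -l2 * b[q] for q in queries}
        for (m, q), y in responses.items():
            residual = y - sigmoid(theta[m] - b[q])
            g_theta[m] += residual  # correct answers raise ability
            g_b[q] -= residual      # ...and lower difficulty
        for m in models:
            theta[m] += lr * g_theta[m]
        for q in queries:
            b[q] += lr * g_b[q]
    return theta, b


def route(difficulty: float, pairs: List[Tuple[str, float]],
          theta: Dict[str, float], tau: float = 0.7) -> str:
    """Hypothetical rule: cheapest (name, cost) pair with predicted
    success >= tau; otherwise fall back to the highest-ability pair."""
    feasible = [(cost, name) for name, cost in pairs
                if sigmoid(theta[name] - difficulty) >= tau]
    if feasible:
        return min(feasible)[1]
    return max(pairs, key=lambda p: theta[p[0]])[0]


# Toy data with hypothetical pair and query names (not from the paper).
responses: Dict[Tuple[str, str], int] = {}
for q in ["e1", "e2", "e3", "e4"]:  # easy: both pairs answer correctly
    responses[("small-low", q)] = 1
    responses[("large-high", q)] = 1
for q in ["h1", "h2", "h3", "h4"]:  # hard: only the large pair succeeds
    responses[("small-low", q)] = 0
    responses[("large-high", q)] = 1

theta, b = fit_irt(responses)
pairs = [("small-low", 1.0), ("large-high", 10.0)]  # (name, relative cost)
print(route(b["e1"], pairs, theta), route(b["h1"], pairs, theta))
```

In this toy setup the easy query goes to the cheap pair and the hard query to the expensive one, because the fitted difficulties fall on opposite sides of the small pair's ability. Difficulties here are read from the fitted table; routing genuinely unseen queries, as in the out-of-distribution setting the abstract mentions, would also require predicting difficulty from the query itself, which this sketch does not attempt.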
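The abstract also states that additional models can be integrated by dynamically selecting a small set of evaluation queries to estimate their abilities. One standard way to do this, borrowed from computerized adaptive testing, is to keep query difficulties fixed, repeatedly ask the unused query with the highest Fisher information at the current ability estimate, and re-estimate ability after each answer. The sketch below assumes a 1PL model, a N(0, 1) prior with a Newton-method MAP update, and a toy deterministic oracle standing in for actually running the new model. The selection criterion RADAR itself uses may differ, and all names here (`adaptive_estimate`, `map_ability`, `item_information`, `toy_oracle`) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def map_ability(observed: List[Tuple[float, int]], prior_var: float = 1.0,
                iters: int = 50) -> float:
    """MAP estimate of theta under an assumed 1PL model and N(0, prior_var) prior."""
    theta = 0.0
    for _ in range(iters):
        grad = -theta / prior_var
        hess = -1.0 / prior_var
        for b, y in observed:
            p = sigmoid(theta - b)
            grad += y - p
            hess -= p * (1.0 - p)
        step = -grad / hess  # Newton step on the concave log-posterior
        theta += max(-1.0, min(1.0, step))  # clamp the step for stability
    return theta


def item_information(theta: float, b: float) -> float:
    """Fisher information of a 1PL item, p(1 - p); largest when b == theta."""
    p = sigmoid(theta - b)
    return p * (1.0 - p)


def adaptive_estimate(difficulties: Dict[str, float],
                      answer: Callable[[str], int],
                      budget: int) -> Tuple[float, List[str]]:
    """Ask up to `budget` queries, each chosen for maximum information."""
    theta = 0.0
    remaining = dict(difficulties)
    asked: List[str] = []
    observed: List[Tuple[float, int]] = []
    for _ in range(min(budget, len(remaining))):
        q = max(remaining, key=lambda k: item_information(theta, remaining[k]))
        b = remaining.pop(q)
        observed.append((b, answer(q)))
        asked.append(q)
        theta = map_ability(observed)
    return theta, asked


# Hypothetical calibrated difficulties, evenly spaced from -3.0 to 3.0.
difficulties = {f"q{i}": -3.0 + 0.5 * i for i in range(13)}


def toy_oracle(true_ability: float) -> Callable[[str], int]:
    """Stand-in for running a new model: correct iff difficulty < ability."""
    return lambda q: int(difficulties[q] < true_ability)


theta_strong, asked_strong = adaptive_estimate(difficulties, toy_oracle(1.0), budget=6)
theta_weak, _ = adaptive_estimate(difficulties, toy_oracle(-1.0), budget=6)
print(asked_strong, round(theta_strong, 2), round(theta_weak, 2))
```

With six queries out of thirteen, the simulated strong model ends with a positive ability estimate and the weak one with a negative estimate. Under a 1PL model the information criterion reduces to picking the remaining query whose difficulty is closest to the current estimate.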


Related Papers