Fugu-MT Paper Translation (Abstract): Think When Needed: Model-Aware Reasoning Routing for LLM-based Ranking

Paper abstract: Think When Needed: Model-Aware Reasoning Routing for LLM-based Ranking

arxiv url: http://arxiv.org/abs/2601.18146v1
Date: Mon, 26 Jan 2026 05:09:07 GMT
Status: Translation complete
System update date: 2026-01-27 15:23:08.683033
Title: Think When Needed: Model-Aware Reasoning Routing for LLM-based Ranking
Title (reference translation): Think When Needed: Model-Aware Reasoning Routing for LLM-Based Ranking
Authors: Huizhong Guo, Tianjun Wei, Dongxia Wang, Yingpeng Du, Ziyan Wang, Jie Zhang, Zhu Sun
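Abstract summary: Reasoning prompting can improve ranking utility, but its benefits are inconsistent and come at a substantial computational cost. This paper proposes a reasoning routing framework that uses a lightweight, plug-and-play router head to decide whether to use direct inference (Non-Think) or reasoning (Think).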
Reference score (independently computed attention score): 25.69863022367215
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Large language models (LLMs) are increasingly applied to ranking tasks in retrieval and recommendation. Although reasoning prompting can enhance ranking utility, our preliminary exploration reveals that its benefits are inconsistent and come at a substantial computational cost, suggesting that when to reason is as crucial as how to reason. To address this issue, we propose a reasoning routing framework that employs a lightweight, plug-and-play router head to decide whether to use direct inference (Non-Think) or reasoning (Think) for each instance before generation. The router head relies solely on pre-generation signals: i) compact ranking-aware features (e.g., candidate dispersion) and ii) model-aware difficulty signals derived from a diagnostic checklist reflecting the model's estimated need for reasoning. By leveraging these features before generation, the router outputs a controllable token that determines whether to apply the Think mode. Furthermore, the router can adaptively select its operating policy along the validation Pareto frontier during deployment, enabling dynamic allocation of computational resources toward instances most likely to benefit from Think under varying system constraints. Experiments on three public ranking datasets with different scales of open-source LLMs show consistent improvements in ranking utility with reduced token consumption (e.g., +6.3% NDCG@10 with -49.5% tokens on MovieLens with Qwen3-4B), demonstrating reasoning routing as a practical solution to the accuracy-efficiency trade-off.
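Abstract (reference translation): Large language models (LLMs) are increasingly applied to ranking tasks in retrieval and recommendation. Reasoning prompting can improve ranking utility, but a preliminary investigation found that its benefits are inconsistent and carry a substantial computational cost, which suggests that when to reason matters as much as how to reason. This work therefore proposes a reasoning routing framework in which a lightweight, plug-and-play router head decides, for each instance and before generation, whether to use direct inference (Non-Think) or reasoning (Think). The router head relies only on pre-generation signals: (i) compact ranking-aware features (e.g., candidate dispersion) and (ii) model-aware difficulty signals derived from a diagnostic checklist that reflects the model's estimated need for reasoning. Using these features before generation, the router outputs a controllable token that determines whether Think mode is applied. In addition, the router can adaptively select its operating policy along the validation Pareto frontier during deployment, so that computational resources are dynamically allocated to the instances most likely to benefit from Think under varying system constraints. Experiments on three public ranking datasets with open-source LLMs of different scales show consistent improvements in ranking utility with reduced token consumption (e.g., +6.3% NDCG@10 with -49.5% tokens on MovieLens with Qwen3-4B), demonstrating that reasoning routing is a practical solution to the accuracy-efficiency trade-off.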
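
The routing idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the logistic form of the router, the two input features, the weights, the `OperatingPoint` type, and all validation numbers below are hypothetical placeholders chosen only to show the mechanics. The sketch scores an instance before generation, keeps the validation operating points that are not dominated in (fewer tokens, higher NDCG@10), and picks the best point that fits a token budget.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    threshold: float   # route to Think when the router score >= threshold
    avg_tokens: float  # mean tokens per instance on validation (made-up values)
    ndcg_at_10: float  # mean NDCG@10 on validation (made-up values)


def router_score(features, weights, bias):
    """Logistic score in (0, 1); a stand-in for the router head (assumption)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def route(features, weights, bias, threshold):
    """Choose the control token from pre-generation features only."""
    score = router_score(features, weights, bias)
    return "Think" if score >= threshold else "Non-Think"


def pareto_frontier(points):
    """Keep points not dominated in (lower tokens, higher NDCG@10)."""
    frontier = []
    best = float("-inf")
    # Sort by cost; among equal costs, the highest-utility point comes first.
    for p in sorted(points, key=lambda p: (p.avg_tokens, -p.ndcg_at_10)):
        if p.ndcg_at_10 > best:
            frontier.append(p)
            best = p.ndcg_at_10
    return frontier


def pick_policy(frontier, token_budget):
    """Highest-utility frontier point whose average token cost fits the budget."""
    feasible = [p for p in frontier if p.avg_tokens <= token_budget]
    if not feasible:
        # Nothing fits: fall back to the cheapest available policy.
        return min(frontier, key=lambda p: p.avg_tokens)
    return max(feasible, key=lambda p: p.ndcg_at_10)


# Hypothetical validation sweep over router thresholds.
validation = [
    OperatingPoint(0.9, 120.0, 0.50),
    OperatingPoint(0.7, 300.0, 0.55),
    OperatingPoint(0.5, 500.0, 0.54),  # dominated: costs more, scores less
    OperatingPoint(0.3, 800.0, 0.58),
]
frontier = pareto_frontier(validation)
policy = pick_policy(frontier, token_budget=400.0)

# Two hypothetical features, e.g. candidate dispersion and a checklist signal.
decision = route([0.2, 1.0], weights=[1.5, 0.8], bias=-0.5,
                 threshold=policy.threshold)
print(policy, decision)
```

Changing `token_budget` at deployment time moves the selected threshold along the frontier without retraining anything, which is the kind of budget-dependent policy selection the abstract describes.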

Related papers