Fugu-MT Paper Translation (Abstract): HierRouter: Coordinated Routing of Specialized Large Language Models via Reinforcement Learning

Paper overview: HierRouter: Coordinated Routing of Specialized Large Language Models via Reinforcement Learning

arxiv url: http://arxiv.org/abs/2511.09873v1
Date: Fri, 14 Nov 2025 01:14:36 GMT
Status: Translation complete
Last updated in system: 2025-11-14 22:53:22.542472
Title: HierRouter: Coordinated Routing of Specialized Large Language Models via Reinforcement Learning
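Title (reference translation): HierRouter: Coordinated Routing of Large Language Models via Reinforcement Learning
Authors: Nikunj Gupta, Bill Guo, Rajgopal Kannan, Viktor K. Prasanna
Abstract summary: Large language models (LLMs) deliver state-of-the-art performance across many tasks but impose high computational and memory costs. We propose HierRouter, a hierarchical routing approach that dynamically assembles inference pipelines from a pool of specialized, lightweight language models.
Reference score (independently computed attention score): 11.03159148013318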
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Large Language Models (LLMs) deliver state-of-the-art performance across many tasks but impose high computational and memory costs, limiting their deployment in resource-constrained or real-time settings. To address this, we propose HierRouter, a hierarchical routing approach that dynamically assembles inference pipelines from a pool of specialized, lightweight language models. Formulated as a finite-horizon Markov Decision Process (MDP), our approach trains a Proximal Policy Optimization (PPO)-based reinforcement learning agent to iteratively select which models to invoke at each stage of multi-hop inference. The agent conditions on the evolving context and accumulated cost to make context-aware routing decisions. Experiments with three open-source candidate LLMs across six benchmarks, including QA, code generation, and mathematical reasoning, show that HierRouter improves response quality by up to 2.4x compared to using individual models independently, while incurring only a minimal additional inference cost on average. These results highlight the promise of hierarchical routing for cost-efficient, high-performance LLM inference. All code is available at https://github.com/Nikunj-Gupta/hierouter.
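Abstract (reference translation): Large language models (LLMs) deliver state-of-the-art performance across many tasks but impose high computational and memory costs, which limits their deployment in resource-constrained or real-time settings. To address this problem, we propose HierRouter, a hierarchical routing approach that dynamically assembles inference pipelines from a pool of specialized, lightweight language models. With the problem formulated as a finite-horizon Markov decision process (MDP), we train a PPO-based reinforcement learning agent to iteratively select which model to invoke at each stage of multi-hop inference. The agent makes context-aware routing decisions based on the evolving context and the accumulated cost. In experiments with three open-source candidate LLMs across six benchmarks, including QA, code generation, and mathematical reasoning, HierRouter improves response quality by up to 2.4x over using the individual models independently, while incurring only minimal additional inference cost on average. These results highlight the potential of hierarchical routing for cost-efficient, high-performance LLM inference. All code is available at https://github.com/Nikunj-Gupta/hierouter.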
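
The routing loop described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the model names, per-call costs, `STOP` action, `placeholder_quality` score, and the reward shape (quality minus a weighted cost) are all assumptions made for illustration, and PPO training is omitted in favor of a pluggable policy function. It only shows the finite-horizon structure: at each hop the policy sees the evolving pipeline and the accumulated cost, picks a model or stops, and a reward is issued when the episode ends.

```python
import random
from dataclasses import dataclass, field

# Hypothetical pool: model names and per-call costs are invented for illustration.
MODEL_COSTS = {"qa_model": 1.0, "code_model": 2.0, "math_model": 1.5}
STOP = "stop"  # assumed extra action that ends the pipeline early
ACTIONS = list(MODEL_COSTS) + [STOP]


def placeholder_quality(pipeline):
    """Stand-in for a real response-quality score in [0, 1] (assumption)."""
    return len(set(pipeline)) / len(MODEL_COSTS)


@dataclass
class RoutingEpisode:
    query: str
    horizon: int = 3          # finite horizon: maximum number of hops
    cost_weight: float = 0.1  # assumed quality/cost trade-off
    pipeline: list = field(default_factory=list)
    cost: float = 0.0
    hops: int = 0

    def state(self):
        # The policy observes the evolving context and the accumulated cost.
        return (self.query, tuple(self.pipeline), self.cost, self.hops)

    def step(self, action):
        """Apply one routing decision and return (reward, done)."""
        self.hops += 1
        if action != STOP:
            self.pipeline.append(action)  # stand-in for the model's output
            self.cost += MODEL_COSTS[action]
        done = action == STOP or self.hops >= self.horizon
        reward = 0.0
        if done:
            # Assumed terminal reward: quality minus weighted accumulated cost.
            reward = placeholder_quality(self.pipeline) - self.cost_weight * self.cost
        return reward, done


def run_episode(query, policy, horizon=3):
    """Roll out one episode with the given policy (state -> action)."""
    episode = RoutingEpisode(query=query, horizon=horizon)
    reward, done = 0.0, False
    while not done:
        reward, done = episode.step(policy(episode.state()))
    return episode.pipeline, episode.cost, reward


def make_random_policy(seed):
    """Untrained baseline; a learned policy (e.g. PPO) would replace this."""
    rng = random.Random(seed)
    return lambda state: rng.choice(ACTIONS)
```

Calling `run_episode("some query", make_random_policy(0))` returns the chosen pipeline, its total cost, and the terminal reward. A trained agent would plug in as a different `policy` function with the same state-to-action signature.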

Related papers