Fugu-MT Paper Translation (Summary): Reward-Based Online LLM Routing via NeuralUCB

Paper summary: Reward-Based Online LLM Routing via NeuralUCB

arxiv url: http://arxiv.org/abs/2603.30035v1
Date: Tue, 31 Mar 2026 17:35:34 GMT
Status: Translation complete
Last updated (in system): 2026-04-01 15:25:03.95802
Title: Reward-Based Online LLM Routing via NeuralUCB
Title (reference translation): Reward-Based Online LLM Routing via NeuralUCB
Authors: Ming-Hua Tsai, Phat Tran
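Abstract summary: We implement a NeuralUCB-based routing policy and evaluate it on RouterBench. Experimental results show that the proposed method consistently outperforms random and min-cost baselines in utility reward.
Reference score (attention score computed by the site): 0.0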
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This study investigates the use of NeuralUCB for cost-aware large language model (LLM) routing. Existing routing approaches can be broadly grouped into supervised routing methods and partial-feedback methods, each with different tradeoffs in efficiency and adaptivity. We implement a NeuralUCB-based routing policy and evaluate it on RouterBench under a simulated online setting. Experimental results show that the proposed method consistently outperforms random and min-cost baselines in utility reward. Compared with the max-quality reference, our method achieves substantially lower inference cost while maintaining competitive reward. These findings suggest that NeuralUCB is a promising approach for cost-aware LLM routing, while also highlighting remaining challenges in action discrimination and exploration.
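Abstract (reference translation): This study examines the use of NeuralUCB for cost-aware large language model (LLM) routing. Existing routing approaches fall broadly into supervised routing methods and partial-feedback methods, each with its own trade-off between efficiency and adaptivity. We implement a NeuralUCB-based routing policy and evaluate it on RouterBench in a simulated online setting. Experimental results show that the proposed method consistently outperforms the random and minimum-cost baselines in utility reward. Compared with the maximum-quality reference, the proposed method substantially reduces inference cost while maintaining competitive reward. These results suggest that NeuralUCB is a promising approach to cost-aware LLM routing, while also highlighting open challenges in action discrimination and exploration.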
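The abstract does not spell out the routing policy, so the following is only a minimal sketch of how a NeuralUCB-style router could work under stated assumptions; it is not the authors' code. Each candidate LLM is an arm; a single one-hidden-layer ReLU network f(x; θ) predicts the utility reward from a feature vector x, and the router picks the arm maximizing f(x; θ) + γ·sqrt(gᵀ Z⁻¹ g / m), where g is the gradient of f with respect to θ and m is the hidden width, as in the original NeuralUCB recipe. Everything else is assumed for illustration: the feature encoding (query features concatenated with a one-hot model id), the utility reward `quality - cost_weight * cost`, the hypothetical three-model pool, the toy difficulty-based simulator standing in for RouterBench, and the simplified training step (plain SGD without the L2 pull toward the initial weights).

```python
import math
import random


def utility(quality, cost, cost_weight):
    # Hypothetical cost-aware utility reward: quality minus weighted cost.
    return quality - cost_weight * cost


class NeuralUCBRouter:
    """Illustrative NeuralUCB router (not the paper's code): a one-hidden-layer
    ReLU net f(x; theta) plus bonus gamma * sqrt(g^T Z^{-1} g / m), g = grad f."""

    def __init__(self, dim, hidden=8, gamma=0.5, lam=1.0, lr=0.02,
                 train_steps=20, seed=0):
        self.rng = random.Random(seed)
        self.m, self.gamma, self.lr, self.train_steps = hidden, gamma, lr, train_steps
        self.W = [[self.rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
                  for _ in range(hidden)]
        self.v = [self.rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
        self.b = 0.0  # output bias; its gradient is 1, so g is never all zeros
        p = hidden * dim + hidden + 1
        # Z^{-1} starts as (lam * I)^{-1} and is updated with Sherman-Morrison.
        self.Z_inv = [[1.0 / lam if i == j else 0.0 for j in range(p)] for i in range(p)]
        self.history = []  # (features, observed reward) pairs

    def _forward(self, x):
        pre = [sum(w * xj for w, xj in zip(row, x)) for row in self.W]
        h = [max(0.0, z) for z in pre]
        return sum(vi * hi for vi, hi in zip(self.v, h)) + self.b, pre, h

    def _grad(self, x):
        # Gradient of f w.r.t. (W flattened row-major, v, b).
        _, pre, h = self._forward(x)
        g = []
        for i in range(self.m):
            scale = self.v[i] if pre[i] > 0 else 0.0
            g.extend(scale * xj for xj in x)
        return g + h + [1.0]

    def _bonus(self, g):
        zg = [sum(z * gj for z, gj in zip(row, g)) for row in self.Z_inv]
        quad = sum(gi * zi for gi, zi in zip(g, zg))
        return self.gamma * math.sqrt(max(quad, 0.0) / self.m)

    def _update_confidence(self, g):
        # Z <- Z + u u^T with u = g / sqrt(m), applied directly to Z^{-1}.
        u = [gi / math.sqrt(self.m) for gi in g]
        a = [sum(z * uj for z, uj in zip(row, u)) for row in self.Z_inv]
        denom = 1.0 + sum(ui * ai for ui, ai in zip(u, a))
        for i, ai in enumerate(a):
            row = self.Z_inv[i]
            for j, aj in enumerate(a):
                row[j] -= ai * aj / denom

    def select(self, arm_features):
        # Pick the model whose predicted utility plus exploration bonus is largest.
        scores = [self._forward(x)[0] + self._bonus(self._grad(x)) for x in arm_features]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, x, reward):
        self._update_confidence(self._grad(x))
        self.history.append((x, reward))
        # Simplification: plain SGD on squared error over sampled history,
        # without NeuralUCB's L2 regularisation toward the initial weights.
        for _ in range(self.train_steps):
            xs, r = self.rng.choice(self.history)
            pred, pre, h = self._forward(xs)
            err = pred - r
            for i in range(self.m):
                if pre[i] > 0:
                    step = self.lr * err * self.v[i]
                    self.W[i] = [w - step * xj for w, xj in zip(self.W[i], xs)]
            self.v = [vi - self.lr * err * hi for vi, hi in zip(self.v, h)]
            self.b -= self.lr * err


def features(query, arm, n_arms):
    # Hypothetical encoding: query features followed by a one-hot model id.
    return list(query) + [1.0 if k == arm else 0.0 for k in range(n_arms)]


# Hypothetical model pool: (cost per query, base quality, drop per unit difficulty).
MODELS = [(0.1, 0.90, 0.8), (0.4, 0.95, 0.3), (1.0, 0.98, 0.05)]


def simulate(rounds=200, cost_weight=0.3, seed=1):
    env = random.Random(seed)
    n = len(MODELS)
    router = NeuralUCBRouter(dim=2 + n, seed=seed)
    total = 0.0
    for _ in range(rounds):
        difficulty = env.random()
        arms = [features([difficulty, 1.0], k, n) for k in range(n)]
        k = router.select(arms)
        cost, base, drop = MODELS[k]
        quality = base - drop * difficulty + env.gauss(0.0, 0.05)
        reward = utility(quality, cost, cost_weight)
        router.update(arms[k], reward)  # only the chosen model's reward is seen
        total += reward
    return total / rounds
```

Calling `simulate()` returns the mean utility over the run. Only the chosen model's reward is observed each round, which is the partial-feedback setting the abstract contrasts with supervised routing. The Sherman-Morrison update keeps Z⁻¹ current in O(p²) per round instead of re-inverting Z, and the one-hot model id lets one shared network score every candidate. Figures from this toy simulator are not comparable to the paper's RouterBench results.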
