Fugu-MT Paper Translation (Abstract): Cost-Optimal LLM Routing with Limited User Feedback under User Satisfaction Guarantees

Paper overview: Cost-Optimal LLM Routing with Limited User Feedback under User Satisfaction Guarantees

arxiv url: http://arxiv.org/abs/2606.19376v1
Date: Fri, 12 Jun 2026 08:50:46 GMT
Status: Translation complete
System update date: 2026-06-19 18:23:39.417214
Title: Cost-Optimal LLM Routing with Limited User Feedback under User Satisfaction Guarantees
Title (reference translation): Cost-Optimal LLM Routing with Limited User Feedback under User Satisfaction Guarantees
Authors: Herbert Woisetschläger, Arastun Mammadli, Ryan Zhang, Shiqiang Wang,
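Abstract summary: SLARouter is an online routing algorithm that learns a cost-optimal policy from the sparse, one-sided user feedback available in production systems. Experiments show that SLARouter satisfies SLA constraints without per-benchmark tuning, reducing operating cost by up to 2.2x over existing baselines.
Reference score (independently computed attention score): 11.389402303822635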
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Inference costs for large language model (LLM) applications are rapidly growing, driven by surging demand and rising infrastructure cost. Users expect high-quality responses, and in commercial settings this is formally codified in Service Level Agreements (SLAs), creating a fundamental tension between cost and quality. Recent progress on cost-aware LLM request routing has shown potential to resolve this tension, but existing approaches rely on complete feedback signals, offline training, extensive per-workload tuning, and most lack SLA guarantees or inference-time adaptivity. We introduce SLARouter, an online routing algorithm that learns a cost-optimal policy from the sparse, one-sided user feedback available in production systems. SLARouter provides theoretical guarantees for both cost optimality and strict SLA compliance. Experiments across a wide range of LLM benchmarks show that SLARouter satisfies SLA constraints without the need for per-benchmark tuning, reducing operating cost by up to 2.2x over existing baselines.
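Abstract (reference translation): Inference costs for large language model (LLM) applications are growing rapidly, driven by surging demand and rising infrastructure costs. Users expect high-quality responses, and in commercial settings this expectation is formally codified in Service Level Agreements (SLAs), which creates a fundamental tension between cost and quality. Recent progress on cost-aware LLM request routing has shown potential to resolve this tension, but existing approaches rely on complete feedback signals, offline training, and extensive per-workload tuning, and most lack SLA guarantees or inference-time adaptivity. We introduce SLARouter, an online routing algorithm that learns a cost-optimal policy from the sparse, one-sided user feedback available in production systems. SLARouter provides theoretical guarantees for both cost optimality and strict SLA compliance. Experiments across a wide range of LLM benchmarks show that SLARouter satisfies SLA constraints without per-benchmark tuning, reducing operating cost by up to 2.2x over existing baselines.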

Related papers