Fugu-MT Paper Translation (Abstract): IPR: Intelligent Prompt Routing with User-Controlled Quality-Cost Trade-offs

Paper overview: IPR: Intelligent Prompt Routing with User-Controlled Quality-Cost Trade-offs

arxiv url: http://arxiv.org/abs/2509.06274v1
Date: Mon, 08 Sep 2025 01:46:27 GMT
Status: translation complete
Last updated in system: 2025-09-09 14:07:03.93339
Title: IPR: Intelligent Prompt Routing with User-Controlled Quality-Cost Trade-offs
Title (reference translation): IPR: Intelligent Prompt Routing with User-Configured Quality-Cost Trade-offs
Authors: Aosong Feng, Zhichao Xu, Xian Wu, Kang Zhou, Sheng Guan, Yueyan Chen, Ninad Kulkarni, Yun Zhou, Balasubramaniam Srinivasan, Haibo Ding, Lin Lee Cheong
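Abstract summary: The Intelligent Prompt Routing framework dynamically selects the optimal model based on predicted response quality and a user-specified tolerance level. IPR achieves a 43.9% cost reduction while maintaining quality comparable to the strongest model in the Claude family. IPR is deployed on a major cloud platform and processes requests with sub-150ms latency.
Reference score (independently computed attention score): 16.941643717839728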
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Routing incoming queries to the most cost-effective LLM while maintaining response quality poses a fundamental challenge in optimizing performance-cost trade-offs for large-scale commercial systems. We present IPR, a quality-constrained Intelligent Prompt Routing framework that dynamically selects optimal models based on predicted response quality and user-specified tolerance levels. IPR introduces three key innovations: (1) a modular architecture with lightweight quality estimators trained on 1.5M prompts annotated with calibrated quality scores, enabling fine-grained quality prediction across model families; (2) a user-controlled routing mechanism with tolerance parameter $\tau \in [0,1]$ that provides explicit control over quality-cost trade-offs; and (3) an extensible design using frozen encoders with model-specific adapters, reducing new model integration from days to hours. To rigorously train and evaluate IPR, we curate an industrial-level dataset, IPRBench (to be released upon legal approval), a comprehensive benchmark containing 1.5 million examples with response quality annotations across 11 LLM candidates. Deployed on a major cloud platform, IPR achieves 43.9% cost reduction while maintaining quality parity with the strongest model in the Claude family and processes requests with sub-150ms latency.
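Abstract (reference translation): Routing queries to the most cost-effective LLM while preserving response quality is a fundamental challenge in optimizing the performance-cost trade-off of large-scale commercial systems. We propose IPR, a quality-constrained Intelligent Prompt Routing framework that dynamically selects the optimal model based on predicted response quality and a user-specified tolerance level. IPR offers (1) a modular architecture with lightweight quality estimators, trained on 1.5M prompts annotated with calibrated quality scores, that enables fine-grained quality prediction across model families; (2) a user-controlled routing mechanism with a tolerance parameter $\tau \in [0,1]$ that gives explicit control over the quality-cost trade-off; and (3) an extensible design built on frozen encoders with model-specific adapters, which shortens the integration of a new model from days to hours. To train and evaluate IPR rigorously, we curate IPRBench, an industrial-level dataset to be released upon legal approval. It is a comprehensive benchmark of 1.5 million examples with response quality annotations across 11 LLM candidates. Deployed on a major cloud platform, IPR achieves a 43.9% cost reduction while keeping quality on par with the strongest model in the Claude family, and it processes requests with sub-150ms latency.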
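
The abstract does not spell out the routing rule, so the following is only a minimal sketch of how a tolerance parameter $\tau \in [0,1]$ could trade quality for cost. It assumes that each candidate model comes with a predicted quality score in [0, 1] and a relative cost, and that a model is eligible when its predicted quality is at least (1 - tau) times the best predicted quality; the cheapest eligible model is then chosen. The names `Candidate` and `route`, the threshold formula, and the example numbers are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str                 # hypothetical model identifier
    predicted_quality: float  # assumed estimator output in [0, 1]
    cost: float               # assumed relative cost per request


def route(candidates, tau):
    """Pick the cheapest candidate within the tolerance band.

    Assumed rule (not from the paper): a candidate is eligible when its
    predicted quality is >= (1 - tau) * best predicted quality.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    if not candidates:
        raise ValueError("need at least one candidate")

    best = max(c.predicted_quality for c in candidates)
    threshold = (1.0 - tau) * best
    eligible = [c for c in candidates if c.predicted_quality >= threshold]

    # Cheapest eligible model; ties on cost go to the higher predicted quality.
    return min(eligible, key=lambda c: (c.cost, -c.predicted_quality))


# Illustrative numbers only.
models = [
    Candidate("small", predicted_quality=0.70, cost=1.0),
    Candidate("medium", predicted_quality=0.85, cost=4.0),
    Candidate("large", predicted_quality=0.90, cost=15.0),
]

strict = route(models, tau=0.0)    # only the best-quality model is eligible
relaxed = route(models, tau=0.1)   # a slightly weaker, cheaper model qualifies
cheapest = route(models, tau=1.0)  # every model is eligible
```

Under this assumed rule, tau = 0 always routes to a model with the highest predicted quality, and tau = 1 always routes to the cheapest model, so the parameter spans the full range between quality-first and cost-first behavior.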
