Fugu-MT Paper Translation (Abstract): IPR: Intelligent Prompt Routing with User-Controlled Quality-Cost Trade-offs

Paper overview: IPR: Intelligent Prompt Routing with User-Controlled Quality-Cost Trade-offs

arxiv url: http://arxiv.org/abs/2509.06274v4
Date: Thu, 09 Oct 2025 05:51:50 GMT
Status: Translation complete
Last updated in system: 2025-10-10 12:56:53.510899
Title: IPR: Intelligent Prompt Routing with User-Controlled Quality-Cost Trade-offs
Title (reference translation): IPR: Intelligent Prompt Routing with User-Configured Quality-Cost Trade-offs
Authors: Aosong Feng, Balasubramaniam Srinivasan, Yun Zhou, Zhichao Xu, Kang Zhou, Sheng Guan, Yueyan Chen, Xian Wu, Ninad Kulkarni, Yi Zhang, Zhengyuan Shen, Dmitriy Bespalov, Soumya Smruti Mishra, Yifei Teng, Darren Yow-Bang Wang, Haibo Ding, Lin Lee Cheong
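Abstract summary: The Intelligent Prompt Routing (IPR) framework dynamically selects the optimal model based on predicted response quality and a user-specified tolerance level. IPR achieves a 43.9% cost reduction while maintaining quality parity with the strongest model in the Claude family.
Reference score (independently computed attention score): 19.658944117970137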
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Routing incoming queries to the most cost-effective LLM while maintaining response quality poses a fundamental challenge in optimizing performance-cost trade-offs for large-scale commercial systems. We present IPR, a quality-constrained Intelligent Prompt Routing framework that dynamically selects optimal models based on predicted response quality and user-specified tolerance levels. IPR introduces three key innovations: (1) a modular architecture with lightweight quality estimators trained on 1.5M prompts annotated with calibrated quality scores, enabling fine-grained quality prediction across model families; (2) a user-controlled routing mechanism with tolerance parameter $\tau \in [0,1]$ that provides explicit control over quality-cost trade-offs; and (3) an extensible design using frozen encoders with model-specific adapters, reducing new model integration from days to hours. To rigorously train and evaluate IPR, we curate an industrial-level dataset, IPRBench (to be released upon legal approval), a comprehensive benchmark containing 1.5 million examples with response quality annotations across 11 LLM candidates. Deployed on a major cloud platform, IPR achieves a 43.9% cost reduction while maintaining quality parity with the strongest model in the Claude family, and processes requests with sub-150 ms latency. The deployed system and additional product details are publicly available at https://aws.amazon.com/bedrock/intelligent-prompt-routing/
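Abstract (reference translation): Routing queries to the most cost-effective LLM while maintaining response quality is a fundamental challenge in optimizing the performance-cost trade-off of large-scale commercial systems. IPR is a quality-constrained Intelligent Prompt Routing framework. IPR comprises (1) a modular architecture with lightweight quality estimators trained on 1.5M prompts annotated with calibrated quality scores, enabling fine-grained quality prediction across model families; (2) a user-controlled routing mechanism with a tolerance parameter $\tau \in [0,1]$; and (3) an extensible design using frozen encoders with model-specific adapters, which shortens new model integration from days to hours. To rigorously train and evaluate IPR, the authors curate IPRBench, an industrial-level dataset to be released upon legal approval: a comprehensive benchmark containing 1.5 million examples with response quality annotations across 11 LLM candidates. Deployed on a major cloud platform, IPR achieves a 43.9% cost reduction while maintaining quality parity with the strongest model in the Claude family and processing requests with sub-150 ms latency. The deployed system and product details are publicly available at https://aws.amazon.com/bedrock/intelligent-prompt-routing/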
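
Illustrative sketch (not from the paper): the abstract describes a routing mechanism controlled by a tolerance parameter $\tau \in [0,1]$ but does not state the selection rule. The Python sketch below shows one plausible reading, in which the router picks the cheapest candidate whose predicted quality stays above a threshold interpolated between the best and worst predicted quality. The `Candidate` type, the `route` function, the threshold formula, and the model names and numbers are all assumptions made for illustration, not the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str                 # hypothetical model identifier
    predicted_quality: float  # assumed output of a quality estimator
    cost: float               # assumed relative cost per request


def route(candidates, tau):
    """Return the cheapest candidate that meets the quality threshold.

    Assumed rule (not taken from the paper): tau = 0 accepts only the
    best predicted quality, tau = 1 accepts every candidate.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")

    q_max = max(c.predicted_quality for c in candidates)
    q_min = min(c.predicted_quality for c in candidates)

    # Interpolate the acceptance threshold between q_max and q_min.
    threshold = (1.0 - tau) * q_max + tau * q_min

    eligible = [c for c in candidates if c.predicted_quality >= threshold]

    # Cheapest eligible model wins; ties go to the higher predicted quality.
    return min(eligible, key=lambda c: (c.cost, -c.predicted_quality))


if __name__ == "__main__":
    pool = [
        Candidate("large", 0.90, 10.0),
        Candidate("medium", 0.80, 3.0),
        Candidate("small", 0.60, 1.0),
    ]
    for tau in (0.0, 0.5, 1.0):
        print(tau, route(pool, tau).name)
```

Under this assumed rule, raising $\tau$ trades predicted quality for cost: a stricter tolerance keeps requests on the strongest model, while a looser one lets cheaper models take requests they are predicted to handle acceptably.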


Related papers