Fugu-MT Paper Translation (Abstract): vLLM Semantic Router: Signal Driven Decision Routing for Mixture-of-Modality Models

Paper overview: vLLM Semantic Router: Signal Driven Decision Routing for Mixture-of-Modality Models

arxiv url: http://arxiv.org/abs/2603.04444v1
Date: Mon, 23 Feb 2026 15:00:01 GMT
Status: Translation complete
Last updated in system: 2026-03-09 01:20:08.220169
Title: vLLM Semantic Router: Signal Driven Decision Routing for Mixture-of-Modality Models
Title (reference translation): vLLM Semantic Router: Signal-Driven Decision Routing for Mixture-of-Modality Models
Authors: Xunzhuo Liu, Huamin Chen, Samzong Lu, Yossi Ovadia, Guohong Wen, Zhengda Tan, Jintao Zhang, Senan Zedan, Yehudit Kerido, Liav Weiss, Bishen Yu, Asaad Balum, Noa Limoy, Abdallah Samara, Brent Salisbury, Hao Wu, Ryan Cook, Zhijie Wang, Qiping Pan, Rehan Khan, Avishek Goswami, Houston H. Zhang, Shuyi Wang, Ziang Tang, Fang Han, Zohaib Hassan, Jianqiao Zheng, Avinash Changrani
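Abstract summary: vLLM Semantic Router is a signal-driven decision routing framework for Mixture-of-Modality (MoM) model deployments. The system extracts heterogeneous signal types from each request. Different deployment scenarios are expressed as different signal-decision configurations over the same architecture.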
Reference score (independently computed attention score): 8.433829083279518
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As large language models (LLMs) diversify across modalities, capabilities, and cost profiles, the problem of intelligent request routing -- selecting the right model for each query at inference time -- has become a critical systems challenge. We present vLLM Semantic Router, a signal-driven decision routing framework for Mixture-of-Modality (MoM) model deployments. The central innovation is composable signal orchestration: the system extracts heterogeneous signal types from each request -- from sub-millisecond heuristic features (keyword patterns, language detection, context length, role-based authorization) to neural classifiers (domain, embedding similarity, factual grounding, modality) -- and composes them through configurable Boolean decision rules into deployment-specific routing policies. Different deployment scenarios -- multi-cloud enterprise, privacy-regulated, cost-optimized, latency-sensitive -- are expressed as different signal-decision configurations over the same architecture, without code changes. Matched decisions drive semantic model routing: over a dozen selection algorithms analyze request characteristics to find the best model cost-effectively, while per-decision plugin chains enforce privacy and safety constraints (jailbreak detection, PII filtering, hallucination detection via the three-stage HaluGate pipeline). The system provides OpenAI API support for stateful multi-turn conversations, multi-endpoint and multi-provider routing across heterogeneous backends (vLLM, OpenAI, Anthropic, Azure, Bedrock, Gemini, Vertex AI), and a pluggable authorization factory supporting multiple auth providers. Deployed in production as an Envoy external processor, the architecture demonstrates that composable signal orchestration enables a single routing framework to serve diverse deployment scenarios with differentiated cost, privacy, and safety policies.
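Abstract (reference translation): As large language models (LLMs) diversify across modalities, capabilities, and cost profiles, the problem of intelligent request routing -- selecting the appropriate model for each query at inference time -- has become a critical systems challenge. This paper describes vLLM Semantic Router, a signal-driven decision routing framework for Mixture-of-Modality (MoM) model deployments. The system extracts heterogeneous signal types from each request, ranging from sub-millisecond heuristic features (keyword patterns, language detection, context length, role-based authorization) to neural classifiers (domain, embedding similarity, factual grounding, modality), and composes them through configurable Boolean decision rules into deployment-specific routing policies. Different deployment scenarios -- multi-cloud enterprise, privacy-regulated, cost-optimized, latency-sensitive -- are expressed as different signal-decision configurations over the same architecture, without code changes. Matched decisions drive semantic model routing: over a dozen selection algorithms analyze request characteristics to find the best model, while per-decision plugin chains enforce privacy and safety constraints (jailbreak detection, PII filtering, hallucination detection via the three-stage HaluGate pipeline). The system provides support for stateful multi-turn conversations, multi-endpoint and multi-provider routing across heterogeneous backends (vLLM, OpenAI, Anthropic, Azure, Bedrock, Gemini, Vertex AI), and a pluggable authorization factory supporting multiple auth providers. Deployed in production as an Envoy external processor, the architecture demonstrates that composable signal orchestration enables a single routing framework to serve diverse deployment scenarios with differentiated cost, privacy, and safety policies.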
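
Illustrative sketch (not from the paper): the abstract describes extracting signals from each request and composing them through configurable Boolean decision rules into routing policies. The Python below is a minimal sketch of that idea under stated assumptions. Every name in it (extract_signals, Decision, DECISIONS, route, the signal keys, the model names, and the plugin labels) is hypothetical and is not the vLLM Semantic Router API or configuration format. Only cheap heuristic signals are shown, neural classifiers are omitted, and first-match ordering of decisions is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical illustration only; none of these names come from vLLM Semantic Router.
Signals = Dict[str, bool]


def extract_signals(prompt: str) -> Signals:
    """Extract cheap heuristic signals; neural classifiers are omitted in this sketch."""
    text = prompt.lower()
    return {
        "keyword_code": any(k in text for k in ("def ", "stack trace", "compile")),
        "long_context": len(prompt.split()) > 200,
        "mentions_pii": "passport" in text or "social security" in text,
    }


@dataclass
class Decision:
    name: str
    rule: Callable[[Signals], bool]  # Boolean combination of signals
    model: str                       # hypothetical backend model name
    plugins: List[str]               # per-decision plugin chain (labels only)


# Assumption: decisions are evaluated in order and the first match wins.
DECISIONS = [
    Decision("private", lambda s: s["mentions_pii"], "local-model", ["pii_filter"]),
    Decision("coding", lambda s: s["keyword_code"] and not s["long_context"], "code-model", []),
    Decision("default", lambda s: True, "small-model", []),
]


def route(prompt: str) -> Decision:
    """Return the first decision whose Boolean rule matches the request's signals."""
    signals = extract_signals(prompt)
    for decision in DECISIONS:
        if decision.rule(signals):
            return decision
    raise RuntimeError("no decision matched; the default rule should always match")
```

In this sketch a different deployment scenario would be expressed by swapping the DECISIONS list while extract_signals and route stay unchanged, which mirrors the abstract's claim of distinct signal-decision configurations over one architecture.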

Related papers