Fugu-MT Paper Translation (Abstract): Position: LLM Serving Needs Mathematical Optimization and Algorithmic Foundations, Not Just Heuristics

Paper overview: Position: LLM Serving Needs Mathematical Optimization and Algorithmic Foundations, Not Just Heuristics

arxiv url: http://arxiv.org/abs/2605.01280v1
Date: Sat, 02 May 2026 06:31:11 GMT
Status: translation complete
Last updated in system: 2026-05-05 20:33:49.680509
Title: Position: LLM Serving Needs Mathematical Optimization and Algorithmic Foundations, Not Just Heuristics
Authors: Zijie Zhou
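Abstract summary: LLM inference serving has outgrown generic heuristics and now demands mathematical optimization and algorithmic foundations. General-purpose policies such as join-shortest-queue routing, FIFO scheduling, and LRU cache eviction ignore the distinctive structure of LLM inference: dynamically growing KV cache memory, prefill-decode phase asymmetry, unknown output lengths, and continuous batching constraints. The author contends that the field must develop algorithms with provable performance guarantees across diverse workloads, rather than heuristics that may succeed in some scenarios but fail unpredictably in others.
Reference score (independently computed attention score): 3.143753806123382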
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This position paper argues that LLM inference serving has outgrown generic heuristics and now demands mathematical optimization and algorithmic foundations. Despite rapid advances in serving systems such as vLLM and SGLang, their algorithmic cores remain largely unchanged from classical distributed computing: request routing uses join-shortest-queue or round-robin, scheduling defaults to FIFO, and KV cache eviction follows LRU. These general-purpose policies ignore the distinctive structure of LLM inference--dynamically growing KV cache memory, prefill-decode phase asymmetry, unknown output lengths, and continuous batching constraints. We contend that the field must develop mathematical models capturing these characteristics, enabling the design of algorithms with provable performance guarantees across diverse workloads, rather than heuristics that may succeed in some scenarios but fail unpredictably in others. Emerging work at the intersection of operations research and ML systems demonstrates that principled methods can match or exceed heuristic performance while providing theoretical guarantees. We call on the community to recognize algorithmic design for LLM serving as a research frontier.
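The three baseline policies named in the abstract (join-shortest-queue routing, FIFO scheduling, LRU eviction) can be sketched as below. This is a minimal illustration only, not code from vLLM, SGLang, or the paper: every function and class name is hypothetical, and the cache is assumed to count whole entries rather than tokens or bytes.

```python
from collections import OrderedDict, deque


def route_join_shortest_queue(queue_lengths):
    """Return the index of the replica with the fewest queued requests.

    Ties go to the lowest index. Hypothetical helper for illustration.
    """
    return min(range(len(queue_lengths)), key=queue_lengths.__getitem__)


class FifoScheduler:
    """Serve requests strictly in arrival order (hypothetical class)."""

    def __init__(self):
        self._pending = deque()

    def submit(self, request_id):
        self._pending.append(request_id)

    def next_request(self):
        # Returns None when nothing is waiting.
        return self._pending.popleft() if self._pending else None


class LruKvCache:
    """Evict the least recently used entry once capacity is exceeded.

    Assumption: capacity is a number of entries, not tokens or bytes.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def touch(self, key):
        """Insert or refresh `key`; return the list of evicted keys."""
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
        else:
            self._entries[key] = True
        evicted = []
        while len(self._entries) > self.capacity:
            old_key, _ = self._entries.popitem(last=False)  # oldest first
            evicted.append(old_key)
        return evicted

    def keys(self):
        return list(self._entries)
```

None of these policies looks at KV cache growth, the prefill or decode phase of a request, or its expected output length, which is the gap the abstract points to.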


Related papers