Fugu-MT paper translation (abstract): SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

Paper overview: SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

arxiv url: http://arxiv.org/abs/2604.09557v1
Date: Tue, 10 Feb 2026 16:19:56 GMT
Status: translation complete
Last updated in system: 2026-04-19 19:09:11.493165
Title: SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding
Authors: Talor Abramovich, Maor Ashkenazi, Carl Putterman, Benjamin Chislett, Tiyasa Mitra, Bita Darvish Rouhani, Ran Zilberstein, Yonatan Geifman
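Abstract summary: Speculative Decoding (SD) has emerged as a critical technique for accelerating Large Language Model (LLM) inference. Previous benchmarks suffered from limited task diversity, inadequate support for throughput-oriented evaluation, and a reliance on high-level implementations that fail to reflect production environments. SPEED-Bench is a comprehensive suite designed to standardize SD evaluation across diverse semantic domains and realistic serving regimes.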
Reference score (independently computed attention score): 3.876913658180685
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Speculative Decoding (SD) has emerged as a critical technique for accelerating Large Language Model (LLM) inference. Unlike deterministic system optimizations, SD performance is inherently data-dependent, meaning that diverse and representative workloads are essential for accurately measuring its effectiveness. Existing benchmarks suffer from limited task diversity, inadequate support for throughput-oriented evaluation, and a reliance on high-level implementations that fail to reflect production environments. To address this, we introduce SPEED-Bench, a comprehensive suite designed to standardize SD evaluation across diverse semantic domains and realistic serving regimes. SPEED-Bench offers a carefully curated Qualitative data split, selected by prioritizing semantic diversity across the data samples. Additionally, it includes a Throughput data split, allowing speedup evaluation across a range of concurrencies, from latency-sensitive low-batch settings to throughput-oriented high-load scenarios. By integrating with production engines like vLLM and TensorRT-LLM, SPEED-Bench allows practitioners to analyze system behaviors often masked by other benchmarks. We highlight this by quantifying how synthetic inputs overestimate real-world throughput, identifying batch-size dependent optimal draft lengths and biases in low-diversity data, and analyzing the caveats of vocabulary pruning in state-of-the-art drafters. We release SPEED-Bench to establish a unified evaluation standard for practical comparisons of SD algorithms.

List of related papers