Fugu-MT Paper Translation (Summary): Arbitrage: Efficient Reasoning via Advantage-Aware Speculation

Paper summary: Arbitrage: Efficient Reasoning via Advantage-Aware Speculation

arxiv url: http://arxiv.org/abs/2512.05033v2
Date: Tue, 09 Dec 2025 18:32:43 GMT
Status: Translation complete
Last updated in system: 2025-12-10 16:15:28.096366
Title: Arbitrage: Efficient Reasoning via Advantage-Aware Speculation
Authors: Monishwaran Maheswaran, Rishabh Tiwari, Yuezhou Hu, Kerem Dilmen, Coleman Hooper, Haocheng Xi, Nicholas Lee, Mehrdad Farajtabar, Michael W. Mahoney, Kurt Keutzer, Amir Gholami
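Abstract summary: Speculative Decoding accelerates inference by employing a fast but inaccurate draft model to autoregressively propose tokens. However, because token mismatches in semantically equivalent steps cause unnecessary rejections, traditional token-level Speculative Decoding struggles in reasoning tasks. The proposed Arbitrage is a novel step-level speculative generation framework that routes generation dynamically based on the relative advantage between draft and target models.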
Reference score (independently computed attention score): 71.45710345765528
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern Large Language Models achieve impressive reasoning capabilities with long Chain of Thoughts, but they incur substantial computational cost during inference, and this motivates techniques to improve the performance-cost ratio. Among these techniques, Speculative Decoding accelerates inference by employing a fast but inaccurate draft model to autoregressively propose tokens, which are then verified in parallel by a more capable target model. However, due to unnecessary rejections caused by token mismatches in semantically equivalent steps, traditional token-level Speculative Decoding struggles in reasoning tasks. Although recent works have shifted to step-level semantic verification, which improves efficiency by accepting or rejecting entire reasoning steps, existing step-level methods still regenerate many rejected steps with little improvement, wasting valuable target compute. To address this challenge, we propose Arbitrage, a novel step-level speculative generation framework that routes generation dynamically based on the relative advantage between draft and target models. Instead of applying a fixed acceptance threshold, Arbitrage uses a lightweight router trained to predict when the target model is likely to produce a meaningfully better step. This routing approximates an ideal Arbitrage Oracle that always chooses the higher-quality step, achieving near-optimal efficiency-accuracy trade-offs. Across multiple mathematical reasoning benchmarks, Arbitrage consistently surpasses prior step-level Speculative Decoding baselines, reducing inference latency by up to $\sim2\times$ at matched accuracy.
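The step-level routing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `route_steps`, the `StepFn` and `RouterFn` types, and the toy models are all hypothetical names introduced here, and the router is assumed to return a simple yes/no decision on whether the target model is likely to produce a meaningfully better step than the draft's proposal.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces for illustration only (not from the paper).
StepFn = Callable[[List[str]], str]           # context -> next reasoning step
RouterFn = Callable[[List[str], str], bool]   # (context, draft step) -> use target?


def route_steps(draft: StepFn, target: StepFn, router: RouterFn,
                max_steps: int) -> Tuple[List[str], int]:
    """Generate max_steps reasoning steps, invoking the expensive target
    model only when the router predicts it would meaningfully improve
    on the draft model's proposed step."""
    context: List[str] = []
    target_calls = 0
    for _ in range(max_steps):
        candidate = draft(context)            # cheap proposal from the draft model
        if router(context, candidate):
            candidate = target(context)       # regenerate this step with the target
            target_calls += 1
        context.append(candidate)             # accepted step extends the context
    return context, target_calls


# Toy stand-ins so the sketch runs end to end.
def toy_draft(context: List[str]) -> str:
    return f"draft-step-{len(context)}"


def toy_target(context: List[str]) -> str:
    return f"target-step-{len(context)}"


def toy_router(context: List[str], step: str) -> bool:
    # Assumption for the demo: the target only helps on every third step.
    return len(context) % 3 == 2


steps, target_calls = route_steps(toy_draft, toy_target, toy_router, max_steps=6)
print(steps)
print("target calls:", target_calls)
```

In this toy run the target model is called for only two of the six steps. The abstract's point is that a trained router aims to spend target compute only where it changes the outcome, whereas a fixed acceptance threshold may regenerate steps that end up no better.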
