Fugu-MT Paper Translation (Abstract): Self-Speculative Biased Decoding for Faster Live Translation

Paper abstract: Self-Speculative Biased Decoding for Faster Live Translation

arxiv url: http://arxiv.org/abs/2509.21740v1
Date: Fri, 26 Sep 2025 01:13:37 GMT
Status: Translation complete
Last updated in system: 2025-09-29 20:57:54.104075
Title: Self-Speculative Biased Decoding for Faster Live Translation
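Title (reference translation): Self-Speculative Biased Decoding for Fast Live Translation
Authors: Linxiao Zeng, Haoyun Deng, Kangyuan Shu, Shizhen Wang
Abstract summary: Self-Speculative Biased Decoding is a novel inference paradigm designed to avoid repeatedly generating output from scratch for a consistently growing input stream. The proposed approach achieves up to 1.7x speedup over conventional auto-regressive re-translation without compromising quality.
Reference score (independently computed attention score): 0.0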
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have recently demonstrated impressive capabilities in various text generation tasks. However, it remains challenging to use them off-the-shelf in streaming applications (such as live translation), where the output must continually update as the input context expands, while still maintaining a reasonable computational cost to meet the latency requirement. In this work, we reexamine the re-translation approach to simultaneous translation and propose Self-Speculative Biased Decoding, a novel inference paradigm designed to avoid repeatedly generating output from scratch for a consistently growing input stream. We propose using the most recent output as a draft for the current growing input context. During the verification stage, the output will be biased towards the draft token for a higher draft acceptance rate. This strategy not only minimizes flickering that might distract users but also leads to higher speedups. Conventional decoding may take charge from the point of divergence after draft verification and continue until the end condition is met. Unlike existing speculative decoding strategies, our approach eliminates the need for draft computations, making it a model-agnostic and plug-and-play solution for accelerating latency-sensitive streaming applications. Experimental results on simultaneous text-to-text re-translation demonstrate that our approach achieves up to 1.7x speedup compared to conventional auto-regressive re-translation without compromising quality. Additionally, it significantly reduces flickering by 80% by incorporating the display-only mask-k technique.
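The draft-verify-continue loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify how the bias is applied, so an additive bonus on the draft token's score is assumed here, and `toy_score`, `biased_argmax`, and `retranslate_with_draft` are hypothetical names. A real system would verify all draft positions in one parallel forward pass; the toy loop checks them one at a time for readability.

```python
from typing import Callable, Dict, List, Tuple

EOS = "</s>"
Logits = Dict[str, float]
# Hypothetical model interface: (source tokens, target prefix) -> next-token scores.
ScoreFn = Callable[[List[str], List[str]], Logits]


def toy_score(source: List[str], prefix: List[str]) -> Logits:
    """Stand-in for an LLM: prefers the upper-cased source word at each position."""
    i = len(prefix)
    if i >= len(source):
        return {EOS: 2.0}
    word = source[i]
    # A preferred rendering plus a slightly lower-scored alternative.
    return {word.upper(): 2.0, word.capitalize(): 1.5, EOS: 0.0}


def biased_argmax(logits: Logits, draft_token: str, bias: float) -> str:
    """Greedy choice after adding `bias` to the draft token's score (assumed form)."""
    adjusted = dict(logits)
    if draft_token in adjusted:
        adjusted[draft_token] += bias
    return max(adjusted, key=adjusted.get)


def retranslate_with_draft(
    score: ScoreFn,
    source: List[str],
    draft: List[str],
    bias: float = 1.0,
    max_len: int = 50,
) -> Tuple[List[str], int]:
    """Reuse the previous output as a draft, then decode normally after divergence."""
    output: List[str] = []
    accepted = 0
    # Verification stage: accept draft tokens while the biased choice agrees.
    for draft_token in draft:
        choice = biased_argmax(score(source, output), draft_token, bias)
        if choice != draft_token:
            break  # point of divergence
        output.append(choice)
        accepted += 1
        if choice == EOS:
            break
    # Conventional greedy decoding takes over until the end condition is met.
    while len(output) < max_len and (not output or output[-1] != EOS):
        logits = score(source, output)
        output.append(max(logits, key=logits.get))
    return output, accepted


if __name__ == "__main__":
    grown_source = ["hello", "world", "again"]
    previous_output = ["Hello", "WORLD"]  # translation of the shorter input
    print(retranslate_with_draft(toy_score, grown_source, previous_output, bias=1.0))
    print(retranslate_with_draft(toy_score, grown_source, previous_output, bias=0.0))
```

In this toy setup, a bias of 1.0 lets the previously shown token "Hello" survive even though the model slightly prefers "HELLO", so both draft tokens are accepted and only the new suffix is generated. With a bias of 0.0 the draft is rejected at the first position and the whole output is regenerated.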
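Abstract (reference translation): Large language models (LLMs) have recently shown impressive capabilities across a range of text generation tasks. Using them as-is in streaming applications such as live translation remains difficult, however, because the output must be updated continuously as the input context grows while the computational cost stays low enough to meet latency requirements. This work re-examines the re-translation approach to simultaneous translation and proposes Self-Speculative Biased Decoding, a new inference paradigm designed to avoid repeatedly generating output from scratch for a steadily growing input stream. The most recent output is used as a draft for the current, longer input context. In the verification stage, the output is biased toward the draft tokens to raise the draft acceptance rate. This strategy not only minimizes flickering that may distract users but also yields higher speedups. After draft verification, conventional decoding takes over from the point of divergence and continues until the end condition is met. Unlike existing speculative decoding strategies, the approach requires no draft computation, which makes it a model-agnostic, plug-and-play solution for accelerating latency-sensitive streaming applications. Experiments on simultaneous text-to-text re-translation show up to a 1.7x speedup over conventional auto-regressive re-translation without loss of quality. In addition, incorporating the display-only mask-k technique reduces flickering by 80%.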
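The abstract mentions a display-only mask-k technique without defining it. The sketch below assumes the usual meaning of mask-k in re-translation (hide the last k tokens of an interim output until the source is complete) and reads "display-only" as applying the mask to what the user sees while the full output remains available as the next draft. The function names and the erasure count used to measure flicker are illustrative assumptions, not taken from the paper.

```python
from typing import List

EOS = "</s>"


def display_tokens(output: List[str], k: int, source_complete: bool) -> List[str]:
    """Tokens shown to the user: hide the last k tokens unless the source is finished."""
    tokens = [t for t in output if t != EOS]
    if source_complete or k <= 0:
        return tokens
    return tokens[: max(0, len(tokens) - k)]


def erased_tokens(prev_shown: List[str], new_shown: List[str]) -> int:
    """Count previously shown tokens that must be erased to display the new output."""
    common = 0
    for old, new in zip(prev_shown, new_shown):
        if old != new:
            break
        common += 1
    return len(prev_shown) - common


if __name__ == "__main__":
    step1 = ["the", "cat", "sat", EOS]
    step2 = ["the", "cat", "sits", "down", EOS]
    # Without masking, the unstable tail "sat" is shown and later erased.
    print(erased_tokens(display_tokens(step1, 0, False), display_tokens(step2, 0, False)))
    # With k=1 the tail is hidden, so nothing shown has to be erased.
    print(erased_tokens(display_tokens(step1, 1, False), display_tokens(step2, 1, False)))
```

The trade-off in this sketch is that a larger k hides more unstable tokens at the cost of showing the translation slightly later.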
