Fugu-MT Paper Translation (Abstract): Copy-as-Decode: Grammar-Constrained Parallel Prefill for LLM Editing

Paper abstract: Copy-as-Decode: Grammar-Constrained Parallel Prefill for LLM Editing

arxiv url: http://arxiv.org/abs/2604.18170v1
Date: Mon, 20 Apr 2026 12:29:53 GMT
Status: Translation complete
System update date: 2026-04-21 21:52:52.861945
Title: Copy-as-Decode: Grammar-Constrained Parallel Prefill for LLM Editing
Authors: Ziyang Liu
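Abstract summary: LLMs edit text and code by autoregressively regenerating the full output, even when most tokens appear verbatim in the input. Copy-as-Decode is a decoding-layer mechanism that recasts edit generation as structured decoding over a two-primitive grammar.
Reference score (independently computed attention score): 2.6382975801439836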
License: http://creativecommons.org/licenses/by/4.0/
Abstract: LLMs edit text and code by autoregressively regenerating the full output, even when most tokens appear verbatim in the input. We study Copy-as-Decode, a decoding-layer mechanism that recasts edit generation as structured decoding over a two-primitive grammar: <copy lines="i-j"/> references an input line range, <gen>...</gen> emits new content. A token-level FSM guarantees syntactic validity, and a serving-layer primitive updates the KV cache for each copy span via a single parallel-prefill forward rather than $N$ autoregressive steps -- sharing the parallel-forward kernel of speculative decoding but with input tokens as the draft and program-enforced acceptance replacing probabilistic verification. We report an upper-bound analysis that requires no end-to-end training. (i) Kernel speedup: on Qwen2.5-{1.5B, 7B}, copying $N$ tokens via parallel prefill is $6.8\times$--$303\times$ faster than autoregressive ($N \in [8, 512]$, A100 80GB bf16). (ii) Copy ceiling: on ProbeEdit and HumanEvalPack-Fix (Py/JS), $74$--$98\%$ of gold tokens are reachable under the line-level primitive; composed with the empirical kernel over each corpus's span histogram this yields a closed-form wall-clock bound of $29.0\times / 3.4\times / 4.2\times$ ($13.0\times$ pooled). A token-level extension reaches $91$--$99\%$ coverage with $4.5\times$--$6.5\times$ floors. (iii) Pipeline losslessness: oracle programs round-trip through the deterministic resolver on all $482$ cases, localizing any downstream failure to span selection rather than the mechanism. A perturbation study shows pooled EM drops from $100\%$ to $15.48\%$ under off-by-one noise. A fine-tuning pilot on Qwen2.5-Coder-1.5B lifts HEvalFix-Py EM from $0/33$ (untrained) to $12$--$17\%$, a learnability signal, not a production selector. Batched-serving integration and multi-file coverage are scoped as follow-up.
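
The two-primitive grammar and the deterministic resolver described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `resolve`, the regular expression, the 1-indexed inclusive reading of `lines="i-j"`, and the rule that `<gen>` content is emitted verbatim are all assumptions made for illustration. It only shows how an edit program expands against the input text; the FSM-constrained decoding and the parallel-prefill KV-cache update are out of scope here.

```python
import re

# Assumed surface syntax, based on the two primitives named in the abstract.
# Reading "i-j" as a 1-indexed, inclusive line range is an assumption of this sketch.
TOKEN = re.compile(r'<copy lines="(\d+)-(\d+)"/>|<gen>(.*?)</gen>', re.DOTALL)


def resolve(program: str, source: str) -> str:
    """Expand an edit program against the input text (hypothetical helper)."""
    lines = source.splitlines(keepends=True)
    out = []
    pos = 0
    for m in TOKEN.finditer(program):
        # Only whitespace may separate primitives; anything else is rejected.
        if program[pos:m.start()].strip():
            raise ValueError(f"unexpected text at offset {pos}")
        pos = m.end()
        if m.group(1) is not None:
            # <copy lines="i-j"/>: reuse input lines i..j unchanged.
            start, end = int(m.group(1)), int(m.group(2))
            if not (1 <= start <= end <= len(lines)):
                raise ValueError(f"copy range {start}-{end} out of bounds")
            out.extend(lines[start - 1:end])
        else:
            # <gen>...</gen>: emit newly generated content verbatim.
            out.append(m.group(3))
    if program[pos:].strip():
        raise ValueError(f"unexpected trailing text at offset {pos}")
    return "".join(out)


# A one-line bug fix: lines 1 and 3-4 are copied, only line 2 is generated.
source = "def add(a, b):\n    return a - b\n\nprint(add(1, 2))\n"
program = (
    '<copy lines="1-1"/>'
    "<gen>    return a + b\n</gen>"
    '<copy lines="3-4"/>'
)
fixed = resolve(program, source)
```

In this toy case three of the four output lines come from copy spans, which is the situation the copy-ceiling figures in the abstract quantify. Shifting a range by one line (for example `lines="2-3"` in place of `lines="3-4"`) changes the output, which is consistent with the sensitivity to off-by-one noise reported in the perturbation study.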
