Fugu-MT paper translation (abstract): Edit the Bits, Diff the Codes: Bitwise Residual Editing for Visual Autoregressive Models

Paper summary: Edit the Bits, Diff the Codes: Bitwise Residual Editing for Visual Autoregressive Models

arxiv url: http://arxiv.org/abs/2606.13558v1
Date: Thu, 11 Jun 2026 16:41:25 GMT
Status: translation complete
Last updated in system: 2026-06-12 15:55:27.922891
Title: Edit the Bits, Diff the Codes: Bitwise Residual Editing for Visual Autoregressive Models
Title (reference translation): Bitwise Residual Editing for Visual Autoregressive Models
Authors: Shengqiang Zhang, Ruotong Liao, Volker Tresp, Barbara Plank, Hinrich Schütze
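Abstract summary: BitResEdit is a training-free editor for bitwise-residual VAR generators such as Infinity. It performs source-negative guidance by tilting the post-CFG per-bit log-odds along a source-target contrast computed on a shared edited prefix. It converts the sampled bits into per-scale continuous-code residuals, gates them with a localization mask, and re-injects them through the generator's native sum-of-scales.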
Reference score (independently computed attention score): 85.59447229497101
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text-guided image editing with visual autoregressive (VAR) generators requires controlling both what the model samples and where the sampled change is written back into the image code. Existing VAR editors mainly operate on token streams, features, or flat next-token logits, leaving two native structures of bitwise-residual VAR models underused: the per-bit Bernoulli prediction head and the additive multi-scale residual code field from which the image is assembled. We propose BitResEdit, a training-free editor for bitwise-residual VAR generators such as Infinity. BitEdit performs source-negative guidance by tilting the post-CFG per-bit log-odds along a source-target contrast computed on a shared edited prefix, then projects each update into a closed-form Bernoulli-KL trust region around the clean CFG sampler. ResEdit converts the sampled bits into per-scale continuous-code residuals, gates them with a localization mask, and re-injects them through the generator's native sum-of-scales. Together they couple decision-time bit guidance with combination-time code composition, so masked-out latent features are preserved exactly by code arithmetic while localized, scale-aware edits are applied inside the target region. On PIE-Bench with Infinity-2B, BitResEdit attains the strongest text alignment among same-backbone VAR editors, improving CLIP on the edited region by +1.07 over the strongest prior editor while keeping background preservation competitive with it. Ablations show BitEdit and ResEdit play complementary roles in target alignment and background preservation.
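The two mechanisms described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the `strength` and `kl_budget` parameters, the ±1/sqrt(d) bit-to-code mapping, and the nearest-neighbour upsampling are all hypothetical choices made here. The trust-region step uses bisection as a numerical stand-in for the closed-form projection, which the abstract mentions but does not spell out.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bernoulli_kl(q, p, eps=1e-12):
    """Elementwise KL(Bern(q) || Bern(p))."""
    q = np.clip(q, eps, 1.0 - eps)
    p = np.clip(p, eps, 1.0 - eps)
    return q * np.log(q / p) + (1.0 - q) * np.log((1.0 - q) / (1.0 - p))

def tilt_bit_logits(logit_tgt, logit_src, strength=1.0, kl_budget=0.05, iters=30):
    """BitEdit-style sketch (assumed form): push the target log-odds away
    from the source log-odds, then shrink the step per bit until the
    Bernoulli KL to the clean target distribution fits the budget."""
    direction = strength * (logit_tgt - logit_src)  # source-target contrast
    p_clean = sigmoid(logit_tgt)
    lo = np.zeros_like(logit_tgt)  # step 0 is always feasible (KL = 0)
    hi = np.ones_like(logit_tgt)
    for _ in range(iters):         # per-bit bisection on the step size
        mid = 0.5 * (lo + hi)
        ok = bernoulli_kl(sigmoid(logit_tgt + mid * direction), p_clean) <= kl_budget
        lo = np.where(ok, mid, lo)
        hi = np.where(ok, hi, mid)
    # Take the full step wherever it already fits inside the trust region.
    full_ok = bernoulli_kl(sigmoid(logit_tgt + direction), p_clean) <= kl_budget
    step = np.where(full_ok, 1.0, lo)
    return logit_tgt + step * direction

def bits_to_code(bits):
    """Map {0,1} bits of shape (H, W, D) to codes in {-1, +1} / sqrt(D)
    (assumed binary spherical quantisation convention)."""
    d = bits.shape[-1]
    return (2.0 * bits - 1.0) / np.sqrt(d)

def upsample_nearest(x, size):
    """Nearest-neighbour upsampling of an (h, w, D) map to (size, size, D)."""
    h, w = x.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return x[rows][:, cols]

def compose_masked_residuals(src_code, src_residuals, edit_residuals, mask):
    """ResEdit-style sketch (assumed form): add the per-scale code difference
    (edited minus source) only where mask == 1. Where mask == 0 the added
    term is exactly zero, so the source code is left unchanged."""
    out = src_code.copy()
    size = src_code.shape[0]
    for r_src, r_edit in zip(src_residuals, edit_residuals):
        diff = upsample_nearest(r_edit - r_src, size)
        out = out + mask[..., None] * diff
    return out
```

Writing the edit as a masked difference added to the source code, rather than blending two decoded images, is what keeps the masked-out region exact in this sketch: multiplying the difference by a zero mask adds nothing to the source code values.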

Related papers