Fugu-MT Paper Translation (Abstract): BRIDGE: Background Routing and Isolated Discrete Gating for Coarse-Mask Local Editing

Paper abstract: BRIDGE: Background Routing and Isolated Discrete Gating for Coarse-Mask Local Editing

arxiv url: http://arxiv.org/abs/2605.07846v2
Date: Mon, 11 May 2026 17:16:17 GMT
Status: Translation complete
Last updated in system: 2026-05-12 19:24:01.428181
Title: BRIDGE: Background Routing and Isolated Discrete Gating for Coarse-Mask Local Editing
Authors: Peilin Xiong, Honghui Yuan, Junwen Chen, Keiji Yanai
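Abstract summary: Coarse-mask local image editing asks a model to modify a user-indicated region while preserving the surrounding scene. We study the tendency of rough masks to act as unintended shape priors as mask-shape bias and frame the task through a Two-Zone Constraint. BRIDGE addresses this setting by keeping masks outside the DiT backbone for support construction and blending, avoiding DiT-internal mask injection and copied control branches.
Reference score (attention score computed independently by the site): 6.102786823233576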
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Coarse-mask local image editing asks a model to modify a user-indicated region while preserving the surrounding scene. In practice, however, rough masks often become unintended shape priors: instead of serving as flexible edit support, the mask can pull the generated object toward its accidental boundary. We study this failure as mask-shape bias and frame the task through a Two-Zone Constraint, where the background should remain stable while the editable region should follow the instruction without being forced to inherit the mask contour. BRIDGE addresses this setting by keeping masks outside the DiT backbone for support construction and blending, avoiding DiT-internal mask injection and copied control branches. It uses BridgePath generation, where a Main Path preserves background context and a Subject Path generates editable content from independent noise. Motivated by a diagnostic Qwen-Image experiment showing that positional embeddings and attention connectivity regulate which image context visual tokens reuse, BRIDGE introduces a learnable Discrete Geometric Gate for token-level positional-embedding routing. This gate lets subject tokens borrow background-anchored coordinates near fusion regions or keep subject-centric coordinates for geometric freedom. We evaluate BRIDGE on BRIDGE-Bench, MagicBrush, and ICE-Bench. On BRIDGE-Bench, BRIDGE improves Local SigLIP2-T from 0.262 with FLUX.1-Fill and 0.390 with ACE++ to 0.503, with parallel gains in local DINO and DreamSim. Zero-shot results on MagicBrush and ICE-Bench further indicate competitive alignment and source preservation beyond the curated benchmark, while the added routing module remains compact at 13.31M parameters compared with ControlNet-style copied branches.
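The token-level routing described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the Discrete Geometric Gate is parameterised or trained, so the function name `route_positions`, the hard threshold, the 2D integer coordinates, and the example logits are all hypothetical. The sketch only shows the selection step, in which each subject token either borrows a background-anchored coordinate or keeps its subject-centric one.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def route_positions(
    subject_coords: List[Coord],
    background_coords: List[Coord],
    gate_logits: List[float],
    threshold: float = 0.0,
) -> List[Coord]:
    """Pick, per token, a background-anchored or subject-centric coordinate.

    Hypothetical rule: a logit above `threshold` selects the
    background-anchored coordinate; otherwise the token keeps its
    subject-centric coordinate.
    """
    if not (len(subject_coords) == len(background_coords) == len(gate_logits)):
        raise ValueError("all inputs must have one entry per token")
    return [
        bg if logit > threshold else subj
        for subj, bg, logit in zip(subject_coords, background_coords, gate_logits)
    ]


# Three hypothetical subject tokens on a local, subject-centric grid.
subject_coords = [(0, 0), (0, 1), (1, 0)]
# The same tokens expressed in hypothetical background (full-image) coordinates.
background_coords = [(10, 12), (10, 13), (11, 12)]
# Made-up gate logits: only the first token is treated as near a fusion region.
gate_logits = [2.3, -1.1, -0.4]

routed = route_positions(subject_coords, background_coords, gate_logits)
print(routed)  # [(10, 12), (0, 1), (1, 0)]
```

The routed coordinates would then feed whatever positional embedding the backbone uses. A hard threshold like this one is not differentiable, so a learnable gate would need some trainable formulation that this sketch leaves out.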


Related papers