Fugu-MT Paper Translation (Abstract): Region-Constrained Group Relative Policy Optimization for Flow-Based Image Editing

Paper overview: Region-Constrained Group Relative Policy Optimization for Flow-Based Image Editing

arxiv url: http://arxiv.org/abs/2604.09386v1
Date: Fri, 10 Apr 2026 14:58:15 GMT
Status: Translation complete
Last updated in system: 2026-04-13 17:57:53.918043
Title: Region-Constrained Group Relative Policy Optimization for Flow-Based Image Editing
Authors: Zhuohan Ouyang, Zhe Qian, Wenhuo Cui, Chaoqun Wang,
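Abstract summary: This paper proposes RC-GRPO-Editing, a region-constrained GRPO post-training framework. It suppresses background-induced nuisance variance to enable cleaner localized credit assignment, improving instruction adherence in the editing region while preserving non-target content. Experiments on CompBench show consistent improvements in editing-region instruction adherence and non-target preservation.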
Reference score (independently computed attention score): 2.096755686662369
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Instruction-guided image editing requires balancing target modification with non-target preservation. Recently, flow-based models have emerged as a strong and increasingly adopted backbone for instruction-guided image editing, thanks to their high fidelity and efficient deterministic ODE sampling. Building on this foundation, GRPO-based reward-driven post-training has been explored to directly optimize editing-specific rewards, improving instruction following and editing consistency. However, existing methods often suffer from noisy credit assignment: global exploration also perturbs non-target regions, inflating within-group reward variance and yielding noisy GRPO advantages. To address this, we propose RC-GRPO-Editing, a region-constrained GRPO post-training framework for flow-based image editing under deterministic ODE sampling. It suppresses background-induced nuisance variance to enable cleaner localized credit assignment, improving editing region instruction adherence while preserving non-target content. Concretely, we localize exploration via region-decoupled initial noise perturbations to reduce background-induced reward variance and stabilize GRPO advantages, and introduce an attention concentration reward that aligns cross-attention with the intended editing region throughout the rollout, reducing unintended changes in non-target regions. Experiments on CompBench show consistent improvements in editing region instruction adherence and non-target preservation.
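The three ingredients named in the abstract (region-localized initial noise, group-relative advantages, and an attention concentration reward) can be pictured with a minimal NumPy sketch. It assumes a binary edit-region mask is available. The function names, the variance-preserving mixing rule, the `sigma` parameter, and the mass-fraction form of the attention reward are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def region_constrained_noise(base_noise, mask, group_size, sigma=1.0, rng=None):
    """Build `group_size` initial noises that differ only inside `mask`.

    Hypothetical helper: outside the mask every rollout shares `base_noise`,
    so background pixels start from identical noise across the group.
    `sigma` in [0, 1] controls how far the masked region moves from the base.
    """
    rng = np.random.default_rng() if rng is None else rng
    fresh = rng.standard_normal((group_size,) + base_noise.shape)
    # Variance-preserving mix of shared and fresh noise (an assumed choice).
    mixed = np.sqrt(1.0 - sigma**2) * base_noise + sigma * fresh
    # The mask broadcasts over the leading group axis.
    return np.where(mask, mixed, base_noise)


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style normalization of rewards within one group."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)


def attention_concentration(attn_map, mask, eps=1e-8):
    """Fraction of attention mass inside the mask (assumed reward form)."""
    attn_map = np.asarray(attn_map, dtype=np.float64)
    return float((attn_map * mask).sum() / (attn_map.sum() + eps))
```

Because all rollouts in a group share the same noise outside the mask, differences in their rewards are more likely to come from the edited region, which is the intuition behind the reduced within-group reward variance described in the abstract.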
