Fugu-MT Paper Translation (Summary): MT-EditFlow: Reinforcement Learning for Multi-Turn Image Editing with Flow Matching

Paper summary: MT-EditFlow: Reinforcement Learning for Multi-Turn Image Editing with Flow Matching

arxiv url: http://arxiv.org/abs/2606.01985v1
Date: Mon, 01 Jun 2026 09:46:10 GMT
Status: Translation complete
Last updated in system: 2026-06-02 21:34:31.702281
Title: MT-EditFlow: Reinforcement Learning for Multi-Turn Image Editing with Flow Matching
Title (reference translation): MT-EditFlow: Reinforcement Learning for Multi-Turn Image Editing Using Flow Matching
Authors: Jiahui Huang, Yasi Zhang, Tianyu Chen, Shu Wang, Jianwen Xie, Oscar Leong, Mingyuan Zhou, Nanzhu Wang, Ying Nian Wu
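Abstract summary: MT-EditFlow is a flow-matching reinforcement learning framework designed to optimize reward signals for sequential image editing. Experiments show that MT-EditFlow significantly improves performance across diverse base models.
Reference score (independently calculated attention score): 91.83651402045108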
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent breakthroughs in instruction-based image editing have captured significant attention, as models are now capable of handling real-world editing demands with the practicality required by everyday users. However, editing models trained primarily for single-turn edits often break down in multi-turn editing--the natural interactive setting where a user iteratively refines an image based on the model's own previous outputs. This failure stems from the all-or-nothing requirement, where a single failed turn compromises the entire sequence, and error propagation, where exposure bias leads to compounding editing errors. To address these challenges, we introduce MT-EditFlow, a flow-matching reinforcement learning framework designed to optimize reward signals for sequential image editing. MT-EditFlow integrates a multi-turn perspective with a multi-reward formulation to provide a unified structure applicable to both GRPO and NFT-based reinforcement learning methods. We systematically analyze and optimize the reward signal by investigating effective scoring strategies for turn-level aggregation, VLM reasoning modes to trade off reward bias and variance, and advantage fusion levels to prevent reward hacking. Our findings reveal that broadcasting the aggregated advantage across the entire editing trajectory effectively bridges the gap between local planning and global multi-turn task success. Extensive experiments demonstrate that MT-EditFlow significantly improves performance across diverse base models. Notably, it boosts FLUX.1-Kontext-dev by 6.85 points in turn-3 overall performance, surpassing state-of-the-art open-source models such as Qwen-Image-Edit. By maintaining high marginal success rates and reducing exposure bias, MT-EditFlow provides a foundation for more reliable and natural human-AI collaboration in visual content creation.
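Abstract (reference translation): Recent breakthroughs in instruction-based image editing have attracted significant attention, with models now able to handle real-world editing requests with the practicality everyday users need. However, editing models trained primarily for single-turn edits often break down in multi-turn editing, the natural interactive setting in which a user iteratively refines an image based on the model's own previous outputs. This failure stems from the all-or-nothing requirement, where a single failed turn compromises the entire sequence, and from error propagation, where exposure bias causes editing errors to compound. To address these challenges, we introduce MT-EditFlow, a flow-matching reinforcement learning framework designed to optimize reward signals for sequential image editing. MT-EditFlow integrates a multi-turn perspective with a multi-reward formulation, providing a unified structure applicable to both GRPO and NFT-based reinforcement learning methods. We systematically analyze and optimize the reward signal by investigating effective scoring strategies for turn-level aggregation, VLM reasoning modes that trade off reward bias and variance, and advantage fusion levels that prevent reward hacking. The results show that broadcasting the aggregated advantage across the entire editing trajectory effectively bridges the gap between local planning and global multi-turn task success. MT-EditFlow significantly improves performance across diverse base models. It raises the turn-3 overall performance of FLUX.1-Kontext-dev by 6.85 points, surpassing state-of-the-art open-source models such as Qwen-Image-Edit. By maintaining high marginal success rates and reducing exposure bias, MT-EditFlow provides a foundation for more reliable and natural human-AI collaboration in visual content creation.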
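The abstract names three ingredients (turn-level aggregation, multi-reward fusion at the advantage level, and broadcasting the aggregated advantage across the whole editing trajectory) without giving formulas. The Python sketch below is one plausible reading, not the paper's method: the aggregation modes (`min`, `mean`, `product`), the GRPO-style group normalization, the weighted-sum fusion, and the reward names `instruction` and `consistency` are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def aggregate_turns(turn_scores, mode="min"):
    """Collapse one trajectory's per-turn scores into a single scalar.

    The three modes are assumed candidates, not the paper's list.
    "min" and "product" both penalize a single failed turn heavily,
    which mirrors the all-or-nothing requirement in the abstract.
    """
    if mode == "min":
        return min(turn_scores)
    if mode == "mean":
        return mean(turn_scores)
    if mode == "product":
        result = 1.0
        for score in turn_scores:
            result *= score
        return result
    raise ValueError(f"unknown aggregation mode: {mode}")


def group_normalize(values, eps=1e-6):
    """GRPO-style advantage: standardize rewards within one sampled group."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def broadcast_advantages(group, weights, mode="min"):
    """Return one advantage per turn for every trajectory in the group.

    group:   list of trajectories; each maps a reward name to its
             list of per-turn scores (hypothetical data layout).
    weights: maps each reward name to its fusion weight (assumed).
    """
    fused = [0.0] * len(group)
    for name, weight in weights.items():
        # Aggregate over turns first, then normalize per reward, then fuse.
        aggregated = [aggregate_turns(traj[name], mode) for traj in group]
        advantages = group_normalize(aggregated)
        for i, adv in enumerate(advantages):
            fused[i] += weight * adv
    first = next(iter(weights))
    # Broadcast: every turn of a trajectory shares the same fused advantage.
    return [[fused[i]] * len(traj[first]) for i, traj in enumerate(group)]
```

Under these assumptions, a trajectory with one badly failed turn receives the same low advantage at every turn, so the update for early turns reflects the outcome of the full editing sequence. Normalizing each reward separately before the weighted sum is one way to fuse at the advantage level; fusing the raw rewards first and normalizing once would be the alternative.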

Related papers