MT-EditFlow: Reinforcement Learning for Multi-Turn Image Editing with Flow Matching
Abstract Overview
MT-EditFlow is a reinforcement learning framework for multi-turn image editing built on flow-matching models. The paper argues that open-source image editors trained mainly for single-turn edits degrade in sequential settings: errors propagate across turns, so a single failed step can ruin the whole sequence. To address this, the method combines a multi-turn formulation with two reward components (instruction following and content consistency) and studies how reward aggregation, evaluator prompting mode, and fusion strategy affect training. The framework is designed to work with both GRPO- and DiffusionNFT-style reinforcement learning methods, and it uses trajectory-level advantage broadcasting to align local edits with overall multi-turn success.
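As a rough illustration of trajectory-level advantage broadcasting, the Python sketch below scores each sampled editing trajectory once, normalizes those scores within the sampling group in the GRPO style, and copies the resulting advantage to every turn of that trajectory. The function names, the mean aggregation over turns, and the example scores are assumptions made for illustration; the paper's exact formulation may differ.

```python
from statistics import mean, pstdev


def aggregate_trajectory_reward(per_turn_rewards):
    """Collapse per-turn rewards into one trajectory score.

    Using the mean is an assumption; other aggregations are possible.
    """
    return mean(per_turn_rewards)


def group_normalized_advantages(trajectory_rewards, eps=1e-6):
    """GRPO-style advantage: standardize rewards within one sampling group."""
    mu = mean(trajectory_rewards)
    sigma = pstdev(trajectory_rewards)
    return [(r - mu) / (sigma + eps) for r in trajectory_rewards]


def broadcast_advantages(advantages, num_turns):
    """Give every turn of a trajectory the same trajectory-level advantage."""
    return [[a] * num_turns for a in advantages]


# Hypothetical group of 4 trajectories, each with 3 editing turns.
group = [
    [0.9, 0.8, 0.7],
    [0.9, 0.4, 0.1],
    [0.6, 0.6, 0.6],
    [0.2, 0.3, 0.1],
]
traj_rewards = [aggregate_trajectory_reward(t) for t in group]
advs = group_normalized_advantages(traj_rewards)
per_turn_advs = broadcast_advantages(advs, num_turns=3)

for turns in per_turn_advs:
    print([round(a, 3) for a in turns])
```

In this sketch, an early turn in a trajectory that later fails receives the same negative advantage as the failing turn itself, which is one way local edits can be tied to overall multi-turn success.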
Novelty
The paper's main novelty is a unified reward-signal design for multi-turn image editing under flow-matching reinforcement learning, rather than the usual single-turn, single-reward setup. It also introduces and analyzes specific design choices for this setting, including multi-turn reward aggregation, advantage-level fusion of instruction-following and content-consistency signals, and trajectory-level advantage broadcasting.
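The difference between fusing at the reward level and fusing at the advantage level can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the equal weights, the function names, and the example scores are hypothetical and are not taken from the paper. Reward-level fusion mixes the raw scores and then normalizes once; advantage-level fusion normalizes each signal within the group first and then mixes the resulting advantages.

```python
from statistics import mean, pstdev


def normalize(values, eps=1e-6):
    """Standardize values within one sampling group."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def reward_level_fusion(instr, cons, w_instr=0.5, w_cons=0.5):
    """Mix raw rewards first, then normalize the mixture once."""
    fused = [w_instr * i + w_cons * c for i, c in zip(instr, cons)]
    return normalize(fused)


def advantage_level_fusion(instr, cons, w_instr=0.5, w_cons=0.5):
    """Normalize each reward separately, then mix the advantages."""
    a_instr, a_cons = normalize(instr), normalize(cons)
    return [w_instr * x + w_cons * y for x, y in zip(a_instr, a_cons)]


# Hypothetical scores: instruction following varies widely,
# content consistency varies only slightly across the group.
instr = [0.1, 0.9, 0.5, 0.5]
cons = [0.90, 0.91, 0.92, 0.95]

rew_fused = reward_level_fusion(instr, cons)
adv_fused = advantage_level_fusion(instr, cons)

# Samples 2 and 3 differ only in content consistency.
print("reward-level gap:   ", round(rew_fused[3] - rew_fused[2], 3))
print("advantage-level gap:", round(adv_fused[3] - adv_fused[2], 3))
```

Because each signal is rescaled before mixing, the low-variance consistency score is not drowned out by the high-variance instruction score, which is one plausible reason a per-signal normalization could give a more balanced training signal.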
Results
On EdiVal-Bench, MT-EditFlow improves turn-3 overall performance by 6.85 points for FLUX.1-Kontext-dev and by 2.90 points for FLUX.2-klein-base-9B, with gains especially pronounced at later turns. The reported FLUX.1-Kontext-dev result also exceeds the open-source Qwen-Image-Edit baseline on turn-3 overall score. The method additionally yields modest single-turn gains on ImgEdit-Bench and shows flatter success decay across turns, indicating reduced exposure bias.
Key Points
- MT-EditFlow extends flow-matching RL to sequential image editing by optimizing both instruction following and content consistency over multi-turn trajectories.
- The paper finds that fine-grained per-turn scoring, thinking-mode VLM evaluation, and advantage-level fusion provide more effective reward signals than sparser or less normalized alternatives.
- Experiments show stronger multi-turn robustness on open-source backbones, with larger improvements at later turns and evidence of reduced error propagation across the editing chain.
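One way to read the contrast between fine-grained per-turn scoring and sparser alternatives is shown in the Python sketch below. It assumes, purely for illustration, that a sparse scheme scores only the final turn while a fine-grained scheme averages a score from every turn; the function names and the example values are hypothetical and do not come from the paper.

```python
from statistics import mean


def final_turn_only(per_turn_scores):
    """Sparse signal: only the last turn's score counts (assumed variant)."""
    return per_turn_scores[-1]


def per_turn_mean(per_turn_scores):
    """Fine-grained signal: every turn contributes (assumed variant)."""
    return mean(per_turn_scores)


# Two hypothetical 3-turn trajectories that both fail at the last turn.
traj_a = [1.0, 1.0, 0.0]  # first two edits succeeded
traj_b = [0.0, 0.0, 0.0]  # every edit failed

print("sparse:      ", final_turn_only(traj_a), final_turn_only(traj_b))
print("fine-grained:", round(per_turn_mean(traj_a), 3), per_turn_mean(traj_b))
```

Under the sparse scheme the two trajectories are indistinguishable, whereas the per-turn scheme still credits the partially successful one.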