Fugu-MT Paper Translation (Abstract): ProductConsistency: Improving Product Identity Preservation in Instruction-Based Image Editing via SFT and RL

Paper summary: ProductConsistency: Improving Product Identity Preservation in Instruction-Based Image Editing via SFT and RL

arxiv url: http://arxiv.org/abs/2606.19103v1
Date: Wed, 17 Jun 2026 14:16:47 GMT
Status: Translation complete
Last updated in system: 2026-06-18 17:16:51.196557
Title: ProductConsistency: Improving Product Identity Preservation in Instruction-Based Image Editing via SFT and RL
Title (reference translation): Product Consistency: Improving Product Identity Protection in Instruction-Based Image Editing via SFT and RL
Authors: Mukund Khanna, Raj Singh Yadav, Kunal Singh
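Abstract summary: This paper introduces the ProductConsistency dataset, designed to improve product-centric image editing. Our approach includes a supervised fine-tuning (SFT) dataset of 87k samples for product editing and a reinforcement learning (RL) dataset with 869 unique product images. To guide RL training, we propose a Cyclic Consistency reward that enforces semantic preservation of product identity.
Reference score (independently computed attention measure): 4.71547360356314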
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in instruction-based image editing have enabled models to perform complex visual edits from natural language instructions. However, in product-centric scenarios where preserving product features, branding, and textual elements is critical, current open- and closed-source models often struggle to maintain this fine-grained object identity. This issue is compounded by the lack of datasets for instruction-based product image editing with text fidelity constraints, leaving this task largely treated as an implicit capability of instruction-based image editing models. In this work, we introduce the ProductConsistency dataset, which is designed to improve product-centric image editing. Our approach includes a supervised fine-tuning (SFT) dataset of 87k samples for product editing, a reinforcement learning (RL) dataset with 869 unique product images, and a new benchmark dataset, the ProductConsistency Benchmark, to allow rigorous and standardized evaluation of editing models. To guide RL training, we propose a Cyclic Consistency reward that enforces semantic preservation of product identity by using caption similarity between the original product description and captions generated from the edited image. We fine-tune both Qwen-Image-Edit-2511 and Flux.1-Kontext-dev using our dataset and demonstrate consistent improvements over baseline models in OCR metrics, perceptual metrics, and MLLM-based evaluations, indicating stronger product consistency, text rendering, and overall visual quality. The Qwen-Image-Edit-2511 model achieves a 5x reduction in character error rate. The code and pipeline are available at https://anonymous.4open.science/r/ProductConsistency-6FCC/README.md
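The abstract reports a 5x reduction in character error rate (CER) but does not define the metric. The following is a minimal sketch of the conventional definition, assuming CER is the Levenshtein edit distance between the reference text and the OCR output of the edited image, divided by the reference length. The function names are illustrative and are not taken from the paper's code.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    # previous[j] holds the distance between the reference prefix
    # processed so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of reference characters (assumed definition)."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, a 5x reduction means the baseline CER divided by the fine-tuned CER equals 5, for example a drop from five wrong characters to one wrong character on the same ten-character label.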
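Abstract (reference translation): Recent progress in instruction-based image editing has made it possible for models to carry out complex visual edits from natural language instructions. However, in product-centric scenarios where preserving product features, branding, and text elements is important, current open- and closed-source models often struggle to maintain this fine-grained object identity. The problem is made worse by the lack of datasets for instruction-based product image editing with text fidelity constraints, so the task is mostly treated as an implicit capability of instruction-based image editing models. This paper introduces the ProductConsistency dataset, designed to improve product-centric image editing. Our approach includes a supervised fine-tuning (SFT) dataset of 87k samples for product editing, a reinforcement learning (RL) dataset with 869 unique product images, and the ProductConsistency Benchmark, a new benchmark dataset that enables rigorous and standardized evaluation of editing models. As a guide for RL training, we propose a Cyclic Consistency reward that enforces semantic preservation of product identity using the caption similarity between the original product description and captions generated from the edited image. We fine-tune both Qwen-Image-Edit-2511 and Flux.1-Kontext-dev with this dataset and show consistent improvements over the baseline models on OCR and perceptual metrics as well as in MLLM-based evaluations. The code and pipeline are available at https://anonymous.4open.science/r/ProductConsistency-6FCC/README.md.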
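The Cyclic Consistency reward described in the abstract compares the original product description with captions generated from the edited image. The sketch below illustrates that idea only. The abstract does not specify the captioning model, the similarity measure, or how multiple captions are combined, so the `captioner` callable, the word-level `SequenceMatcher` similarity, and the averaging step are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Callable, Sequence


def caption_similarity(reference: str, candidate: str) -> float:
    """Stand-in similarity in [0, 1] over lowercase word sequences.

    Assumption: the paper's actual similarity measure is not stated in
    the abstract; a learned text-embedding similarity could be swapped in.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    return SequenceMatcher(None, ref_tokens, cand_tokens).ratio()


def cyclic_consistency_reward(
    product_description: str,
    edited_image: object,
    captioner: Callable[[object], Sequence[str]],
) -> float:
    """Score how well captions of the edited image match the description.

    `captioner` is a hypothetical callable that returns one or more
    captions for an image; averaging the scores is also an assumption.
    """
    captions = captioner(edited_image)
    if not captions:
        return 0.0
    scores = [caption_similarity(product_description, c) for c in captions]
    return sum(scores) / len(scores)
```

In an RL loop, a reward of this shape would be high when the edited image still yields captions that describe the same product, and low when the edit has altered branding or text enough to change the caption.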

Related papers