Fugu-MT Paper Translation (Summary): EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling

Paper summary: EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling

arxiv url: http://arxiv.org/abs/2509.23909v2
Date: Tue, 30 Sep 2025 15:34:18 GMT
Status: Translation complete
Last updated in system: 2025-10-01 14:44:59.871072
Title: EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
Title (reference translation): EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
Authors: Xin Luo, Jiahao Wang, Chenyuan Wu, Shitao Xiao, Xiyan Jiang, Defu Lian, Jiajun Zhang, Dong Liu, Zheng Liu
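Abstract summary: Reinforcement learning (RL) offers a promising solution, but its adoption in image editing has been hindered by the lack of a high-fidelity, efficient reward signal. We present a comprehensive methodology to overcome this barrier, centered on the development of a state-of-the-art, specialized reward model.
Reference score (independently computed attention score): 71.8265422228785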
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Instruction-guided image editing has achieved remarkable progress, yet current models still face challenges with complex instructions and often require multiple samples to produce a desired result. Reinforcement Learning (RL) offers a promising solution, but its adoption in image editing has been severely hindered by the lack of a high-fidelity, efficient reward signal. In this work, we present a comprehensive methodology to overcome this barrier, centered on the development of a state-of-the-art, specialized reward model. We first introduce EditReward-Bench, a comprehensive benchmark to systematically evaluate reward models on editing quality. Building on this benchmark, we develop EditScore, a series of reward models (7B-72B) for evaluating the quality of instruction-guided image editing. Through meticulous data curation and filtering, EditScore effectively matches the performance of leading proprietary VLMs. Furthermore, coupled with an effective self-ensemble strategy tailored for the generative nature of EditScore, our largest variant even surpasses GPT-5 in the benchmark. We then demonstrate that a high-fidelity reward model is the key to unlocking online RL for image editing. Our experiments show that, while even the largest open-source VLMs fail to provide an effective learning signal, EditScore enables efficient and robust policy optimization. Applying our framework to a strong base model, OmniGen2, results in a final model that shows a substantial and consistent performance uplift. Overall, this work provides the first systematic path from benchmarking to reward modeling to RL training in image editing, showing that a high-fidelity, domain-specialized reward model is the key to unlocking the full potential of RL in this domain.
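Abstract (reference translation): Instruction-guided image editing has made remarkable progress, but current models still struggle with complex instructions and often need multiple samples to produce the desired result. Reinforcement learning (RL) offers a promising solution, but its adoption in image editing has been severely hindered by the lack of a high-fidelity, efficient reward signal. In this work, we propose a comprehensive methodology to overcome this barrier. First, we introduce EditReward-Bench, a comprehensive benchmark for systematically evaluating reward models on editing quality. Building on this benchmark, we develop EditScore, a series of reward models (7B-72B) for evaluating the quality of instruction-guided image editing. Through rigorous data curation and filtering, EditScore effectively matches the performance of leading proprietary VLMs. Furthermore, combined with an effective self-ensemble strategy tailored to EditScore's generative nature, it surpasses GPT-5 on the benchmark. Next, we demonstrate that a high-fidelity reward model is the key to unlocking online RL for image editing. Our experiments show that even the largest open-source VLMs fail to provide an effective learning signal, whereas EditScore enables efficient and robust policy optimization. Applying our framework to OmniGen2, a strong base model, yields a final model with a substantial and consistent performance improvement. Overall, this work provides the first systematic path from benchmarking to reward modeling to RL training in image editing, and shows that a high-fidelity, domain-specialized reward model is the key to unlocking the full potential of RL in this domain.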
