Fugu-MT Paper Translation (Summary): Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation

Paper summary: Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation

arxiv url: http://arxiv.org/abs/2603.12247v1
Date: Thu, 12 Mar 2026 17:57:21 GMT
Status: Translation complete
Last updated in system: 2026-03-13 14:46:26.28585
Title: Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation
Title (reference translation): Trust in Criticism: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation
Authors: Xiangyu Zhao, Peiyuan Zhang, Junming Lin, Tianhao Liang, Yuchen Duan, Shengyuan Ding, Changyao Tian, Yuhang Zang, Junchi Yan, Xue Yang,
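Abstract summary: Current reward models, which act as critics during reinforcement learning, often suffer from hallucinations and assign noisy scores. We propose FIRM, a comprehensive framework that develops robust reward models to provide accurate and reliable guidance for faithful image generation and editing. FIRM mitigates hallucinations and establishes a new standard for fidelity and instruction adherence over existing general models.
Reference score (independently computed attention score): 67.26349227500084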
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reinforcement learning (RL) has emerged as a promising paradigm for enhancing image editing and text-to-image (T2I) generation. However, current reward models, which act as critics during RL, often suffer from hallucinations and assign noisy scores, inherently misguiding the optimization process. In this paper, we present FIRM (Faithful Image Reward Modeling), a comprehensive framework that develops robust reward models to provide accurate and reliable guidance for faithful image generation and editing. First, we design tailored data curation pipelines to construct high-quality scoring datasets. Specifically, we evaluate editing using both execution and consistency, while generation is primarily assessed via instruction following. Using these pipelines, we collect the FIRM-Edit-370K and FIRM-Gen-293K datasets, and train specialized reward models (FIRM-Edit-8B and FIRM-Gen-8B) that accurately reflect these criteria. Second, we introduce FIRM-Bench, a comprehensive benchmark specifically designed for editing and generation critics. Evaluations demonstrate that our models achieve superior alignment with human judgment compared to existing metrics. Furthermore, to seamlessly integrate these critics into the RL pipeline, we formulate a novel "Base-and-Bonus" reward strategy that balances competing objectives: Consistency-Modulated Execution (CME) for editing and Quality-Modulated Alignment (QMA) for generation. Empowered by this framework, our resulting models FIRM-Qwen-Edit and FIRM-SD3.5 achieve substantial performance breakthroughs. Comprehensive experiments demonstrate that FIRM mitigates hallucinations, establishing a new standard for fidelity and instruction adherence over existing general models. All of our datasets, models, and code are publicly available at https://firm-reward.github.io.
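Abstract (reference translation): Reinforcement learning (RL) has emerged as a promising paradigm for improving image editing and text-to-image (T2I) generation. However, current reward models, which act as critics during RL, often suffer from hallucinations and assign noisy scores. In this paper, we propose FIRM (Faithful Image Reward Modeling), a comprehensive framework that develops robust reward models to provide accurate and reliable guidance for faithful image generation and editing. First, we design data curation pipelines to construct high-quality scoring datasets. Specifically, editing is evaluated using both execution and consistency, while generation is evaluated primarily by instruction following. Using these pipelines, we collect the FIRM-Edit-370K and FIRM-Gen-293K datasets and train specialized reward models (FIRM-Edit-8B and FIRM-Gen-8B) that accurately reflect these criteria. Second, we introduce FIRM-Bench, a comprehensive benchmark designed specifically for editing and generation critics. Evaluations show that our models align better with human judgment than existing metrics. Furthermore, to integrate these critics seamlessly into the RL pipeline, we formulate a novel "Base-and-Bonus" reward strategy that balances competing objectives: Consistency-Modulated Execution (CME) for editing and Quality-Modulated Alignment (QMA) for generation. Applying this framework, FIRM-Qwen-Edit and FIRM-SD3.5 achieve substantial performance gains. Comprehensive experiments show that FIRM mitigates hallucinations and establishes a new standard for fidelity and instruction adherence over existing general models. All datasets, models, and code are publicly available at https://firm-reward.github.io.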
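The abstract names the "Base-and-Bonus" strategy and its two instances, CME for editing and QMA for generation, but does not give their formulas. The Python sketch below shows one plausible reading under an assumption that is ours, not the paper's: critic scores are normalized to [0, 1], the primary objective (execution for editing, instruction alignment for generation) is the base, and the secondary objective (consistency or image quality) adds a bonus scaled by the base. All names (`CriticScores`, `base_and_bonus_reward`, `cme_reward`, `qma_reward`, `bonus_weight`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticScores:
    """Hypothetical normalized critic outputs, each assumed to lie in [0, 1]."""
    base: float   # primary objective, e.g. execution (editing) or alignment (generation)
    bonus: float  # secondary objective, e.g. consistency (editing) or quality (generation)


def base_and_bonus_reward(scores: CriticScores, bonus_weight: float = 0.5) -> float:
    """Illustrative reward r = b + w * b * s (assumed form, NOT taken from the paper).

    The bonus is multiplied by the base score, so the secondary objective only
    pays out in proportion to how well the primary objective is met.
    """
    for name, value in (("base", scores.base), ("bonus", scores.bonus)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score must be in [0, 1], got {value}")
    if bonus_weight < 0:
        raise ValueError("bonus_weight must be non-negative")
    return scores.base + bonus_weight * scores.base * scores.bonus


def cme_reward(execution: float, consistency: float, w: float = 0.5) -> float:
    """Hypothetical Consistency-Modulated Execution reward for editing."""
    return base_and_bonus_reward(CriticScores(execution, consistency), w)


def qma_reward(alignment: float, quality: float, w: float = 0.5) -> float:
    """Hypothetical Quality-Modulated Alignment reward for generation."""
    return base_and_bonus_reward(CriticScores(alignment, quality), w)


if __name__ == "__main__":
    # A no-op edit: perfectly consistent with the source image, instruction ignored.
    print(cme_reward(execution=0.0, consistency=1.0))  # 0.0
    # A good edit that also preserves unrelated content.
    print(cme_reward(execution=0.9, consistency=0.8))  # about 1.26 (0.9 + 0.5 * 0.9 * 0.8)
```

Scaling the bonus by the base term is only one way such a reward could balance competing objectives: under this assumed form, an edit that ignores the instruction scores 0 no matter how consistent it is, so a policy cannot collect reward through no-op edits. FIRM's actual formulation may differ.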

Related papers