Fugu-MT Paper Translation (Summary): UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning

Paper Summary: UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning

arxiv url: http://arxiv.org/abs/2511.14760v1
Date: Tue, 18 Nov 2025 18:59:30 GMT
Status: Translation complete
Last updated in system: 2025-11-19 16:23:53.278297
Title: UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning
Title (reference translation): UniGen-1.5: Strengthening Image Generation and Editing via Reward Unification in Reinforcement Learning
Authors: Rui Tian, Mingfei Gao, Haiming Gang, Jiasen Lu, Zhe Gan, Yinfei Yang, Zuxuan Wu, Afshin Dehghan
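Abstract summary: We present UniGen-1.5, a unified multimodal large language model (MLLM) for advanced image understanding, generation, and editing. Building on UniGen, the model architecture and training pipeline are comprehensively enhanced to strengthen image understanding and generation capabilities.
Reference score (attention score computed by the site): 77.17292564002328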
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present UniGen-1.5, a unified multimodal large language model (MLLM) for advanced image understanding, generation, and editing. Building upon UniGen, we comprehensively enhance the model architecture and training pipeline to strengthen image understanding and generation while unlocking strong image editing ability. In particular, we propose a unified Reinforcement Learning (RL) strategy that jointly improves image generation and image editing via shared reward models. To further enhance image editing performance, we propose a lightweight Edit Instruction Alignment stage that significantly improves edit-instruction comprehension, which is essential for successful RL training. Experimental results show that UniGen-1.5 achieves competitive understanding and generation performance. Specifically, UniGen-1.5 achieves overall scores of 0.89 on GenEval and 4.31 on ImgEdit, surpassing state-of-the-art models such as BAGEL and reaching performance comparable to proprietary models such as GPT-Image-1.
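Abstract (reference translation): We present UniGen-1.5, a unified multimodal large language model (MLLM) for advanced image understanding, generation, and editing. Building on UniGen, we comprehensively extend the model architecture and training pipeline, strengthening image understanding and generation while securing strong image editing ability. In particular, we propose a unified reinforcement learning (RL) strategy that improves image generation and image editing together using shared reward models. To further improve editing performance, we propose a lightweight edit instruction alignment stage that greatly improves edit-instruction comprehension, which is essential for successful RL training. Experiments show that UniGen-1.5 delivers competitive understanding and generation performance. Specifically, UniGen-1.5 scores 0.89 on GenEval and 4.31 on ImgEdit, surpassing state-of-the-art models such as BAGEL and reaching performance comparable to proprietary models such as GPT-Image-1.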
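The abstract says only that generation and editing are improved jointly "via shared reward models". The following is a minimal sketch of what such reward sharing could look like, not the authors' implementation. Several pieces are assumptions: each editing sample is assumed to carry a text description of the desired edited image, so that one text-image reward can score both tasks; the GRPO-style group-normalized advantage is a common RL choice and is not confirmed by the abstract; and all names (`RolloutSample`, `unified_reward`, `group_normalized_advantages`, `toy_overlap_reward`) are hypothetical. Images are represented by caption strings so the sketch runs without a vision model.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List, Optional

# Hypothetical reward model type: maps (target text, output image) to a scalar.
# Images are stood in for by caption strings so the sketch is runnable.
RewardModel = Callable[[str, str], float]


@dataclass
class RolloutSample:
    target_text: str                    # prompt (generation) or assumed target description (editing)
    output_image: str                   # stand-in for the generated or edited image
    source_image: Optional[str] = None  # None for text-to-image generation


def unified_reward(sample: RolloutSample, reward_models: List[RewardModel]) -> float:
    """Score a rollout with the same reward models, whatever the task type."""
    return mean(rm(sample.target_text, sample.output_image) for rm in reward_models)


def group_normalized_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Assumed GRPO-style advantage: standardize rewards within one prompt's rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def toy_overlap_reward(text: str, image_caption: str) -> float:
    """Hypothetical toy reward: fraction of target words that appear in the caption."""
    words = text.lower().split()
    caption = set(image_caption.lower().split())
    return sum(w in caption for w in words) / len(words)


if __name__ == "__main__":
    reward_models = [toy_overlap_reward]
    generation_group = [
        RolloutSample("a red cube on a table", "a red cube on a table"),
        RolloutSample("a red cube on a table", "a blue sphere on the floor"),
    ]
    editing_group = [
        RolloutSample("a cat wearing a hat", "a cat wearing a hat", source_image="a cat"),
        RolloutSample("a cat wearing a hat", "a cat", source_image="a cat"),
    ]
    # Both task types flow through the identical reward and advantage code path.
    for group in (generation_group, editing_group):
        rewards = [unified_reward(s, reward_models) for s in group]
        print(rewards, group_normalized_advantages(rewards))
```

The point of the design is that `unified_reward` never branches on task type: a generation rollout and an editing rollout are scored by the same reward models, which is one plausible reading of "reward unification" in the title.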


Related Papers