Fugu-MT Paper Translation (Abstract): OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning

Paper abstract: OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning

arxiv url: http://arxiv.org/abs/2508.21066v1
Date: Thu, 28 Aug 2025 17:59:46 GMT
Status: Translation complete
Last updated in system: 2025-08-29 18:12:02.555563
Title: OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning
Authors: Yuan Gong, Xionghui Wang, Jie Wu, Shiyin Wang, Yitong Wang, Xinglong Wu
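Abstract summary: OneReward is a unified reinforcement learning framework that improves a model's generative capabilities across multiple tasks. The authors developed Seedream 3.0 Fill, a mask-guided generation model trained via multi-task reinforcement learning.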
Reference score (attention score computed independently by the site): 26.133555631867385
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we introduce OneReward, a unified reinforcement learning framework that enhances the model's generative capabilities across multiple tasks under different evaluation criteria using only one reward model. Because it employs a single vision-language model (VLM) as the generative reward model, which can distinguish the winner and loser for a given task and a given evaluation criterion, it can be effectively applied to multi-task generation models, particularly in contexts with varied data and diverse task objectives. We utilize OneReward for mask-guided image generation, which can be further divided into several sub-tasks such as image fill, image extend, object removal, and text rendering, each involving a binary mask as the edit area. Although these domain-specific tasks share the same conditioning paradigm, they differ significantly in underlying data distributions and evaluation metrics. Existing methods often rely on task-specific supervised fine-tuning (SFT), which limits generalization and training efficiency. Building on OneReward, we develop Seedream 3.0 Fill, a mask-guided generation model trained via multi-task reinforcement learning directly on a pre-trained base model, eliminating the need for task-specific SFT. Experimental results demonstrate that our unified edit model consistently outperforms both commercial and open-source competitors, such as Ideogram, Adobe Photoshop, and FLUX Fill [Pro], across multiple evaluation dimensions. Code and model are available at: https://one-reward.github.io
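
The idea of one reward model serving several tasks and criteria can be illustrated with a toy sketch. This is not the authors' implementation: the abstract only says that a single VLM distinguishes the winner and loser for a given task and a given evaluation criterion. Everything below is assumed for illustration, including the names Candidate, build_query, toy_judge and pick_winner, the criterion names "aesthetics" and "consistency", and the numeric scores. The toy judge is a placeholder that stands in for the VLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Sub-tasks named in the abstract.
TASKS = ("image fill", "image extend", "object removal", "text rendering")


@dataclass(frozen=True)
class Candidate:
    """Stand-in for a generated image: made-up per-criterion quality scores."""
    name: str
    scores: Dict[str, float]


def build_query(task: str, criterion: str) -> str:
    """Fold the task and the criterion into one prompt for the shared judge."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    return f"Task: {task}. Criterion: {criterion}. Is the first image better?"


def toy_judge(query: str, first: Candidate, second: Candidate) -> bool:
    """Placeholder for the VLM: reads the criterion back out of the query text."""
    criterion = query.split("Criterion: ")[1].split(".")[0]
    return first.scores[criterion] > second.scores[criterion]


Judge = Callable[[str, Candidate, Candidate], bool]


def pick_winner(
    judge: Judge, task: str, criterion: str, a: Candidate, b: Candidate
) -> Tuple[Candidate, Candidate]:
    """Return (winner, loser) for one task and one criterion."""
    query = build_query(task, criterion)
    return (a, b) if judge(query, a, b) else (b, a)


if __name__ == "__main__":
    a = Candidate("a", {"aesthetics": 0.9, "consistency": 0.2})
    b = Candidate("b", {"aesthetics": 0.4, "consistency": 0.8})
    # The same judge is reused; only the (task, criterion) query changes.
    for criterion in ("aesthetics", "consistency"):
        winner, loser = pick_winner(toy_judge, "image fill", criterion, a, b)
        print(criterion, "->", winner.name, "beats", loser.name)
```

The same pair of candidates can yield a different winner under a different criterion, which is why the query carries both the task and the criterion instead of a separate judge being kept for each combination.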
