Fugu-MT Paper Translation (Summary): GDPO-SR: Group Direct Preference Optimization for One-Step Generative Image Super-Resolution

Paper summary: GDPO-SR: Group Direct Preference Optimization for One-Step Generative Image Super-Resolution

arxiv url: http://arxiv.org/abs/2603.16769v1
Date: Mon, 16 Mar 2026 15:24:39 GMT
Status: Translation complete
System update date: 2026-03-18 17:42:07.42953
Title: GDPO-SR: Group Direct Preference Optimization for One-Step Generative Image Super-Resolution
Authors: Qiaosi Yi, Shuai Li, Rongyuan Wu, Lingchen Sun, Zhengqiang Zhang, Lei Zhang
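Abstract summary: Reinforcement learning (RL) has been used to improve the performance of generative image super-resolution (ISR). Group Direct Preference Optimization (GDPO) is a new approach for integrating RL into the training of one-step generative ISR models.
Reference score (attention score computed in-house): 25.492081909928533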
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recently, reinforcement learning (RL) has been employed for improving generative image super-resolution (ISR) performance. However, the current efforts are focused on multi-step generative ISR, while one-step generative ISR remains underexplored due to its limited stochasticity. In addition, RL methods such as Direct Preference Optimization (DPO) require the generation of positive and negative sample pairs offline, leading to a limited number of samples, while Group Relative Policy Optimization (GRPO) only calculates the likelihood of the entire image, ignoring local details that are crucial for ISR. In this paper, we propose Group Direct Preference Optimization (GDPO), a novel approach to integrate RL into one-step generative ISR model training. First, we introduce a noise-aware one-step diffusion model that can generate diverse ISR outputs. To prevent performance degradation caused by noise injection, we introduce an unequal-timestep strategy to decouple the timestep of noise addition from that of diffusion. We then present the GDPO strategy, which integrates the principle of GRPO into DPO, to calculate the group-relative advantage of each online generated sample for model optimization. Meanwhile, an attribute-aware reward function is designed to dynamically evaluate the score of each sample based on its statistics of smooth and texture areas. Experiments demonstrate the effectiveness of GDPO in enhancing the performance of one-step generative ISR models. Code: https://github.com/Joyies/GDPO.
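The abstract says GDPO combines two existing ingredients: the pairwise DPO objective and GRPO's group-relative advantage. The sketch below implements only the standard, generic form of each in plain Python. It assumes scalar log-likelihoods, a population standard deviation, and a default `beta=0.1`. The function names are hypothetical. How GDPO actually merges the two terms, how it computes per-sample likelihoods for a one-step diffusion model, and how its attribute-aware reward is defined are all specified in the paper and the linked repository, not here.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantage: score each sample relative to its own group.

    A_i = (r_i - mean(r)) / (std(r) + eps), using the population std.
    (Hypothetical helper; not taken from the GDPO code base.)
    """
    n = len(rewards)
    if n == 0:
        raise ValueError("group must contain at least one sample")
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # eps avoids division by zero when every sample in the group scores the same.
    return [(r - mean) / (std + eps) for r in rewards]


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Standard pairwise DPO loss for one (preferred w, rejected l) pair.

    loss = -log sigmoid(beta * [(logp_w - ref_logp_w) - (logp_l - ref_logp_l)])
    (Hypothetical helper; generic DPO, not GDPO's exact objective.)
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(m)) equals softplus(-m). Branching on the sign of m
    # ensures exp() only ever sees non-positive arguments, so it cannot overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

For example, a group of rewards `[1.0, 2.0, 3.0]` produces advantages of roughly `[-1.22, 0.0, 1.22]`. Each sample is judged only against the other samples generated online for the same input, which is the GRPO principle the abstract refers to. Plain DPO, by contrast, needs positive and negative pairs prepared offline.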

Related papers