Fugu-MT Paper Translation (Summary): Dual Caption Preference Optimization for Diffusion Models

Paper summary: Dual Caption Preference Optimization for Diffusion Models

arxiv url: http://arxiv.org/abs/2502.06023v2
Date: Sat, 18 Oct 2025 18:05:43 GMT
Status: Translation complete
Last updated in system: 2025-10-25 00:56:38.178839
Title: Dual Caption Preference Optimization for Diffusion Models
Authors: Amir Saeidi, Yiran Luo, Agneet Chatterjee, Shamanthak Hegde, Bimsara Pathiraja, Yezhou Yang, Chitta Baral
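Abstract summary: The paper introduces Dual Caption Preference Optimization (DCPO) to improve text-to-image diffusion models. DCPO assigns two distinct captions to each preference pair, which reinforces the learning signal. Experiments show that DCPO significantly improves image quality and relevance to prompts.
Reference score (attention score computed independently by this site): 53.218293277964165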
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advancements in human preference optimization, originally developed for Large Language Models (LLMs), have shown significant potential in improving text-to-image diffusion models. These methods aim to learn the distribution of preferred samples while distinguishing them from less preferred ones. However, within the existing preference datasets, the original caption often does not clearly favor the preferred image over the alternative, which weakens the supervision signal available during training. To address this issue, we introduce Dual Caption Preference Optimization (DCPO), a data augmentation and optimization framework that reinforces the learning signal by assigning two distinct captions to each preference pair. This encourages the model to better differentiate between preferred and less-preferred outcomes during training. We also construct Pick-Double Caption, a modified version of Pick-a-Pic v2 with separate captions for each image, and propose three different strategies for generating distinct captions: captioning, perturbation, and hybrid methods. Using Stable Diffusion (SD) 2.1 as the fine-tuning backbone, our experiments show that DCPO significantly improves image quality and relevance to prompts, outperforming SD 2.1, SFT_Chosen, Diffusion-DPO, and MaPO across multiple metrics, including Pickscore, HPSv2.1, GenEval, CLIPscore, and ImageReward.
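The idea of pairing each image in a preference pair with its own caption can be sketched roughly as follows. This is a minimal illustration, not the paper's objective: it assumes a Diffusion-DPO-style loss computed from per-sample denoising errors, with the only change being that the preferred and less-preferred images are conditioned on separate captions. The names `DualCaptionPair`, `dual_caption_preference_loss`, and the `beta` parameter are hypothetical and do not come from the abstract.

```python
import math
from dataclasses import dataclass


@dataclass
class DualCaptionPair:
    """Hypothetical record: one preference pair with a caption per image."""
    preferred_image: str        # path or ID of the preferred image
    preferred_caption: str      # caption describing the preferred image
    less_preferred_image: str   # path or ID of the less-preferred image
    less_preferred_caption: str # caption describing the less-preferred image


def _softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dual_caption_preference_loss(
    err_w_model: float,
    err_w_ref: float,
    err_l_model: float,
    err_l_ref: float,
    beta: float = 1.0,
) -> float:
    """Diffusion-DPO-style loss from squared denoising errors (assumed form).

    err_w_*: error on the preferred image, conditioned on its own caption.
    err_l_*: error on the less-preferred image, conditioned on its own caption.
    *_model is the model being trained; *_ref is the frozen reference model.
    """
    # Negative when the trained model denoises better than the reference.
    diff_w = err_w_model - err_w_ref
    diff_l = err_l_model - err_l_ref
    # Reward improving on the preferred pair more than on the less-preferred one.
    logit = -beta * (diff_w - diff_l)
    # -log(sigmoid(logit)) == softplus(-logit)
    return _softplus(-logit)
```

In a real training loop the four error terms would come from noise-prediction networks evaluated on noised latents; here they are plain floats so the structure of the loss is easy to inspect. When the trained model matches the reference on both images, the loss equals log 2, and it drops below that as the model improves on the preferred image relative to the less-preferred one.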
