Fugu-MT Paper Translation (Summary): Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs

Paper Overview: Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs

arxiv url: http://arxiv.org/abs/2509.25771v1
Date: Tue, 30 Sep 2025 04:32:34 GMT
Status: Translation complete
Last updated in system: 2025-10-01 17:09:04.428035
Title: Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs
Authors: Jia Jun Cheng Xian, Muchen Li, Haotian Yang, Xin Tao, Pengfei Wan, Leonid Sigal, Renjie Liao
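Abstract Summary: This paper introduces Text Preference Optimization (TPO), a framework that enables "free-lunch" alignment of T2I models. TPO works by training the model to prefer matched prompts over mismatched prompts. Our framework is general and compatible with existing preference-based algorithms.
Reference Score (site-computed attention score): 36.42060582800515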
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in diffusion-based text-to-image (T2I) models have led to remarkable success in generating high-quality images from textual prompts. However, ensuring accurate alignment between the text and the generated image remains a significant challenge for state-of-the-art diffusion models. To address this, existing studies employ reinforcement learning with human feedback (RLHF) to align T2I outputs with human preferences. These methods, however, either rely directly on paired image preference data or require a learned reward function, both of which depend heavily on costly, high-quality human annotations and thus face scalability limitations. In this work, we introduce Text Preference Optimization (TPO), a framework that enables "free-lunch" alignment of T2I models, achieving alignment without the need for paired image preference data. TPO works by training the model to prefer matched prompts over mismatched prompts, which are constructed by perturbing original captions using a large language model. Our framework is general and compatible with existing preference-based algorithms. We extend both DPO and KTO to our setting, resulting in TDPO and TKTO. Quantitative and qualitative evaluations across multiple benchmarks show that our methods consistently outperform their original counterparts, delivering better human preference scores and improved text-to-image alignment. Our open-source code is available at https://github.com/DSL-Lab/T2I-Free-Lunch-Alignment.
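The abstract says only that TDPO extends DPO to pairs of matched and mismatched prompts; the exact loss is not given here. The following is a minimal, hypothetical sketch that assumes the Diffusion-DPO form. Under that form, a trainable model's noise-prediction error is compared against a frozen reference model. Here the same noised image is conditioned once on the matched (preferred) prompt and once on the LLM-perturbed (dispreferred) prompt. The function names, the toy noise vectors, and the choice to fold timestep weighting into `beta` are illustrative assumptions, not the authors' implementation. Batching and timestep sampling are also omitted.

```python
import math
from typing import Sequence


def squared_error(eps: Sequence[float], eps_pred: Sequence[float]) -> float:
    """Sum of squared differences between the true noise and a predicted noise."""
    return sum((a - b) ** 2 for a, b in zip(eps, eps_pred))


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def text_dpo_loss(
    eps: Sequence[float],
    pred_matched: Sequence[float],
    ref_matched: Sequence[float],
    pred_mismatched: Sequence[float],
    ref_mismatched: Sequence[float],
    beta: float = 1.0,
) -> float:
    """Hypothetical DPO-style loss for one noised image and two prompts.

    pred_*: noise predicted by the trainable model given the matched / mismatched prompt.
    ref_*:  noise predicted by the frozen reference model given the same prompts.
    beta:   assumed to absorb any timestep weighting (an illustrative simplification).
    """
    # Change in denoising error relative to the reference, per prompt (lower = better).
    matched_gap = squared_error(eps, pred_matched) - squared_error(eps, ref_matched)
    mismatched_gap = squared_error(eps, pred_mismatched) - squared_error(eps, ref_mismatched)
    # Preferring the matched prompt means its gap should be lower than the mismatched one.
    z = -beta * (matched_gap - mismatched_gap)
    return -log_sigmoid(z)


if __name__ == "__main__":
    eps = [0.5, -0.2]        # toy "true" noise
    ref = [0.3, 0.0]         # toy reference prediction (same for both prompts here)
    better = [0.5, -0.2]     # trainable model denoises perfectly under the matched prompt
    print(text_dpo_loss(eps, ref, ref, ref, ref))     # no preference yet: log(2)
    print(text_dpo_loss(eps, better, ref, ref, ref))  # matched prompt favored: below log(2)
```

With identical trainable and reference predictions, the loss equals log 2. It drops below log 2 when the trainable model denoises better than the reference under the matched prompt, relative to the mismatched one. That is the "prefer matched over mismatched prompts" behavior the abstract describes, expressed without any paired image preference data.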


Related Papers