Fugu-MT Paper Translation (Abstract): Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization

Paper summary: Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization

arxiv url: http://arxiv.org/abs/2410.03190v2
Date: Tue, 12 Nov 2024 00:37:33 GMT
Status: translation complete
Last updated in system: 2024-11-28 17:07:35.206485
Title: Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization
Title (reference translation): Tuning Timestep-Distilled Diffusion Models Using Pairwise Sample Optimization
Authors: Zichen Miao, Zhengyuan Yang, Kevin Lin, Ze Wang, Zicheng Liu, Lijuan Wang, Qiang Qiu
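Abstract summary: We propose the PSO algorithm, which can directly fine-tune an arbitrary timestep-distilled diffusion model. PSO introduces additional reference images sampled from the current timestep-distilled model and increases the relative likelihood margin between the training images and the reference images. We show that PSO can directly adapt distilled models to human-preferred generation using both offline and online pairwise image data.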
Reference score (independently computed attention measure): 97.35427957922714
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Recent advancements in timestep-distilled diffusion models have enabled high-quality image generation that rivals non-distilled multi-step models, but with significantly fewer inference steps. While such models are attractive for applications due to the low inference cost and latency, fine-tuning them with a naive diffusion objective would result in degraded and blurry outputs. An intuitive alternative is to repeat the diffusion distillation process with a fine-tuned teacher model, which produces good results but is cumbersome and computationally intensive; the distillation training usually requires an order of magnitude more training compute than fine-tuning for specific image styles. In this paper, we present an algorithm named pairwise sample optimization (PSO), which enables the direct fine-tuning of an arbitrary timestep-distilled diffusion model. PSO introduces additional reference images sampled from the current timestep-distilled model, and increases the relative likelihood margin between the training images and reference images. This enables the model to retain its few-step generation ability, while allowing for fine-tuning of its output distribution. We also demonstrate that PSO is a generalized formulation which can be flexibly extended to both offline-sampled and online-sampled pairwise data, covering various popular objectives for diffusion model preference optimization. We evaluate PSO in both preference optimization and other fine-tuning tasks, including style transfer and concept customization. We show that PSO can directly adapt distilled models to human-preferred generation with both offline and online-generated pairwise preference image data. PSO also demonstrates effectiveness in style transfer and concept customization by directly tuning timestep-distilled diffusion models.
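Abstract (reference translation): Recent advances in timestep-distilled diffusion models have enabled high-quality image generation comparable to non-distilled multi-step models, with significantly fewer inference steps. Such models are attractive for applications because of their low inference cost and latency, but fine-tuning them with a naive diffusion objective yields degraded, blurry outputs. An intuitive alternative is to repeat diffusion distillation with a fine-tuned teacher model, which produces good results but is cumbersome and computationally intensive. In this paper, we propose the PSO algorithm, which can directly fine-tune an arbitrary timestep-distilled diffusion model. PSO introduces additional reference images sampled from the current timestep-distilled model and increases the relative likelihood margin between the training images and the reference images. This lets the model retain its few-step generation ability while its output distribution is fine-tuned. We also show that PSO is a generalized formulation that can be flexibly extended to both offline-sampled and online-sampled pairwise data, covering various popular objectives for diffusion model preference optimization. We evaluate PSO on preference optimization and on other fine-tuning tasks, including style transfer and concept customization. We show that PSO can directly adapt distilled models to human-preferred generation using both offline and online pairwise image data. PSO also proves effective for style transfer and concept customization by directly tuning timestep-distilled diffusion models.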
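
Illustrative sketch (not from the paper): the abstract says PSO "increases the relative likelihood margin between the training images and reference images" but does not state the objective itself. The Python snippet below shows one generic way such a pairwise margin can be turned into a loss, using a logistic (DPO-style) form. The function name, its arguments, the use of a frozen copy of the model as a baseline, and the `beta` scale are all assumptions made for illustration; the paper's actual formulation may differ.

```python
import math


def pairwise_margin_loss(logp_target, logp_reference,
                         logp_target_frozen=0.0, logp_reference_frozen=0.0,
                         beta=1.0):
    """Hypothetical logistic loss on a relative log-likelihood margin.

    logp_target / logp_reference: log-likelihood estimates of a training
    image and of a reference image under the model being tuned.
    *_frozen: the same quantities under a frozen copy of the model
    (an assumption of this sketch, used as a baseline).
    """
    # Relative margin: how much more the tuned model favors the training
    # image over the reference image, compared with the frozen baseline.
    margin = ((logp_target - logp_target_frozen)
              - (logp_reference - logp_reference_frozen))
    z = beta * margin
    # Numerically stable -log(sigmoid(z)), i.e. softplus(-z).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    # A larger margin in favor of the training image gives a smaller loss.
    for m in (-2.0, 0.0, 2.0):
        print(m, pairwise_margin_loss(m, 0.0))
```

Minimizing a loss of this shape pushes the margin upward, which matches the qualitative description in the abstract. How the likelihood terms are estimated for a few-step distilled model is not covered here.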


Related papers