Fugu-MT Paper Translation (Abstract): Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance

Paper summary: Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance

arxiv url: http://arxiv.org/abs/2210.05559v1
Date: Tue, 11 Oct 2022 15:53:52 GMT
Status: Translation complete
Last updated in system: 2022-10-12 15:28:04.760459
Title: Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance
Title (reference translation): Unifying the Latent Space of Diffusion Models, with Applications to CycleDiffusion and Guidance
Authors: Chen Henry Wu, Fernando De la Torre
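Abstract summary: We show that a common latent space emerges from two diffusion models trained independently on related domains. By applying CycleDiffusion to text-to-image diffusion models, we show that large-scale text-to-image diffusion models can be used as zero-shot image-to-image editors.
Reference score (independently computed attention score): 95.12230117950232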
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion models have achieved unprecedented performance in generative modeling. The commonly-adopted formulation of the latent code of diffusion models is a sequence of gradually denoised samples, as opposed to the simpler (e.g., Gaussian) latent space of GANs, VAEs, and normalizing flows. This paper provides an alternative, Gaussian formulation of the latent space of various diffusion models, as well as an invertible DPM-Encoder that maps images into the latent space. While our formulation is purely based on the definition of diffusion models, we demonstrate several intriguing consequences. (1) Empirically, we observe that a common latent space emerges from two diffusion models trained independently on related domains. In light of this finding, we propose CycleDiffusion, which uses DPM-Encoder for unpaired image-to-image translation. Furthermore, applying CycleDiffusion to text-to-image diffusion models, we show that large-scale text-to-image diffusion models can be used as zero-shot image-to-image editors. (2) One can guide pre-trained diffusion models and GANs by controlling the latent codes in a unified, plug-and-play formulation based on energy-based models. Using the CLIP model and a face recognition model as guidance, we demonstrate that diffusion models have better coverage of low-density sub-populations and individuals than GANs.
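Abstract (reference translation): Diffusion models have achieved unprecedented performance in generative modeling. The common formulation of the latent code of diffusion models is a sequence of gradually denoised samples, in contrast to the simpler (e.g., Gaussian) latent space of GANs, VAEs, and normalizing flows. This paper provides an alternative, Gaussian formulation of the latent space of various diffusion models, together with an invertible DPM-Encoder that maps images into the latent space. Although our formulation is based purely on the definition of diffusion models, we demonstrate several intriguing consequences. (1) Empirically, we observe that a common latent space emerges from two diffusion models trained independently on related domains. Building on this finding, we propose CycleDiffusion, which uses the DPM-Encoder for unpaired image-to-image translation. Furthermore, by applying CycleDiffusion to text-to-image diffusion models, we show that large-scale text-to-image diffusion models can be used as zero-shot image-to-image editors. (2) Pre-trained diffusion models and GANs can be guided by controlling the latent codes in a unified, plug-and-play formulation based on energy-based models. Using the CLIP model and a face recognition model as guidance, we show that diffusion models cover low-density sub-populations and individuals better than GANs.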
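The abstract describes an invertible encoder and translation between two independently trained models only at a high level. The sketch below is one way to illustrate that idea and is not the authors' implementation. It assumes each reverse step has the form x_{t-1} = mu(x_t, t) + sigma_t * eps_t, so the noise terms eps_t can be solved for and stored as the latent code. The linear `make_toy_mean` models, the noise schedule, and all function names are hypothetical stand-ins for trained networks.

```python
import math
import random


def make_toy_mean(shift):
    # Hypothetical stand-in for a trained model's reverse-step mean mu(x_t, t).
    def mean(x_t, t):
        return [0.9 * v + shift for v in x_t]
    return mean


def encode(x0, mean, alphas, sigmas, rng):
    """Map x0 to a latent (x_T, [eps_T, ..., eps_1]) under one model."""
    num_steps = len(alphas)
    xs = [list(x0)]  # xs[t] holds x_t
    for t in range(1, num_steps + 1):
        a = alphas[t - 1]
        # Forward noising step: x_t = sqrt(a) * x_{t-1} + sqrt(1 - a) * noise.
        xs.append([math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
                   for v in xs[-1]])
    eps = []
    for t in range(num_steps, 0, -1):
        mu = mean(xs[t], t)
        # Solve x_{t-1} = mu + sigma_t * eps_t for eps_t.
        eps.append([(prev - m) / sigmas[t - 1] for prev, m in zip(xs[t - 1], mu)])
    return xs[num_steps], eps


def decode(x_T, eps, mean, sigmas):
    """Run the reverse steps, reusing the stored noise terms."""
    x = list(x_T)
    for i, t in enumerate(range(len(sigmas), 0, -1)):
        mu = mean(x, t)
        x = [m + sigmas[t - 1] * e for m, e in zip(mu, eps[i])]
    return x


rng = random.Random(0)
alphas = [0.99 - 0.01 * k for k in range(10)]   # assumed toy schedule
sigmas = [math.sqrt(1.0 - a) for a in alphas]   # assumed step noise scales
source_mean = make_toy_mean(0.0)                # toy "source domain" model
target_mean = make_toy_mean(0.5)                # toy "target domain" model

x0 = [rng.gauss(0.0, 1.0) for _ in range(4)]
x_T, eps = encode(x0, source_mean, alphas, sigmas, rng)
reconstruction = decode(x_T, eps, source_mean, sigmas)  # same model: recovers x0
translation = decode(x_T, eps, target_mean, sigmas)     # other model: translated sample
```

Decoding with the same model that produced the latent reproduces the input up to floating-point error, which is the sense in which the encoder is invertible here. Decoding the same latent with the second model gives a different sample, which is the translation step in this toy setting. The sketch does not show that the stored noise terms follow a Gaussian distribution, nor that a shared latent space emerges; those are the paper's claims about trained models.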

Related papers

Continuous Diffusion Model for Language Modeling [57.396578974401734]
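Existing continuous diffusion models for discrete data have limited performance compared with discrete approaches. This paper proposes a continuous diffusion model for language modeling that incorporates the geometry of the underlying categorical distribution.
Paper reference translation (metadata) (2025-02-17T08:54:29Z)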
Distilling Diffusion Models into Conditional GANs [90.76040478677609]
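We distill a complex multi-step diffusion model into a single-step conditional GAN student model. E-LatentLPIPS is a perceptual loss that operates directly in the diffusion model's latent space. We demonstrate a one-step generator that outperforms state-of-the-art one-step diffusion distillation models.
Paper reference translation (metadata) (2024-05-09T17:59:40Z)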
Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
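Guidance serves as a key concept in diffusion models, but its effectiveness is often limited by the need for additional data annotation or pretraining. This paper proposes a framework that extracts guidance from diffusion models.
Paper reference translation (metadata) (2023-12-14T11:19:11Z)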
Soft Mixture Denoising: Beyond the Expressive Bottleneck of Diffusion Models [76.46246743508651]
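We show that current diffusion models have an expressive bottleneck in backward denoising. This paper introduces soft mixture denoising (SMD), an expressive and efficient model for backward denoising.
Paper reference translation (metadata) (2023-09-25T12:03:32Z)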
Infinite-Dimensional Diffusion Models [4.342241136871849]
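We formulate diffusion-based generative models in infinite dimensions and apply them to the generative modeling of functions. We show that our formulation is well posed in the infinite-dimensional setting and provide dimension-independent distance bounds from the sample to the target measure. We also develop guidelines for the design of infinite-dimensional diffusion models.
Paper reference translation (metadata) (2023-02-20T18:00:38Z)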
SinDiffusion: Learning a Diffusion Model from a Single Natural Image [159.4285444680301]
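SinDiffusion uses denoising diffusion models to capture the internal distribution of patches from a single natural image. SinDiffusion is built on two core designs. First, SinDiffusion is trained with a single model at a single scale, rather than with multiple models at progressively growing scales. Second, we show that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics.
Paper reference translation (metadata) (2022-11-22T18:00:03Z)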
Blurring Diffusion Models [27.339469450737525]
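We show that blurring can equivalently be defined through a Gaussian diffusion process with non-isotropic noise. This paper proposes a standard Gaussian diffusion model that bridges diffusion and inverse heat dissipation.
Paper reference translation (metadata) (2022-09-12T19:16:48Z)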
Diffusion Models in Vision: A Survey [80.82832715884597]
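Diffusion models are deep generative models based on two stages: a forward diffusion stage and a reverse diffusion stage. They are widely appreciated for the quality and diversity of the generated samples, despite their known computational burden.
Paper reference translation (metadata) (2022-09-10T22:00:30Z)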

The related papers list is generated automatically from the titles and abstracts of the papers on this site.

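This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from its use.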