Fugu-MT paper translation (abstract): Reverse Stable Diffusion: What prompt was used to generate this image?

Paper overview: Reverse Stable Diffusion: What prompt was used to generate this image?

arxiv url: http://arxiv.org/abs/2308.01472v1
Date: Wed, 2 Aug 2023 23:39:29 GMT
Status: Translation complete
Last updated in system: 2023-08-04 15:35:50.364902
Title: Reverse Stable Diffusion: What prompt was used to generate this image?
Title (reference translation): Reverse Stable Diffusion: Which prompt was used to generate this image?
Authors: Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah
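Abstract summary: We introduce the new task of predicting the text prompt from an image generated by a generative diffusion model. We propose a novel learning framework built on a joint prompt regression and multi-label vocabulary classification objective. We conduct experiments on the DiffusionDB data set, predicting text prompts from images generated by Stable Diffusion.
Reference score (attention score computed independently by the site): 80.82832715884597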
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-image diffusion models such as Stable Diffusion have recently attracted the interest of many researchers, and inverting the diffusion process can play an important role in better understanding the generative process and how to engineer prompts in order to obtain the desired images. To this end, we introduce the new task of predicting the text prompt given an image generated by a generative diffusion model. We combine a series of white-box and black-box models (with and without access to the weights of the diffusion network) to deal with the proposed task. We propose a novel learning framework comprising a joint prompt regression and multi-label vocabulary classification objective that generates improved prompts. To further improve our method, we employ a curriculum learning procedure that promotes the learning of image-prompt pairs with lower labeling noise (i.e., that are better aligned), and an unsupervised domain-adaptive kernel learning method that uses the similarities between samples in the source and target domains as extra features. We conduct experiments on the DiffusionDB data set, predicting text prompts from images generated by Stable Diffusion. Our novel learning framework produces excellent results on the aforementioned task, yielding the highest gains when applied to the white-box model. In addition, we make an interesting discovery: training a diffusion model on the prompt generation task can make the model generate images that are much better aligned with the input prompts, when the model is directly reused for text-to-image generation.
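Abstract (reference translation): Text-to-image diffusion models such as Stable Diffusion have recently attracted the interest of many researchers, and inverting the diffusion process plays an important role in understanding the generative process and how prompts should be designed to obtain the desired images. To this end, we propose the new task of predicting the text prompt from an image generated by a generative diffusion model. To address the proposed task, we combine white-box and black-box models (with and without access to the weights of the diffusion network). We propose a novel learning framework consisting of a joint prompt regression and multi-label vocabulary classification objective that generates improved prompts. To further improve the method, we use a curriculum learning procedure that promotes the learning of image-prompt pairs with lower labeling noise, and an unsupervised domain-adaptive kernel learning method that uses the similarities between samples in the source and target domains as additional features. We conduct experiments on the DiffusionDB data set, predicting text prompts from images generated by Stable Diffusion. The new learning framework produces excellent results on this task, with the highest gains when applied to the white-box model. In addition, we make an interesting discovery: training a diffusion model on the prompt generation task enables it to generate images that are better aligned with the input prompts when the model is directly reused for text-to-image generation.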
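
The joint objective named in the abstract (prompt regression plus multi-label vocabulary classification) can be illustrated with a small sketch. The abstract does not state the exact loss terms, so everything here is an assumption made for illustration: cosine distance between a predicted and a target prompt embedding for the regression part, binary cross-entropy over vocabulary words for the classification part, and a hypothetical `weight` parameter that balances the two. It is not the authors' implementation.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def binary_cross_entropy(logits, targets):
    """Mean binary cross-entropy over vocabulary entries, from raw logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def joint_loss(pred_embedding, true_embedding, vocab_logits, vocab_targets, weight=1.0):
    """Assumed joint objective: regression term + weight * classification term.

    pred_embedding / true_embedding: predicted and target prompt embeddings.
    vocab_logits: one raw score per vocabulary word.
    vocab_targets: 1 if the word occurs in the prompt, else 0.
    weight: hypothetical balancing factor (not taken from the paper).
    """
    regression = cosine_distance(pred_embedding, true_embedding)
    classification = binary_cross_entropy(vocab_logits, vocab_targets)
    return regression + weight * classification


if __name__ == "__main__":
    loss = joint_loss(
        pred_embedding=[0.9, 0.1, 0.0],
        true_embedding=[1.0, 0.0, 0.0],
        vocab_logits=[2.0, -1.5, 0.3],
        vocab_targets=[1, 0, 1],
    )
    print(f"joint loss: {loss:.4f}")
```

In this sketch, a perfect embedding match contributes zero to the regression term, so the loss reduces to the weighted classification term; raising `weight` puts more emphasis on recovering the individual prompt words.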
