Null-text Guidance in Diffusion Models is Secretly a Cartoon-style
Creator
- URL: http://arxiv.org/abs/2305.06710v4
- Date: Fri, 4 Aug 2023 03:07:41 GMT
- Title: Null-text Guidance in Diffusion Models is Secretly a Cartoon-style
Creator
- Authors: Jing Zhao, Heliang Zheng, Chaoyue Wang, Long Lan, Wanrong Huang,
Wenjing Yang
- Abstract summary: Null-text guidance in diffusion models is secretly a cartoon-style creator, i.e., the generated images can be efficiently transformed into cartoons by simply perturbing the null-text guidance.
We propose two disturbance methods, i.e., Rollback disturbance (Back-D) and Image disturbance (Image-D), to construct misalignment between the noisy images used for predicting null-text guidance and text guidance.
Back-D achieves cartoonization by altering the noise level of the null-text noisy image, replacing $x_t$ with $x_{t+\Delta t}$.
- Score: 20.329795810937206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classifier-free guidance is an effective sampling technique in diffusion
models that has been widely adopted. The main idea is to extrapolate the model
in the direction of text guidance and away from null-text guidance. In this
paper, we demonstrate that null-text guidance in diffusion models is secretly a
cartoon-style creator, i.e., the generated images can be efficiently
transformed into cartoons by simply perturbing the null-text guidance.
Specifically, we propose two disturbance methods, i.e., Rollback disturbance
(Back-D) and Image disturbance (Image-D), to construct misalignment between the
noisy images used for predicting null-text guidance and text guidance
(subsequently referred to as \textbf{null-text noisy image} and \textbf{text
noisy image} respectively) in the sampling process. Back-D achieves
cartoonization by altering the noise level of the null-text noisy image,
replacing $x_t$ with $x_{t+\Delta t}$. Image-D, alternatively, produces
high-fidelity, diverse cartoons by defining $x_t$ as a clean input image, which
further improves the incorporation of finer image details. Through
comprehensive experiments, we delved into the principle of noise disturbance for
null-text guidance and uncovered that the efficacy of the disturbance depends on the
correlation between the null-text noisy image and the source image. Moreover,
our proposed techniques, which can generate cartoon images and cartoonize
specific ones, are training-free and easily integrated as a plug-and-play
component in any classifier-free guided diffusion model. Project page is
available at \url{https://nulltextforcartoon.github.io/}.
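To make the mechanism concrete, the following is a minimal, hypothetical PyTorch-style sketch of a classifier-free guidance step with a disturbed null-text branch. It assumes a diffusers-style UNet2DConditionModel, precomputed text and null-text embeddings, a cache of earlier noisy latents from the current trajectory (for Back-D), and an encoded clean source image (for Image-D); these names and the exact timestep handling are illustrative assumptions, not the authors' released implementation.
```python
# Hypothetical sketch of Back-D / Image-D inside one classifier-free
# guidance step. Assumes a diffusers-style UNet2DConditionModel
# (called as unet(latent, timestep, encoder_hidden_states=...).sample),
# `cached_latents` mapping timesteps to earlier noisy latents of the
# current trajectory, and `clean_image` holding an encoded source image.
def disturbed_cfg_noise(unet, x_t, t, text_emb, null_emb,
                        guidance_scale=7.5, mode="back-d",
                        delta_t=0, cached_latents=None, clean_image=None):
    # Text guidance is always predicted from the ordinary noisy latent x_t.
    eps_text = unet(x_t, t, encoder_hidden_states=text_emb).sample

    if mode == "back-d":
        # Back-D: the null-text branch sees the noisier latent x_{t+delta_t}
        # kept from an earlier point of the sampling trajectory.
        x_null, t_null = cached_latents[t + delta_t], t + delta_t
    else:
        # Image-D: the null-text branch sees the clean source image instead
        # of a noisy latent (the timestep conditioning here is an assumption).
        x_null, t_null = clean_image, t

    eps_null = unet(x_null, t_null, encoder_hidden_states=null_emb).sample

    # Standard classifier-free guidance extrapolation, now computed with a
    # deliberately mismatched null-text prediction.
    return eps_null + guidance_scale * (eps_text - eps_null)
```
Because only the input to the null-text branch changes, a step like this drops into any classifier-free guided sampler without retraining, consistent with the plug-and-play, training-free claim above.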
Related papers
- ImPoster: Text and Frequency Guidance for Subject Driven Action Personalization using Diffusion Models [55.43801602995778]
We present ImPoster, a novel algorithm for generating a target image of a 'source' subject performing a 'driving' action.
Our approach is completely unsupervised and does not require any access to additional annotations like keypoints or pose.
arXiv Detail & Related papers (2024-09-24T01:25:19Z) - DEADiff: An Efficient Stylization Diffusion Model with Disentangled
Representations [64.43387739794531]
Current encoder-based approaches significantly impair the text controllability of text-to-image models while transferring styles.
We introduce DEADiff to address this issue using the following two strategies.
DEADiff attains the best visual stylization results and an optimal balance between the text controllability inherent in the text-to-image model and style similarity to the reference image.
arXiv Detail & Related papers (2024-03-11T17:35:23Z) - UDiffText: A Unified Framework for High-quality Text Synthesis in
Arbitrary Images via Character-aware Diffusion Models [25.219960711604728]
This paper proposes a novel approach for text image generation, utilizing a pre-trained diffusion model.
Our approach involves the design and training of a light-weight character-level text encoder, which replaces the original CLIP encoder.
By employing an inference stage refinement process, we achieve a notably high sequence accuracy when synthesizing text in arbitrarily given images.
arXiv Detail & Related papers (2023-12-08T07:47:46Z) - Reverse Stable Diffusion: What prompt was used to generate this image? [73.10116197883303]
We study the task of predicting the prompt embedding given an image generated by a generative diffusion model.
We propose a novel learning framework comprising a joint prompt regression and multi-label vocabulary classification objective.
We conduct experiments on the DiffusionDB data set, predicting text prompts from images generated by Stable Diffusion.
arXiv Detail & Related papers (2023-08-02T23:39:29Z) - Discriminative Class Tokens for Text-to-Image Diffusion Models [107.98436819341592]
We propose a non-invasive fine-tuning technique that capitalizes on the expressive potential of free-form text.
Our method is fast compared to prior fine-tuning methods and does not require a collection of in-class images.
We evaluate our method extensively, showing that the generated images are: (i) more accurate and of higher quality than standard diffusion models, (ii) can be used to augment training data in a low-resource setting, and (iii) reveal information about the data used to train the guiding classifier.
arXiv Detail & Related papers (2023-03-30T05:25:20Z) - Uncovering the Disentanglement Capability in Text-to-Image Diffusion
Models [60.63556257324894]
A key desired property of image generative models is the ability to disentangle different attributes.
We propose a simple, light-weight image editing algorithm where the mixing weights of the two text embeddings are optimized for style matching and content preservation.
Experiments show that the proposed method can modify a wide range of attributes, with the performance outperforming diffusion-model-based image-editing algorithms.
arXiv Detail & Related papers (2022-12-16T19:58:52Z) - Null-text Inversion for Editing Real Images using Guided Diffusion
Models [44.27570654402436]
We introduce an accurate inversion technique and thus facilitate an intuitive text-based modification of the image.
Our Null-text inversion, based on the publicly available Stable Diffusion model, is extensively evaluated on a variety of images and prompt editing.
arXiv Detail & Related papers (2022-11-17T18:58:14Z) - eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert
Denoisers [87.52504764677226]
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis.
We train an ensemble of text-to-image diffusion models specialized for different stages of synthesis.
Our ensemble of diffusion models, called eDiffi, results in improved text alignment while maintaining the same inference cost.
arXiv Detail & Related papers (2022-11-02T17:43:04Z)