Improving image synthesis with diffusion-negative sampling
- URL: http://arxiv.org/abs/2411.05473v1
- Date: Fri, 08 Nov 2024 10:58:09 GMT
- Title: Improving image synthesis with diffusion-negative sampling
- Authors: Alakh Desai, Nuno Vasconcelos
- Abstract summary: We propose a new diffusion-negative prompting (DNP) strategy for image generation with diffusion models (DMs).
DNP is based on a new procedure to sample images that are least compliant with p under the distribution of the DM, denoted diffusion-negative sampling (DNS).
DNS is straightforward to implement and requires no training. Experiments and human evaluations show that DNP performs well both quantitatively and qualitatively.
- Score: 54.84368884047812
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For image generation with diffusion models (DMs), a negative prompt n can be used to complement the text prompt p, helping define properties not desired in the synthesized image. While this improves prompt adherence and image quality, finding good negative prompts is challenging. We argue that this is due to a semantic gap between humans and DMs, which makes good negative prompts for DMs appear unintuitive to humans. To bridge this gap, we propose a new diffusion-negative prompting (DNP) strategy. DNP is based on a new procedure to sample images that are least compliant with p under the distribution of the DM, denoted as diffusion-negative sampling (DNS). Given p, one such image is sampled, which is then translated into natural language by the user or a captioning model, to produce the negative prompt n*. The pair (p, n*) is finally used to prompt the DM. DNS is straightforward to implement and requires no training. Experiments and human evaluations show that DNP performs well both quantitatively and qualitatively and can be easily combined with several DM variants.
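As a concrete illustration of the workflow above, the sketch below approximates DNP with off-the-shelf components. It is not the authors' implementation: it assumes DNS can be crudely emulated by steering classifier-free guidance away from p (passing p as the negative prompt with an empty positive prompt in a diffusers Stable Diffusion pipeline), and it uses BLIP as the captioning model to turn the sampled image into n*; the model names and guidance setting are illustrative choices.
```python
# Minimal DNP-style sketch (not the paper's official code). Assumes that
# pushing the sampler away from p (empty positive prompt, p as the negative
# prompt) is an acceptable stand-in for the paper's DNS procedure.
import torch
from diffusers import StableDiffusionPipeline
from transformers import BlipProcessor, BlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

# Text-to-image diffusion model.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)

# Captioning model used to translate the negative image into natural language.
blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base").to(device)

p = "a photorealistic portrait of an astronaut reading a book"

# Step 1 (approximate DNS): sample an image that the model steers away from p.
negative_image = pipe(prompt="", negative_prompt=p, guidance_scale=7.5).images[0]

# Step 2: caption the diffusion-negative image to obtain n*.
inputs = blip_processor(negative_image, return_tensors="pt").to(device)
caption_ids = blip.generate(**inputs, max_new_tokens=30)
n_star = blip_processor.decode(caption_ids[0], skip_special_tokens=True)

# Step 3: prompt the DM with the pair (p, n*).
final_image = pipe(prompt=p, negative_prompt=n_star).images[0]
final_image.save("dnp_output.png")
```
Per the abstract, the translation step can also be performed by a human describing the sampled negative image; the captioning model is simply the automated variant.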
Related papers
- Using LLMs as prompt modifier to avoid biases in AI image generators [0.0]
Large Language Models (LLMs) can reduce biases in text-to-image generation systems by modifying user prompts.
Our experiments with Stable Diffusion XL, 3.5 and Flux demonstrate that LLM-modified prompts significantly increase image diversity and reduce bias without the need to change the image generators themselves.
arXiv Detail & Related papers (2025-04-15T11:52:20Z)
- Efficient Image-to-Image Diffusion Classifier for Adversarial Robustness [24.465567005078135]
Diffusion models (DMs) have demonstrated great potential in the field of adversarial robustness.
DMs require huge computational costs due to the usage of large-scale pre-trained DMs.
We introduce an efficient Image-to-Image diffusion classifier with a pruned U-Net structure and reduced diffusion timesteps.
Our method achieves better adversarial robustness with fewer computational costs than DM-based and CNN-based methods.
arXiv Detail & Related papers (2024-08-16T03:01:07Z)
- Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing [58.48890547818074]
We present Contrastive Denoising Score (CDS), a powerful modification of Contrastive Unpaired Translation (CUT) for latent diffusion models (LDMs).
Our approach enables zero-shot image-to-image translation and neural radiance field (NeRF) editing, achieving structural correspondence between the input and output.
arXiv Detail & Related papers (2023-11-30T15:06:10Z)
- Reverse Stable Diffusion: What prompt was used to generate this image? [73.10116197883303]
We study the task of predicting the prompt embedding given an image generated by a generative diffusion model.
We propose a novel learning framework comprising a joint prompt regression and multi-label vocabulary classification objective.
We conduct experiments on the DiffusionDB data set, predicting text prompts from images generated by Stable Diffusion.
arXiv Detail & Related papers (2023-08-02T23:39:29Z)
- Re-imagine the Negative Prompt Algorithm: Transform 2D Diffusion into 3D, alleviate Janus problem and Beyond [49.94798429552442]
We propose Perp-Neg, a new algorithm that leverages the geometrical properties of the score space to address the shortcomings of the current negative prompt algorithm.
Perp-Neg does not require any training or fine-tuning of the model.
We demonstrate that Perp-Neg provides greater flexibility in generating images by enabling users to edit out unwanted concepts.
arXiv Detail & Related papers (2023-04-11T04:29:57Z)
- Unsupervised Representation Learning from Pre-trained Diffusion Probabilistic Models [83.75414370493289]
Diffusion Probabilistic Models (DPMs) have shown a powerful capacity of generating high-quality image samples.
Diff-AE has been proposed to explore DPMs for representation learning via autoencoding.
We propose Pre-trained DPM AutoEncoding (PDAE) to adapt existing pre-trained DPMs to the decoders for image reconstruction.
arXiv Detail & Related papers (2022-12-26T02:37:38Z)
- Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models [18.701950647429]
Text-conditioned image generation models suffer from degenerated and biased human behavior.
We present safe latent diffusion (SLD) to help combat these undesired side effects.
We show that SLD removes and suppresses inappropriate image parts during the diffusion process.
arXiv Detail & Related papers (2022-11-09T18:54:25Z)
- Representation Learning with Diffusion Models [0.0]
Diffusion models (DMs) have achieved state-of-the-art results for image synthesis tasks as well as density estimation.
We introduce a framework for learning such representations with diffusion models (LRDM).
In particular, the DM and the representation encoder are trained jointly in order to learn rich representations specific to the generative denoising process.
arXiv Detail & Related papers (2022-10-20T07:26:47Z)
- What can we learn about a generated image corrupting its latent representation? [57.1841740328509]
We investigate the hypothesis that we can predict image quality based on its latent representation in the GAN's bottleneck.
We achieve this by corrupting the latent representation with noise and generating multiple outputs.
arXiv Detail & Related papers (2022-10-12T14:40:32Z)
- Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval [19.161248757493386]
We propose TAiloring neGative Sentences with Discrimination and Correction (TAGS-DC) to automatically generate synthetic sentences as negative samples.
To keep the difficulty during training, we mutually improve the retrieval and generation through parameter sharing.
In experiments, we verify the effectiveness of our model on MS-COCO and Flickr30K compared with current state-of-the-art models.
arXiv Detail & Related papers (2021-11-05T09:36:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.