Diversity and Diffusion: Observations on Synthetic Image Distributions
with Stable Diffusion
- URL: http://arxiv.org/abs/2311.00056v1
- Date: Tue, 31 Oct 2023 18:05:15 GMT
- Title: Diversity and Diffusion: Observations on Synthetic Image Distributions
with Stable Diffusion
- Authors: David Marwood, Shumeet Baluja, Yair Alon
- Abstract summary: Text-to-image (TTI) systems have made it possible to create realistic images with simple text prompts.
In all of the experiments performed to date, classifiers trained solely with synthetic images perform poorly at inference.
We find four issues that limit the usefulness of TTI systems for this task: ambiguity, adherence to prompt, lack of diversity, and inability to represent the underlying concept.
- Score: 6.491645162078057
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent progress in text-to-image (TTI) systems, such as Stable
Diffusion, Imagen, and DALL-E 2, has made it possible to create realistic
images with simple text prompts. It is tempting to use these systems to
eliminate the manual task of obtaining natural images for training a new
machine learning classifier. However, in all of the experiments performed to
date, classifiers trained solely with synthetic images perform poorly at
inference, despite the images used for training appearing realistic. Examining
this apparent incongruity in detail gives insight into the limitations of the
underlying image generation processes. Through the lens of diversity in image
creation vs. accuracy of what is created, we dissect the differences in
semantic mismatches between what is modeled in synthetic vs. natural images.
This elucidates the roles of the image-language model, CLIP, and the image
generation model, diffusion. We find four issues that limit the usefulness of
TTI systems for this task: ambiguity, adherence to prompt, lack of diversity,
and inability to represent the underlying concept. We further present
surprising insights into the geometry of CLIP embeddings.
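The abstract's framing of diversity vs. accuracy can be made concrete with a simple diversity score over embedding vectors. The sketch below is illustrative only: `pairwise_cosine_diversity` is a hypothetical helper (not from the paper), and the toy random vectors stand in for real CLIP image embeddings, which would come from an actual encoder.

```python
import numpy as np

def pairwise_cosine_diversity(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine distance; higher means a more diverse image set."""
    # L2-normalize each embedding (CLIP embeddings are compared on a hypersphere).
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T                       # all pairwise cosine similarities
    n = len(embeddings)
    # Average only the off-diagonal entries (exclude each vector's self-similarity).
    mean_sim = (sims.sum() - np.trace(sims)) / (n * (n - 1))
    return 1.0 - mean_sim                          # distance = 1 - similarity

# Toy example: a tight cluster (low diversity) vs. a spread-out set.
rng = np.random.default_rng(0)
tight = rng.normal([1.0, 0.0, 0.0], 0.01, size=(50, 3))
spread = rng.normal(0.0, 1.0, size=(50, 3))
print(pairwise_cosine_diversity(tight) < pairwise_cosine_diversity(spread))  # True
```

Under this kind of metric, a synthetic image set whose embeddings cluster tightly would score low on diversity even if every individual image looks realistic, which matches the paper's distinction between per-image accuracy and set-level diversity.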
Related papers
- Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS).
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
arXiv Detail & Related papers (2024-02-28T06:07:07Z)
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that generate highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
- Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
- DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability [75.9781362556431]
We propose DiffDis to unify the cross-modal generative and discriminative pretraining into one single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both the image generation and the image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z)
- Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study on deepfake detection generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
- CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images [7.868449549351487]
This article proposes to enhance our ability to recognise AI-generated images through computer vision.
The two sets of data pose a binary classification problem: whether a photograph is real or generated by AI.
This study proposes using a Convolutional Neural Network (CNN) to classify the images into two categories: Real or Fake.
arXiv Detail & Related papers (2023-03-24T16:33:06Z)
- Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z)
- Ensembling with Deep Generative Views [72.70801582346344]
Generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose.
Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification.
We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars.
arXiv Detail & Related papers (2021-04-29T17:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.