Diffusion Models as Artists: Are we Closing the Gap between Humans and Machines?
- URL: http://arxiv.org/abs/2301.11722v3
- Date: Wed, 31 May 2023 16:02:39 GMT
- Title: Diffusion Models as Artists: Are we Closing the Gap between Humans and Machines?
- Authors: Victor Boutin, Thomas Fel, Lakshya Singhal, Rishav Mukherji, Akash Nagaraj, Julien Colin and Thomas Serre
- Abstract summary: We adapt the 'diversity vs. recognizability' scoring framework from Boutin et al., 2022.
We find that one-shot diffusion models have indeed started to close the gap between humans and machines.
- Score: 4.802758600019422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An important milestone for AI is the development of algorithms that can
produce drawings that are indistinguishable from those of humans. Here, we
adapt the 'diversity vs. recognizability' scoring framework from Boutin et al.,
2022 and find that one-shot diffusion models have indeed started to close the
gap between humans and machines. However, using a finer-grained measure of the
originality of individual samples, we show that strengthening the guidance of
diffusion models helps improve the humanness of their drawings, but they still
fall short of approximating the originality and recognizability of human
drawings. Comparing human category diagnostic features, collected through an
online psychophysics experiment, against those derived from diffusion models
reveals that humans rely on fewer and more localized features. Overall, our
study suggests that diffusion models have significantly helped improve the
quality of machine-generated drawings; however, a gap between humans and
machines remains -- in part explainable by discrepancies in visual strategies.
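The two axes of the 'diversity vs. recognizability' framework, and the guidance strength the abstract refers to, can be made concrete with a small sketch. It assumes that every drawing has already been mapped to a feature vector by some embedding network (not specified here), that diversity is scored as the mean distance of one class's samples to their centroid, and that recognizability is scored as the accuracy of a nearest-exemplar one-shot classifier. These metric choices and all function names are hypothetical illustrations, not the exact metrics of Boutin et al. (2022). The abstract also does not name the guidance scheme; `guided_noise` assumes classifier-free guidance, one common way of conditioning diffusion models.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _mean(vectors: List[Vector]) -> List[float]:
    # Component-wise mean of a non-empty list of equal-length vectors.
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def _dist(a: Vector, b: Vector) -> float:
    # Euclidean distance between two embeddings.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def diversity(embeddings: List[Vector]) -> float:
    """Hypothetical diversity score for one class: mean distance of the
    generated samples' embeddings to their centroid (higher = more varied)."""
    centroid = _mean(embeddings)
    return sum(_dist(e, centroid) for e in embeddings) / len(embeddings)


def recognizability(samples: Dict[str, List[Vector]],
                    exemplars: Dict[str, Vector]) -> float:
    """Hypothetical recognizability score: fraction of generated samples whose
    nearest one-shot exemplar is the class they were generated for."""
    correct = total = 0
    for label, embs in samples.items():
        for e in embs:
            predicted = min(exemplars, key=lambda c: _dist(e, exemplars[c]))
            correct += predicted == label
            total += 1
    return correct / total


def guided_noise(eps_cond: Vector, eps_uncond: Vector, w: float) -> List[float]:
    """Assumed classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond).
    w = 1 recovers the conditional prediction; w > 1 strengthens guidance."""
    return [u + w * (c - u) for c, u in zip(eps_cond, eps_uncond)]


# Toy usage with made-up 2-D embeddings (illustrative only).
exemplars = {"a": [0.0, 0.0], "b": [10.0, 10.0]}
samples = {"a": [[1.0, 0.0], [0.0, 1.0]], "b": [[9.0, 10.0], [4.0, 4.0]]}
print(recognizability(samples, exemplars))  # 0.75: the sample [4, 4] is nearer "a"
print(diversity([[1.0, 0.0], [-1.0, 0.0]]))  # 1.0
```

In this reading, a model can trade one axis against the other: samples scattered far from the exemplar raise diversity but risk being classified as another category, which lowers recognizability. Raising `w` pushes each denoising step further toward the conditioning signal, one concrete interpretation of "strengthening the guidance".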
Related papers
- HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance
AbHuman is the first large-scale synthesized human benchmark focusing on anatomical anomalies.
HumanRefiner is a novel plug-and-play approach for the coarse-to-fine refinement of human anomalies in text-to-image generation.
(arXiv, 2024-07-09)
- Latent Representation Matters: Human-like Sketches in One-shot Drawing Tasks
We study how different inductive biases shape the latent space of Latent Diffusion Models (LDMs).
We demonstrate that LDMs with redundancy reduction and prototype-based regularizations produce near-human-like drawings.
(arXiv, 2024-06-10)
- Are Image Distributions Indistinguishable to Humans Indistinguishable to Classifiers?
We show that, in the eyes of classifiers parameterized by neural networks, the strongest diffusion models are still far from generating images that are indistinguishable from real ones.
Our comprehensive empirical study suggests that, unlike humans, classifiers tend to classify images through edge and high-frequency components.
(arXiv, 2024-05-28)
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
(arXiv, 2024-04-17)
- AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation
We introduce a novel all-in-one-stage framework, AiOS, for expressive human pose and shape recovery without an additional human detection step.
We first employ a human token to probe human locations in the image and encode global features for each instance.
Then, we introduce a joint-related token to probe human joints in the image and encode fine-grained local features.
(arXiv, 2024-03-26)
- Training Class-Imbalanced Diffusion Model Via Overlap Optimization
Diffusion models trained on real-world datasets often yield inferior fidelity for tail classes.
Deep generative models, including diffusion models, are biased towards classes with abundant training images.
We propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.
(arXiv, 2024-02-16)
- Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation
Fine-tuning diffusion models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
(arXiv, 2024-02-15)
- Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors
We propose a simple yet effective framework comprising a pre-trained Stable Diffusion (SD) model containing rich generative priors, a unified head (U-head) capable of integrating hierarchical representations, and an adapted expert providing discriminative priors.
Comprehensive investigations reveal characteristics of Vermouth, the proposed framework, such as the varying granularity of perception concealed in latent variables at distinct time steps and at different U-Net stages.
The promising results demonstrate the potential of diffusion models as formidable learners, establishing their significance in furnishing informative and robust visual representations.
(arXiv, 2024-01-29)
- Intriguing properties of generative classifiers
We build on advances in generative modeling that turn text-to-image models into classifiers.
These generative classifiers show a record-breaking human-like shape bias (99% for Imagen), near-human-level out-of-distribution accuracy, and state-of-the-art alignment with human classification errors.
Our results indicate that while the current dominant paradigm for modeling human object recognition is discriminative inference, zero-shot generative models approximate human object recognition data surprisingly well.
(arXiv, 2023-09-28)
This list is automatically generated from the titles and abstracts of the papers on this site.