Diffusion Models as Artists: Are we Closing the Gap between Humans and
Machines?
- URL: http://arxiv.org/abs/2301.11722v3
- Date: Wed, 31 May 2023 16:02:39 GMT
- Title: Diffusion Models as Artists: Are we Closing the Gap between Humans and
Machines?
- Authors: Victor Boutin, Thomas Fel, Lakshya Singhal, Rishav Mukherji, Akash
Nagaraj, Julien Colin and Thomas Serre
- Abstract summary: We adapt the 'diversity vs. recognizability' scoring framework from Boutin et al., 2022.
We find that one-shot diffusion models have indeed started to close the gap between humans and machines.
- Score: 4.802758600019422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An important milestone for AI is the development of algorithms that can
produce drawings that are indistinguishable from those of humans. Here, we
adapt the 'diversity vs. recognizability' scoring framework from Boutin et al.,
2022 and find that one-shot diffusion models have indeed started to close the
gap between humans and machines. However, using a finer-grained measure of the
originality of individual samples, we show that strengthening the guidance of
diffusion models helps improve the humanness of their drawings, but they still
fall short of approximating the originality and recognizability of human
drawings. Comparing human category diagnostic features, collected through an
online psychophysics experiment, against those derived from diffusion models
reveals that humans rely on fewer and more localized features. Overall, our
study suggests that diffusion models have significantly helped improve the
quality of machine-generated drawings; however, a gap between humans and
machines remains -- in part explainable by discrepancies in visual strategies.
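The abstract's reference to "strengthening the guidance of diffusion models" most plausibly corresponds to classifier-free guidance, the standard knob for trading sample diversity against conditioning fidelity in diffusion sampling. The exact formulation used in the paper is not stated here, so the following is only an illustrative sketch of the standard rule (function name and toy arrays are hypothetical):

```python
import numpy as np

def classifier_free_guidance(eps_uncond: np.ndarray,
                             eps_cond: np.ndarray,
                             guidance_scale: float) -> np.ndarray:
    """Blend unconditional and conditional noise predictions.

    guidance_scale = 1.0 recovers the purely conditional prediction;
    larger values push samples harder toward the conditioning signal
    (e.g. the target drawing category), typically at the cost of
    sample diversity.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy example with two hypothetical noise predictions for a 2x2 "image".
eps_u = np.zeros((2, 2))
eps_c = np.ones((2, 2))

guided_weak = classifier_free_guidance(eps_u, eps_c, 1.0)    # equals eps_c
guided_strong = classifier_free_guidance(eps_u, eps_c, 3.0)  # amplified toward the condition
```

Under this reading, "strengthening the guidance" means raising `guidance_scale`, which the study finds improves the humanness of drawings without fully closing the originality gap.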
Related papers
- MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts [61.274246025372044]
We study human-centric text-to-image generation in the context of faces and hands.
We propose a method called Mixture of Low-rank Experts (MoLE) by considering low-rank modules trained on close-up hand and face images respectively as experts.
This concept draws inspiration from our observation of low-rank refinement, where a low-rank module trained by a customized close-up dataset has the potential to enhance the corresponding image part when applied at an appropriate scale.
arXiv Detail & Related papers (2024-10-30T17:59:57Z)
- HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance [80.97360194728705]
AbHuman is the first large-scale synthesized human benchmark focusing on anatomical anomalies.
HumanRefiner is a novel plug-and-play approach for the coarse-to-fine refinement of human anomalies in text-to-image generation.
arXiv Detail & Related papers (2024-07-09T15:14:41Z)
- Latent Representation Matters: Human-like Sketches in One-shot Drawing Tasks [15.328499301244708]
We study how different inductive biases shape the latent space of Latent Diffusion Models (LDMs)
We demonstrate that LDMs with redundancy reduction and prototype-based regularizations produce near-human-like drawings.
arXiv Detail & Related papers (2024-06-10T07:52:29Z)
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z)
- AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation [55.179287851188036]
We introduce a novel all-in-one-stage framework, AiOS, for expressive human pose and shape recovery without an additional human detection step.
We first employ a human token to probe a human location in the image and encode global features for each instance.
Then, we introduce a joint-related token to probe the human joints in the image and encode fine-grained local features.
arXiv Detail & Related papers (2024-03-26T17:59:23Z)
- Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning diffusion models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z)
- Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors [56.82596340418697]
We propose a simple yet effective framework comprising a pre-trained Stable Diffusion (SD) model containing rich generative priors, a unified head (U-head) capable of integrating hierarchical representations, and an adapted expert providing discriminative priors.
Comprehensive investigations unveil potential characteristics of Vermouth, such as varying granularity of perception concealed in latent variables at distinct time steps and various U-net stages.
The promising results demonstrate the potential of diffusion models as formidable learners, establishing their significance in furnishing informative and robust visual representations.
arXiv Detail & Related papers (2024-01-29T10:36:57Z)
- Intriguing properties of generative classifiers [14.57861413242093]
We build on advances in generative modeling that turn text-to-image models into classifiers.
These classifiers show a record-breaking human-like shape bias (99% for Imagen), near human-level out-of-distribution accuracy, and state-of-the-art alignment with human classification errors.
Our results indicate that while the current dominant paradigm for modeling human object recognition is discriminative inference, zero-shot generative models approximate human object recognition data surprisingly well.
arXiv Detail & Related papers (2023-09-28T18:19:40Z)
- Imitating Human Behaviour with Diffusion Models [25.55215280101109]
Diffusion models have emerged as powerful generative models in the text-to-image domain.
This paper studies their application as observation-to-action models for imitating human behaviour in sequential environments.
arXiv Detail & Related papers (2023-01-25T16:31:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.