Related papers: Diffusion Models as Artists: Are we Closing the Gap between Humans and Machines?

Diffusion Models as Artists: Are we Closing the Gap between Humans and Machines?

URL: http://arxiv.org/abs/2301.11722v3
Date: Wed, 31 May 2023 16:02:39 GMT
Title: Diffusion Models as Artists: Are we Closing the Gap between Humans and Machines?
Authors: Victor Boutin, Thomas Fel, Lakshya Singhal, Rishav Mukherji, Akash Nagaraj, Julien Colin and Thomas Serre
Abstract summary: We adapt the 'diversity vs. recognizability' scoring framework from Boutin et al, 2022. We find that one-shot diffusion models have indeed started to close the gap between humans and machines.
Score: 4.802758600019422
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: An important milestone for AI is the development of algorithms that can produce drawings that are indistinguishable from those of humans. Here, we adapt the 'diversity vs. recognizability' scoring framework from Boutin et al, 2022 and find that one-shot diffusion models have indeed started to close the gap between humans and machines. However, using a finer-grained measure of the originality of individual samples, we show that strengthening the guidance of diffusion models helps improve the humanness of their drawings, but they still fall short of approximating the originality and recognizability of human drawings. Comparing human category diagnostic features, collected through an online psychophysics experiment, against those derived from diffusion models reveals that humans rely on fewer and more localized features. Overall, our study suggests that diffusion models have significantly helped improve the quality of machine-generated drawings; however, a gap between humans and machines remains -- in part explainable by discrepancies in visual strategies.

Related papers

DIRIGENt: End-To-End Robotic Imitation of Human Demonstrations Based on a Diffusion Model [16.26334759935617]
We introduce DIRIGENt, a novel end-to-end diffusion approach to generate joint values from observing human demonstrations. We create a dataset in which humans imitate a robot and then use this collected data to train a diffusion model that enables a robot to imitate humans.
arXiv Detail & Related papers (2025-01-28T09:05:03Z)
MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts [61.274246025372044]
We study human-centric text-to-image generation in context of faces and hands. We propose a method called Mixture of Low-rank Experts (MoLE) by considering low-rank modules trained on close-up hand and face images respectively as experts. This concept draws inspiration from our observation of low-rank refinement, where a low-rank module trained by a customized close-up dataset has the potential to enhance the corresponding image part when applied at an appropriate scale.
arXiv Detail & Related papers (2024-10-30T17:59:57Z)
HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance [80.97360194728705]
AbHuman is the first large-scale synthesized human benchmark focusing on anatomical anomalies. HumanRefiner is a novel plug-and-play approach for the coarse-to-fine refinement of human anomalies in text-to-image generation.
arXiv Detail & Related papers (2024-07-09T15:14:41Z)
Latent Representation Matters: Human-like Sketches in One-shot Drawing Tasks [15.328499301244708]
We study how different inductive biases shape the latent space of Latent Diffusion Models (LDMs) We demonstrate that LDMs with redundancy reduction and prototype-based regularizations produce near-human-like drawings.
arXiv Detail & Related papers (2024-06-10T07:52:29Z)
Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration. In this work, we tackle the task of reconstructing closely interactive humans from a monocular video. We propose to leverage knowledge from proxemic behavior and physics to compensate the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z)
AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation [55.179287851188036]
We introduce a novel all-in-one-stage framework, AiOS, for expressive human pose and shape recovery without an additional human detection step. We first employ a human token to probe a human location in the image and encode global features for each instance. Then, we introduce a joint-related token to probe the human joint in the image and encoder a fine-grained local feature.
arXiv Detail & Related papers (2024-03-26T17:59:23Z)
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI) In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion) Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z)
Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors [56.82596340418697]
We propose a simple yet effective framework comprising a pre-trained Stable Diffusion (SD) model containing rich generative priors, a unified head (U-head) capable of integrating hierarchical representations, and an adapted expert providing discriminative priors. Comprehensive investigations unveil potential characteristics of Vermouth, such as varying granularity of perception concealed in latent variables at distinct time steps and various U-net stages. The promising results demonstrate the potential of diffusion models as formidable learners, establishing their significance in furnishing informative and robust visual representations.
arXiv Detail & Related papers (2024-01-29T10:36:57Z)
Intriguing properties of generative classifiers [14.57861413242093]
We build on advances in generative modeling that turn text-to-image models into classifiers. They show a record-breaking human-like shape bias (99% for Imagen), near human-level out-of-distribution accuracy, state-of-the-art alignment with human classification errors. Our results indicate that while the current dominant paradigm for modeling human object recognition is discriminative inference, zero-shot generative models approximate human object recognition data surprisingly well.
arXiv Detail & Related papers (2023-09-28T18:19:40Z)
Imitating Human Behaviour with Diffusion Models [25.55215280101109]
Diffusion models have emerged as powerful generative models in the text-to-image domain. This paper studies their application as observation-to-action models for imitating human behaviour in sequential environments.
arXiv Detail & Related papers (2023-01-25T16:31:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.