Latent Representation Matters: Human-like Sketches in One-shot Drawing Tasks
- URL: http://arxiv.org/abs/2406.06079v2
- Date: Tue, 05 Nov 2024 09:07:21 GMT
- Title: Latent Representation Matters: Human-like Sketches in One-shot Drawing Tasks
- Authors: Victor Boutin, Rishav Mukherji, Aditya Agrawal, Sabine Muzellec, Thomas Fel, Thomas Serre, Rufin VanRullen
- Abstract summary: We study how different inductive biases shape the latent space of Latent Diffusion Models (LDMs).
We demonstrate that LDMs with redundancy reduction and prototype-based regularizations produce near-human-like drawings.
- Score: 15.328499301244708
- Abstract: Humans can effortlessly draw new categories from a single exemplar, a feat that has long posed a challenge for generative models. However, this gap has started to close with recent advances in diffusion models. This one-shot drawing task requires powerful inductive biases that have not been systematically investigated. Here, we study how different inductive biases shape the latent space of Latent Diffusion Models (LDMs). Along with standard LDM regularizers (KL and vector quantization), we explore supervised regularizations (including classification and prototype-based representation) and contrastive inductive biases (using SimCLR and redundancy reduction objectives). We demonstrate that LDMs with redundancy reduction and prototype-based regularizations produce near-human-like drawings (regarding both samples' recognizability and originality) -- better mimicking human perception (as evaluated psychophysically). Overall, our results suggest that the gap between humans and machines in one-shot drawings is almost closed.
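As a concrete illustration of the redundancy-reduction inductive bias named in the abstract, here is a minimal Barlow Twins-style regularizer on latent codes, written in PyTorch. The view pairing, normalization, and `lam` weight are assumptions for this sketch, not the paper's implementation.

```python
import torch

def redundancy_reduction_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins-style redundancy-reduction objective.

    z_a, z_b: (batch, dim) latent codes of two augmented views of the
    same exemplar (the pairing strategy here is an assumption, not the
    paper's exact setup).
    """
    # Standardize each latent dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-6)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-6)

    n, d = z_a.shape
    c = (z_a.T @ z_b) / n  # (dim, dim) cross-correlation matrix

    on_diag = (torch.diagonal(c) - 1).pow(2).sum()               # invariance term
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # redundancy term
    return on_diag + lam * off_diag
```

In an LDM pipeline, a term like this would typically be added to the autoencoder's reconstruction loss so that the latent space is shaped before the diffusion model is trained on it.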
Related papers
- Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling [47.82616476928464]
Masked diffusion models (MDMs) have emerged as a popular research topic for generative modeling of discrete data.
We show that both training and sampling of MDMs are theoretically free from the time variable.
We identify, for the first time, an underlying numerical issue, even with the commonly used 32-bit floating-point precision.
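The numerical issue concerns the categorical sampling path; a common implementation is the Gumbel-max trick, sketched below in NumPy to show where floating-point precision enters. This is an illustrative toy, not the paper's analysis, and the measured discrepancy depends on the distribution and sample count.

```python
import numpy as np

def gumbel_argmax_sample(logits, n_samples, rng, dtype):
    """Draw categorical samples via the Gumbel-max trick at a given precision."""
    u = rng.random((n_samples, logits.shape[0]), dtype=dtype)
    # Clip away exact 0/1 to keep the logs finite; rounding of u and of the
    # logs below is where the chosen precision matters.
    u = np.clip(u, np.finfo(dtype).tiny, 1 - np.finfo(dtype).eps)
    g = -np.log(-np.log(u))  # Gumbel(0, 1) noise
    return np.argmax(logits.astype(dtype) + g, axis=-1)

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.0, -2.0])
probs = np.exp(logits) / np.exp(logits).sum()

for dtype in (np.float32, np.float64):
    samples = gumbel_argmax_sample(logits, 200_000, rng, dtype)
    freqs = np.bincount(samples, minlength=3) / len(samples)
    print(dtype.__name__, "TV distance:", 0.5 * np.abs(freqs - probs).sum())
```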
arXiv Detail & Related papers (2024-09-04T17:48:19Z) - Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors [56.82596340418697]
We propose a simple yet effective framework comprising a pre-trained Stable Diffusion (SD) model containing rich generative priors, a unified head (U-head) capable of integrating hierarchical representations, and an adapted expert providing discriminative priors.
Comprehensive investigations reveal characteristics of the proposed framework, Vermouth, such as the varying granularity of perception encoded in latent variables at distinct time steps and U-Net stages.
The promising results demonstrate the potential of diffusion models as formidable learners, establishing their significance in furnishing informative and robust visual representations.
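For intuition, below is a toy fusion head in PyTorch that projects multi-scale feature maps to a common width and fuses them for a dense prediction task. The channel counts, the `UnifiedHead` name, and the fusion scheme are hypothetical placeholders, not the paper's U-head.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnifiedHead(nn.Module):
    """Toy fusion head: project multi-scale feature maps to a common width,
    upsample everything to the resolution of feats[0], and fuse for a dense
    task such as segmentation."""
    def __init__(self, in_channels=(320, 640, 1280), width=256, num_classes=21):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, width, 1) for c in in_channels)
        self.fuse = nn.Conv2d(width * len(in_channels), width, 3, padding=1)
        self.classify = nn.Conv2d(width, num_classes, 1)

    def forward(self, feats):  # feats: list of (B, C_i, H_i, W_i), feats[0] finest
        target = feats[0].shape[-2:]
        ups = [F.interpolate(p(f), size=target, mode="bilinear", align_corners=False)
               for p, f in zip(self.proj, feats)]
        x = F.relu(self.fuse(torch.cat(ups, dim=1)))
        return self.classify(x)
```

In practice, `feats` would be intermediate activations captured (e.g., via forward hooks) from a frozen Stable Diffusion U-Net at a chosen denoising timestep.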
arXiv Detail & Related papers (2024-01-29T10:36:57Z) - Multimodal Composite Association Score: Measuring Gender Bias in Generative Multimodal Models [6.369985818712948]
Multimodal Composite Association Score (MCAS) is a new method of measuring gender bias in multimodal generative models.
MCAS is an accessible and scalable method for quantifying potential bias in models with different modalities, across a range of bias types.
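As a rough illustration of embedding-association bias scores of this kind, here is a WEAT-style sketch in NumPy; it conveys the general idea but is not the MCAS formula from the paper.

```python
import numpy as np

def association_score(concept_embs, attr_a_embs, attr_b_embs):
    """How much closer (in mean cosine similarity) a set of concept embeddings
    sits to attribute set A than to attribute set B. A WEAT-style stand-in,
    not the paper's MCAS."""
    def cos(x, y):
        x = x / np.linalg.norm(x, axis=-1, keepdims=True)
        y = y / np.linalg.norm(y, axis=-1, keepdims=True)
        return x @ y.T

    return float((cos(concept_embs, attr_a_embs).mean(axis=1)
                  - cos(concept_embs, attr_b_embs).mean(axis=1)).mean())
```

Here `concept_embs` might be, say, CLIP embeddings of generated images for an occupation, and the attribute sets embeddings of gendered prompts; a positive score indicates a stronger association with set A.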
arXiv Detail & Related papers (2023-04-26T22:53:31Z) - Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
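To illustrate the variance-reduction idea in isolation: stratified (per-class) sampling of pixel embeddings yields lower-variance mini-batch loss estimates than uniform sampling under class imbalance. The sketch below is a generic PyTorch helper, not ARCO's actual algorithm.

```python
import torch

def stratified_pixel_sample(embeddings, labels, per_class=64):
    """Sample up to `per_class` pixel embeddings per class (stratified
    sampling, with replacement) before computing a contrastive loss.
    embeddings: (N, D) flattened pixel/voxel features; labels: (N,)."""
    chosen = []
    for c in labels.unique():
        idx = (labels == c).nonzero(as_tuple=True)[0]
        pick = idx[torch.randint(len(idx), (min(per_class, len(idx)),))]
        chosen.append(pick)
    chosen = torch.cat(chosen)
    return embeddings[chosen], labels[chosen]
```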
arXiv Detail & Related papers (2023-02-03T13:50:25Z) - Diffusion Models as Artists: Are we Closing the Gap between Humans and Machines? [4.802758600019422]
We adapt the 'diversity vs. recognizability' scoring framework from Boutin et al., 2022.
We find that one-shot diffusion models have indeed started to close the gap between humans and machines.
arXiv Detail & Related papers (2023-01-27T14:08:15Z) - Imitating Human Behaviour with Diffusion Models [25.55215280101109]
Diffusion models have emerged as powerful generative models in the text-to-image domain.
This paper studies their application as observation-to-action models for imitating human behaviour in sequential environments.
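A minimal PyTorch sketch of that setup: a network predicts the noise added to a demonstrated action, conditioned on the observation and the diffusion step. The MLP architecture and the cosine schedule are placeholder assumptions, not the paper's networks.

```python
import math
import torch
import torch.nn as nn

class ActionDenoiser(nn.Module):
    """Predicts the noise added to an action, conditioned on the observation
    and the (normalized) diffusion timestep."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim))

    def forward(self, obs, noisy_action, t):
        return self.net(torch.cat([obs, noisy_action, t], dim=-1))

def diffusion_bc_loss(model, obs, action, T=50):
    """DDPM-style behavioural-cloning loss: noise a demonstrated action and
    train the model to recover that noise, conditioned on the observation."""
    b = action.shape[0]
    t = torch.randint(1, T + 1, (b, 1)).float()
    alpha_bar = torch.cos(0.5 * math.pi * t / T) ** 2  # assumed cosine schedule
    eps = torch.randn_like(action)
    noisy = alpha_bar.sqrt() * action + (1 - alpha_bar).sqrt() * eps
    return ((model(obs, noisy, t / T) - eps) ** 2).mean()
```

At test time, an action is produced by starting from Gaussian noise and iteratively denoising it with this model, conditioned on the current observation.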
arXiv Detail & Related papers (2023-01-25T16:31:05Z) - Drawing out of Distribution with Neuro-Symbolic Generative Models [49.79371715591122]
Drawing out of Distribution is a neuro-symbolic generative model of stroke-based drawing.
DooD operates directly on images and requires no supervision or expensive test-time inference.
We evaluate DooD on its ability to generalise across both data and tasks.
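To make "stroke-based drawing" concrete, here is a toy NumPy rasterizer that renders parametric line strokes onto a canvas. It only illustrates the stroke representation and contains none of DooD's neuro-symbolic inference.

```python
import numpy as np

def render_strokes(strokes, size=28):
    """Rasterize straight-line strokes onto a grayscale canvas.
    Each stroke is ((x0, y0), (x1, y1)) in [0, 1] canvas coordinates."""
    canvas = np.zeros((size, size), dtype=np.float32)
    for (x0, y0), (x1, y1) in strokes:
        for s in np.linspace(0.0, 1.0, 4 * size):  # dense points along the segment
            x = int(round((x0 + s * (x1 - x0)) * (size - 1)))
            y = int(round((y0 + s * (y1 - y0)) * (size - 1)))
            canvas[y, x] = 1.0
    return canvas

# Example: a rough "7" from two strokes.
img = render_strokes([((0.2, 0.2), (0.8, 0.2)), ((0.8, 0.2), (0.4, 0.9))])
```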
arXiv Detail & Related papers (2022-06-03T21:40:22Z) - Diversity vs. Recognizability: Human-like generalization in one-shot generative models [5.964436882344729]
We propose a new framework to evaluate one-shot generative models along two axes: sample recognizability vs. diversity.
We first show that GAN-like and VAE-like models fall on opposite ends of the diversity-recognizability space.
In contrast, disentanglement transports the model along a parabolic curve that could be used to maximize recognizability.
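A minimal NumPy sketch of the two axes: recognizability as the accuracy of a critic classifier on generated samples, and diversity as the mean pairwise feature-space distance among samples from the same exemplar. The feature extractor and the sklearn-style `classifier` are placeholders for the papers' trained critic networks.

```python
import numpy as np

def recognizability(sample_feats, labels, classifier):
    """Fraction of generated samples that a critic classifier assigns to the
    category of their exemplar (placeholder for the papers' critic network)."""
    return float((classifier.predict(sample_feats) == labels).mean())

def diversity(sample_feats):
    """Mean pairwise Euclidean distance between generated samples of the
    same exemplar, computed in feature space."""
    diffs = sample_feats[:, None, :] - sample_feats[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))
    n = len(sample_feats)
    return float(dists.sum() / (n * (n - 1)))  # mean over off-diagonal pairs
```

Each one-shot generative model then occupies a point in the (diversity, recognizability) plane and can be compared against the cloud of human drawings.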
arXiv Detail & Related papers (2022-05-20T13:17:08Z) - Entropy-Based Uncertainty Calibration for Generalized Zero-Shot Learning [49.04790688256481]
The goal of generalized zero-shot learning (GZSL) is to recognise both seen and unseen classes.
Most GZSL methods learn to synthesise visual representations from semantic information about the unseen classes.
We propose a novel framework that leverages dual variational autoencoders with a triplet loss to learn discriminative latent features.
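The triplet component can be sketched compactly in PyTorch; the margin value and the choice of latent features are assumptions, and `torch.nn.TripletMarginLoss` offers an equivalent built-in.

```python
import torch
import torch.nn.functional as F

def triplet_latent_loss(anchor, positive, negative, margin=0.5):
    """Pull latent features of the same class together and push different
    classes apart by at least `margin` (margin value is an assumption)."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()
```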
arXiv Detail & Related papers (2021-01-09T05:21:27Z) - Appearance Consensus Driven Self-Supervised Human Mesh Recovery [67.20942777949793]
We present a self-supervised human mesh recovery framework to infer human pose and shape from monocular images.
We achieve state-of-the-art results on the standard model-based 3D pose estimation benchmarks.
The resulting colored mesh prediction opens up the use of our framework for a variety of appearance-related tasks beyond pose and shape estimation.
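A rough sketch of the appearance-consensus idea as we read it: colors sampled at a vertex's projected location in one image of a person should match those at the same vertex's projection in another image. Projection, visibility, and lighting handling are all omitted here, so this is an illustration rather than the paper's loss.

```python
import torch
import torch.nn.functional as F

def appearance_consensus_loss(img_a, img_b, uv_a, uv_b):
    """Sample per-vertex colors from image A at projected vertex locations
    uv_a and compare with colors at the same vertices' projections uv_b in
    image B. Images are (B, 3, H, W); uv_* are (B, V, 2) in [-1, 1]."""
    # grid_sample expects a (B, H_out, W_out, 2) grid; treat vertices as 1 x V.
    col_a = F.grid_sample(img_a, uv_a[:, None, :, :], align_corners=False)
    col_b = F.grid_sample(img_b, uv_b[:, None, :, :], align_corners=False)
    return (col_a - col_b).abs().mean()
```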
arXiv Detail & Related papers (2020-08-04T05:40:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.