Conditioning Diffusion Models via Attributes and Semantic Masks for Face Generation
- URL: http://arxiv.org/abs/2306.00914v3
- Date: Wed, 27 Sep 2023 18:13:12 GMT
- Title: Conditioning Diffusion Models via Attributes and Semantic Masks for Face Generation
- Authors: Nico Giambi and Giuseppe Lisanti
- Abstract summary: Deep generative models have shown impressive results in generating realistic images of faces.
GANs managed to generate high-quality, high-fidelity images when conditioned on semantic masks, but they still lack the ability to diversify their output.
We propose a multi-conditioning approach for diffusion models via cross-attention exploiting both attributes and semantic masks to generate high-quality and controllable face images.
- Score: 1.104121146441257
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep generative models have shown impressive results in generating realistic images of faces. GANs managed to generate high-quality, high-fidelity images when conditioned on semantic masks, but they still lack the ability to diversify their output. Diffusion models partially solve this problem and are able to generate diverse samples given the same condition. In this paper, we propose a multi-conditioning approach for diffusion models via cross-attention, exploiting both attributes and semantic masks to generate high-quality and controllable face images. We also study the impact of applying perceptual-focused loss weighting in the latent space instead of the pixel space. Our method extends previous approaches by introducing conditioning on more than one set of features, guaranteeing more fine-grained control over the generated face images. We evaluate our approach on the CelebA-HQ dataset and show that it can generate realistic and diverse samples while allowing for fine-grained control over multiple attributes and semantic regions. Additionally, we perform an ablation study to evaluate the impact of different conditioning strategies on the quality and diversity of the generated images.
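The abstract names the ingredients but no implementation, so below is a minimal PyTorch sketch of the general recipe under stated assumptions: attribute and semantic-mask embeddings are concatenated into one context sequence that the denoiser attends to via cross-attention, and a perceptual-focused weight (assumed here to be P2-style SNR weighting) scales the latent-space noise-prediction loss. Module names, shapes, and the weighting form are illustrative guesses, not the authors' code.

```python
# Minimal sketch (assumptions, not the authors' code): multi-conditioning a
# latent diffusion denoiser via cross-attention on attributes + semantic masks.
import torch
import torch.nn as nn

class MultiConditioner(nn.Module):
    """Embed binary attributes and a one-hot semantic mask into one context."""
    def __init__(self, n_attrs=40, n_classes=19, dim=256):
        super().__init__()
        self.attr_embed = nn.Linear(n_attrs, dim)          # attributes -> 1 token
        self.mask_embed = nn.Conv2d(n_classes, dim, 8, 8)  # mask -> patch tokens

    def forward(self, attrs, mask):
        a = self.attr_embed(attrs).unsqueeze(1)               # (B, 1, dim)
        m = self.mask_embed(mask).flatten(2).transpose(1, 2)  # (B, hw, dim)
        return torch.cat([a, m], dim=1)        # joint context for cross-attention

class CrossAttnDenoiser(nn.Module):
    """Toy denoiser: latent tokens cross-attend to the conditioning context.
    (Timestep embedding and the usual U-Net backbone are omitted for brevity.)"""
    def __init__(self, latent_ch=4, dim=256):
        super().__init__()
        self.in_proj = nn.Conv2d(latent_ch, dim, 1)
        self.cross_attn = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.out_proj = nn.Conv2d(dim, latent_ch, 1)

    def forward(self, z_t, context):
        B, _, H, W = z_t.shape
        h = self.in_proj(z_t).flatten(2).transpose(1, 2)   # (B, H*W, dim)
        h, _ = self.cross_attn(h, context, context)        # inject both conditions
        return self.out_proj(h.transpose(1, 2).reshape(B, -1, H, W))

def weighted_latent_loss(eps_pred, eps, snr_t, gamma=1.0, k=1.0):
    # Assumed P2-style weighting, applied to the latent-space epsilon loss:
    # timesteps with very high SNR (imperceptible detail) are down-weighted.
    w = 1.0 / (k + snr_t) ** gamma
    return (w.view(-1, 1, 1, 1) * (eps_pred - eps) ** 2).mean()

cond, net = MultiConditioner(), CrossAttnDenoiser()
z_t = torch.randn(2, 4, 32, 32)                            # noisy VAE latents
ctx = cond(torch.rand(2, 40), torch.rand(2, 19, 256, 256))
loss = weighted_latent_loss(net(z_t, ctx), torch.randn_like(z_t),
                            snr_t=torch.tensor([5.0, 0.2]))
```

The design point is that one attribute token and many mask-patch tokens share a single context, so a single cross-attention layer conditions on both feature sets at once.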
Related papers
- A Simple Approach to Unifying Diffusion-based Conditional Generation [63.389616350290595]
We introduce a simple, unified framework to handle diverse conditional generation tasks.
Our approach enables versatile capabilities via different inference-time sampling schemes.
Our model supports additional capabilities like non-spatially aligned and coarse conditioning.
arXiv Detail & Related papers (2024-10-15T09:41:43Z)
- Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis [62.06970466554273]
We present Meissonic, which elevates non-autoregressive masked image modeling (MIM) text-to-image generation to a level comparable with state-of-the-art diffusion models like SDXL (a generic sketch of MIM sampling follows this entry).
We leverage high-quality training data, integrate micro-conditions informed by human preference scores, and employ feature compression layers to further enhance image fidelity and resolution.
Our model not only matches but often exceeds the performance of existing models like SDXL in generating high-quality, high-resolution images.
arXiv Detail & Related papers (2024-10-10T17:59:17Z)
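The entry above is mainly a results claim, but the mechanism it names, non-autoregressive masked image modeling, has a characteristic sampling loop. Below is a generic MaskGIT-style iterative decoder as an illustration; the cosine schedule, the stand-in predictor, and the toy mask id are assumptions, not Meissonic's actual design.

```python
# Generic MaskGIT-style MIM sampling loop (illustrative; not Meissonic's code).
import math
import torch

def mim_sample(predict_logits, seq_len=256, steps=8, mask_id=-1):
    tokens = torch.full((seq_len,), mask_id)          # start fully masked
    for s in range(steps):
        probs = predict_logits(tokens).softmax(-1)    # (seq_len, vocab)
        conf, pred = probs.max(-1)                    # per-position confidence
        # Positions already fixed in earlier steps are never re-masked.
        conf = torch.where(tokens == mask_id, conf,
                           torch.full_like(conf, float("inf")))
        # Cosine schedule: how many positions stay masked after this step.
        keep_masked = int(seq_len * math.cos(math.pi / 2 * (s + 1) / steps))
        tokens = torch.where(tokens == mask_id, pred, tokens)
        tokens[conf.argsort()[:keep_masked]] = mask_id  # re-mask least confident
    return tokens                                       # all positions decoded

# Smoke test with a random stand-in for the masked transformer:
out = mim_sample(lambda t: torch.randn(t.numel(), 1024))
assert (out != -1).all()
```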
- MCGM: Mask Conditional Text-to-Image Generative Model [1.909929271850469]
We propose a novel Mask Conditional Text-to-Image Generative Model (MCGM).
Our model builds upon the success of the Break-a-scene [1] model in generating new scenes using a single image with multiple subjects.
By introducing this additional level of control, MCGM offers a flexible and intuitive approach for generating specific poses for one or more subjects learned from a single image.
arXiv Detail & Related papers (2024-10-01T08:13:47Z)
- DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis [71.40724659748787]
DiffusionFace is the first diffusion-based face forgery dataset.
It covers various forgery categories, including unconditional and text-guided facial image generation, image-to-image translation (Img2Img), inpainting (Inpaint), and diffusion-based face swapping algorithms.
It provides essential metadata and a real-world internet-sourced forgery facial image dataset for evaluation.
arXiv Detail & Related papers (2024-03-27T11:32:44Z)
- Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model [61.53213964333474]
We propose Adv-Diffusion, a unified framework that generates imperceptible adversarial identity perturbations in the latent space rather than the raw pixel space (see the sketch after this entry).
Specifically, we propose the identity-sensitive conditioned diffusion generative model to generate semantic perturbations in the surroundings.
The designed adaptive strength-based adversarial perturbation algorithm can ensure both attack transferability and stealthiness.
arXiv Detail & Related papers (2023-12-18T15:25:23Z)
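As a rough illustration of the latent-space idea in the entry above: the perturbation is optimized on a latent code and only reaches pixels through a decoder, which is what tends to keep it imperceptible. The encoder, decoder, and face recognizer below are stand-ins, and this plain PGD-style loop omits Adv-Diffusion's identity-sensitive conditioning and adaptive strength.

```python
# Hypothetical latent-space identity attack (stand-in models, not Adv-Diffusion).
import torch

def latent_identity_attack(encode, decode, face_id, x, target_emb,
                           steps=20, lr=0.05, eps=0.5):
    z0 = encode(x).detach()
    delta = torch.zeros_like(z0, requires_grad=True)   # perturb latent, not pixels
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = decode(z0 + delta)
        # Pull the recognizer's embedding toward the target identity.
        loss = 1 - torch.cosine_similarity(face_id(x_adv), target_emb).mean()
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                    # bound the perturbation
    return decode(z0 + delta).detach()

# Smoke test with identity encode/decode and a fake embedding network:
f = lambda img: img.flatten(1)[:, :128]
x = torch.randn(1, 3, 64, 64)
adv = latent_identity_attack(lambda v: v, lambda v: v, f, x, f(torch.randn_like(x)))
```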
- Image Inpainting via Tractable Steering of Diffusion Models [54.13818673257381]
This paper proposes to exploit the ability of Tractable Probabilistic Models (TPMs) to exactly and efficiently compute the constrained posterior.
Specifically, it adopts a class of expressive TPMs termed Probabilistic Circuits (PCs).
We show that our approach can consistently improve the overall quality and semantic coherence of inpainted images with only 10% additional computational overhead.
arXiv Detail & Related papers (2023-11-28T21:14:02Z)
- Controllable Inversion of Black-Box Face Recognition Models via Diffusion [8.620807177029892]
We tackle the task of inverting the latent space of pre-trained face recognition models without full model access.
We show that the conditional diffusion model loss naturally emerges and that we can effectively sample from the inverse distribution.
Our method is the first black-box face recognition model inversion method that offers intuitive control over the generation process.
arXiv Detail & Related papers (2023-03-23T03:02:09Z)
- Auto-regressive Image Synthesis with Integrated Quantization [55.51231796778219]
This paper presents a versatile framework for conditional image generation.
It incorporates the inductive bias of CNNs and powerful sequence modeling of auto-regression.
Our method achieves diverse image generation performance superior to the state of the art.
arXiv Detail & Related papers (2022-07-21T22:19:17Z)
- On Conditioning the Input Noise for Controlled Image Generation with Diffusion Models [27.472482893004862]
Conditional image generation has paved the way for several breakthroughs in image editing, stock photo generation, and 3-D object generation.
In this work, we explore techniques to condition diffusion models with carefully crafted input noise artifacts, as sketched below.
arXiv Detail & Related papers (2022-05-08T13:18:14Z)
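The entry above gives no recipe, so the sketch below illustrates only one plausible reading of "crafted input noise": planting a fixed noise patch in a region of the initial x_T so that repeated sampling sees identical local noise there, biasing what a frozen diffusion sampler generates in that region. Purely an assumption for illustration.

```python
# Illustrative "crafted input noise" for diffusion sampling (an assumption,
# not the paper's method): plant a reusable noise patch into x_T.
import torch

def crafted_noise(shape, patch, top, left):
    x_T = torch.randn(shape)
    _, _, ph, pw = patch.shape
    x_T[..., top:top + ph, left:left + pw] = patch   # fixed noise artifact
    return x_T                                       # feed to a frozen sampler

anchor = torch.randn(1, 3, 16, 16)                   # reusable patch
x1 = crafted_noise((1, 3, 64, 64), anchor, top=8, left=8)
x2 = crafted_noise((1, 3, 64, 64), anchor, top=8, left=8)
assert torch.equal(x1[..., 8:24, 8:24], x2[..., 8:24, 8:24])
```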
- Cluster-guided Image Synthesis with Unconditional Models [41.89334167530054]
This work focuses on controllable image generation by leveraging GANs that are well-trained in an unsupervised fashion.
By conditioning on the cluster assignments, the proposed method is able to control the semantic class of the generated image (see the sketch after this entry).
We showcase the efficacy of our approach on faces (CelebA-HQ and FFHQ), animals (Imagenet) and objects (LSUN) using different pre-trained generative models.
arXiv Detail & Related papers (2021-12-24T02:18:34Z)
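A minimal sketch of the conditioning mechanism in the entry above, under the assumption that it reduces to clustering codes of a pretrained unconditional generator and sampling near a chosen centroid; the real method clusters the generator's intermediate feature space, and the generator below is a stand-in.

```python
# Hypothetical cluster-guided sampling with an unconditional generator stand-in.
import torch

def kmeans(z, k=10, iters=25):
    centroids = z[torch.randperm(len(z))[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(z, centroids).argmin(1)   # nearest-centroid labels
        for j in range(k):
            if (assign == j).any():
                centroids[j] = z[assign == j].mean(0)
    return centroids

W = torch.randn(128, 3 * 32 * 32)                      # stand-in generator weights
G = lambda z: (z @ W).view(-1, 3, 32, 32)
bank = torch.randn(5000, 128)                          # codes to cluster
centers = kmeans(bank)
# "Condition" on cluster 3 by sampling codes near its centroid:
images = G(centers[3] + 0.3 * torch.randn(16, 128))
```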
- High Resolution Face Editing with Masked GAN Latent Code Optimization [0.0]
Face editing is a popular research topic in the computer vision community.
Recently proposed methods are based either on training a conditional encoder-decoder Generative Adversarial Network (GAN) end-to-end or on defining an operation in the latent space of a pre-trained vanilla GAN generator.
We propose a GAN embedding optimization procedure with spatial and semantic constraints (a minimal sketch follows this entry).
arXiv Detail & Related papers (2021-03-20T08:39:41Z)
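A condensed sketch of the recipe named in the entry above, with assumptions made explicit: optimize a latent code so the generator matches the target image only under a spatial mask, leaving the unmasked region free to change. The generator, mask, and loss are stand-ins; the paper's semantic constraints (e.g., from a segmentation network) are omitted.

```python
# Hypothetical masked GAN-inversion loop (stand-in generator, MSE-only loss).
import torch

def masked_embedding(G, target, mask, z_init, steps=200, lr=0.01):
    z = z_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = ((mask * (G(z) - target)) ** 2).mean()  # spatially constrained
        opt.zero_grad(); loss.backward(); opt.step()
    return z.detach()

W = torch.randn(64, 3 * 32 * 32)                       # stand-in generator weights
G = lambda z: torch.tanh(z @ W).view(-1, 3, 32, 32)
target = torch.rand(1, 3, 32, 32)
mask = torch.ones(1, 1, 32, 32)
mask[..., 8:24, 8:24] = 0                              # region left free to edit
z_star = masked_embedding(G, target, mask, torch.randn(1, 64))
```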