Related papers: Conditioning Diffusion Models via Attributes and Semantic Masks for Face Generation

Conditioning Diffusion Models via Attributes and Semantic Masks for Face Generation

URL: http://arxiv.org/abs/2306.00914v3
Date: Wed, 27 Sep 2023 18:13:12 GMT
Title: Conditioning Diffusion Models via Attributes and Semantic Masks for Face Generation
Authors: Nico Giambi and Giuseppe Lisanti
Abstract summary: Deep generative models have shown impressive results in generating realistic images of faces. GANs managed to generate high-quality, high-fidelity images when conditioned on semantic masks, but they still lack the ability to diversify their output. We propose a multi-conditioning approach for diffusion models via cross-attention exploiting both attributes and semantic masks to generate high-quality and controllable face images.
Score: 1.104121146441257
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep generative models have shown impressive results in generating realistic images of faces. GANs managed to generate high-quality, high-fidelity images when conditioned on semantic masks, but they still lack the ability to diversify their output. Diffusion models partially solve this problem and are able to generate diverse samples given the same condition. In this paper, we propose a multi-conditioning approach for diffusion models via cross-attention exploiting both attributes and semantic masks to generate high-quality and controllable face images. We also studied the impact of applying perceptual-focused loss weighting into the latent space instead of the pixel space. Our method extends the previous approaches by introducing conditioning on more than one set of features, guaranteeing a more fine-grained control over the generated face images. We evaluate our approach on the CelebA-HQ dataset, and we show that it can generate realistic and diverse samples while allowing for fine-grained control over multiple attributes and semantic regions. Additionally, we perform an ablation study to evaluate the impact of different conditioning strategies on the quality and diversity of the generated images.

Related papers

Multi-focal Conditioned Latent Diffusion for Person Image Synthesis [59.113899155476005]
The Latent Diffusion Model (LDM) has demonstrated strong capabilities in high-resolution image generation. We propose a Multi-focal Conditioned Latent Diffusion (MCLD) method to address these limitations. Our approach utilizes a multi-focal condition aggregation module, which effectively integrates facial identity and texture-specific information.
arXiv Detail & Related papers (2025-03-19T20:50:10Z)
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers [86.5541501589166]
DiffMoE is a batch-level global token pool that enables experts to access global token distributions during training. It achieves state-of-the-art performance among diffusion models on ImageNet benchmark. The effectiveness of our approach extends beyond class-conditional generation to more challenging tasks such as text-to-image generation.
arXiv Detail & Related papers (2025-03-18T17:57:07Z)
Diffusion Prism: Enhancing Diversity and Morphology Consistency in Mask-to-Image Diffusion [4.0301593672451]
Diffusion Prism is a training-free framework that transforms binary masks into realistic and diverse samples. We explore that a small amount of artificial noise will significantly assist the image-denoising process.
arXiv Detail & Related papers (2025-01-01T20:04:25Z)
A Simple Approach to Unifying Diffusion-based Conditional Generation [63.389616350290595]
We introduce a simple, unified framework to handle diverse conditional generation tasks. Our approach enables versatile capabilities via different inference-time sampling schemes. Our model supports additional capabilities like non-spatially aligned and coarse conditioning.
arXiv Detail & Related papers (2024-10-15T09:41:43Z)
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis [62.06970466554273]
We present Meissonic, which non-autoregressive masked image modeling (MIM) text-to-image elevates to a level comparable with state-of-the-art diffusion models like SDXL. We leverage high-quality training data, integrate micro-conditions informed by human preference scores, and employ feature compression layers to further enhance image fidelity and resolution. Our model not only matches but often exceeds the performance of existing models like SDXL in generating high-quality, high-resolution images.
arXiv Detail & Related papers (2024-10-10T17:59:17Z)
MCGM: Mask Conditional Text-to-Image Generative Model [1.909929271850469]
We propose a novel Conditional Mask Text-to-Image Generative Model (MCGM) Our model builds upon the success of the Break-a-scene [1] model in generating new scenes using a single image with multiple subjects. By introducing this additional level of control, MCGM offers a flexible and intuitive approach for generating specific poses for one or more subjects learned from a single image.
arXiv Detail & Related papers (2024-10-01T08:13:47Z)
DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis [71.40724659748787]
DiffusionFace is the first diffusion-based face forgery dataset. It covers various forgery categories, including unconditional and Text Guide facial image generation, Img2Img, Inpaint, and Diffusion-based facial exchange algorithms. It provides essential metadata and a real-world internet-sourced forgery facial image dataset for evaluation.
arXiv Detail & Related papers (2024-03-27T11:32:44Z)
Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model [61.53213964333474]
We propose a unified framework Adv-Diffusion that can generate imperceptible adversarial identity perturbations in the latent space but not the raw pixel space. Specifically, we propose the identity-sensitive conditioned diffusion generative model to generate semantic perturbations in the surroundings. The designed adaptive strength-based adversarial perturbation algorithm can ensure both attack transferability and stealthiness.
arXiv Detail & Related papers (2023-12-18T15:25:23Z)
Image Inpainting via Tractable Steering of Diffusion Models [54.13818673257381]
This paper proposes to exploit the ability of Tractable Probabilistic Models (TPMs) to exactly and efficiently compute the constrained posterior. Specifically, this paper adopts a class of expressive TPMs termed Probabilistic Circuits (PCs) We show that our approach can consistently improve the overall quality and semantic coherence of inpainted images with only 10% additional computational overhead.
arXiv Detail & Related papers (2023-11-28T21:14:02Z)
Controllable Inversion of Black-Box Face Recognition Models via Diffusion [8.620807177029892]
We tackle the task of inverting the latent space of pre-trained face recognition models without full model access. We show that the conditional diffusion model loss naturally emerges and that we can effectively sample from the inverse distribution. Our method is the first black-box face recognition model inversion method that offers intuitive control over the generation process.
arXiv Detail & Related papers (2023-03-23T03:02:09Z)
Auto-regressive Image Synthesis with Integrated Quantization [55.51231796778219]
This paper presents a versatile framework for conditional image generation. It incorporates the inductive bias of CNNs and powerful sequence modeling of auto-regression. Our method achieves superior diverse image generation performance as compared with the state-of-the-art.
arXiv Detail & Related papers (2022-07-21T22:19:17Z)
On Conditioning the Input Noise for Controlled Image Generation with Diffusion Models [27.472482893004862]
Conditional image generation has paved the way for several breakthroughs in image editing, generating stock photos and 3-D object generation. In this work, we explore techniques to condition diffusion models with carefully crafted input noise artifacts.
arXiv Detail & Related papers (2022-05-08T13:18:14Z)
Cluster-guided Image Synthesis with Unconditional Models [41.89334167530054]
This work focuses on controllable image generation by leveraging GANs that are well-trained in an unsupervised fashion. By conditioning on the cluster assignments, the proposed method is able to control the semantic class of the generated image. We showcase the efficacy of our approach on faces (CelebA-HQ and FFHQ), animals (Imagenet) and objects (LSUN) using different pre-trained generative models.
arXiv Detail & Related papers (2021-12-24T02:18:34Z)
High Resolution Face Editing with Masked GAN Latent Code Optimization [0.0]
Face editing is a popular research topic in the computer vision community. Recent proposed methods are based on either training a conditional encoder-decoder Generative Adversarial Network (GAN) in an end-to-end fashion or on defining an operation in the latent space of a pre-trained vanilla GAN generator model. We propose a GAN embedding optimization procedure with spatial and semantic constraints.
arXiv Detail & Related papers (2021-03-20T08:39:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.