ID-Booth: Identity-consistent Face Generation with Diffusion Models
- URL: http://arxiv.org/abs/2504.07392v3
- Date: Fri, 18 Apr 2025 23:24:31 GMT
- Title: ID-Booth: Identity-consistent Face Generation with Diffusion Models
- Authors: Darian Tomašević, Fadi Boutros, Chenhao Lin, Naser Damer, Vitomir Štruc, Peter Peer
- Abstract summary: We present a novel generative diffusion-based framework called ID-Booth. The framework enables identity-consistent image generation while retaining the synthesis capabilities of pretrained diffusion models. Our method facilitates better intra-identity consistency and inter-identity separability than competing methods, while achieving higher image diversity.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in generative modeling have enabled the generation of high-quality synthetic data that is applicable in a variety of domains, including face recognition. Here, state-of-the-art generative models typically rely on conditioning and fine-tuning of powerful pretrained diffusion models to facilitate the synthesis of realistic images of a desired identity. Yet, these models often do not consider the identity of subjects during training, leading to poor consistency between generated and intended identities. In contrast, methods that employ identity-based training objectives tend to overfit on various aspects of the identity, and in turn, lower the diversity of images that can be generated. To address these issues, we present in this paper a novel generative diffusion-based framework, called ID-Booth. ID-Booth consists of a denoising network responsible for data generation, a variational auto-encoder for mapping images to and from a lower-dimensional latent space and a text encoder that allows for prompt-based control over the generation procedure. The framework utilizes a novel triplet identity training objective and enables identity-consistent image generation while retaining the synthesis capabilities of pretrained diffusion models. Experiments with a state-of-the-art latent diffusion model and diverse prompts reveal that our method facilitates better intra-identity consistency and inter-identity separability than competing methods, while achieving higher image diversity. In turn, the produced data allows for effective augmentation of small-scale datasets and training of better-performing recognition models in a privacy-preserving manner. The source code for the ID-Booth framework is publicly available at https://github.com/dariant/ID-Booth.
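
The abstract names a novel triplet identity training objective but does not spell it out here. Below is a minimal sketch of one plausible form, a margin-based triplet loss over embeddings from a frozen face-recognition network; the function and parameter names are illustrative assumptions, not the authors' code:

```python
import torch
import torch.nn.functional as F

def triplet_identity_loss(anchor_emb: torch.Tensor,
                          positive_emb: torch.Tensor,
                          negative_emb: torch.Tensor,
                          margin: float = 0.3) -> torch.Tensor:
    """Margin-based triplet loss on L2-normalized face embeddings.

    anchor/positive share an identity; negative is a different identity.
    Illustrative reconstruction, not the official ID-Booth objective.
    """
    a = F.normalize(anchor_emb, dim=-1)
    p = F.normalize(positive_emb, dim=-1)
    n = F.normalize(negative_emb, dim=-1)
    d_ap = 1.0 - (a * p).sum(dim=-1)   # cosine distance to same identity
    d_an = 1.0 - (a * n).sum(dim=-1)   # cosine distance to other identity
    return F.relu(d_ap - d_an + margin).mean()
```

In a fine-tuning loop, such a term would presumably be added to the standard noise-prediction loss of the latent diffusion model, with embeddings computed by a frozen face-recognition network on decoded predictions and on reference images of the target and other identities.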
Related papers
- Multi-focal Conditioned Latent Diffusion for Person Image Synthesis [59.113899155476005]
The Latent Diffusion Model (LDM) has demonstrated strong capabilities in high-resolution image generation.
We propose a Multi-focal Conditioned Latent Diffusion (MCLD) method to address these limitations.
Our approach utilizes a multi-focal condition aggregation module, which effectively integrates facial identity and texture-specific information.
arXiv Detail & Related papers (2025-03-19T20:50:10Z) - UIFace: Unleashing Inherent Model Capabilities to Enhance Intra-Class Diversity in Synthetic Face Recognition [42.86969216015855]
Face recognition (FR) stands as one of the most crucial applications in computer vision. We propose a framework to enhance intra-class diversity for synthetic face recognition, shortened as UIFace. Experiments show that our method significantly surpasses previous approaches with less training data and half the synthetic dataset size.
arXiv Detail & Related papers (2025-02-27T06:22:18Z) - ID$^3$: Identity-Preserving-yet-Diversified Diffusion Models for Synthetic Face Recognition [60.15830516741776]
Synthetic face recognition (SFR) aims to generate datasets that mimic the distribution of real face data.
We introduce a diffusion-fueled SFR model termed ID$^3$.
ID$^3$ employs an ID-preserving loss to generate diverse yet identity-consistent facial appearances.
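The summary mentions an ID-preserving loss without its form; a common choice in this line of work, and a plausible reading, is a cosine-distance penalty between the generated face's embedding and a reference identity embedding. A hedged sketch (names are assumptions, not the ID$^3$ code):

```python
import torch
import torch.nn.functional as F

def id_preserving_loss(gen_emb: torch.Tensor,
                       ref_emb: torch.Tensor) -> torch.Tensor:
    """Pull the generated face's embedding toward the reference identity.

    Illustrative only; the exact ID^3 objective is defined in the paper.
    """
    gen = F.normalize(gen_emb, dim=-1)
    ref = F.normalize(ref_emb, dim=-1)
    return (1.0 - (gen * ref).sum(dim=-1)).mean()
```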
arXiv Detail & Related papers (2024-09-26T06:46:40Z) - Generative Unlearning for Any Identity [6.872154067622779]
In privacy-sensitive domains, advanced generative models combined with strong inversion methods can lead to potential misuse.
We propose an essential yet under-explored task called generative identity unlearning, which steers the model not to generate an image of a specific identity.
We propose a novel framework, Generative Unlearning for Any Identity (GUIDE), which prevents the reconstruction of a specific identity by unlearning the generator with only a single image.
arXiv Detail & Related papers (2024-05-16T08:00:55Z) - InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation [0.0]
"InstantFamily" is an approach that employs a novel cross-attention mechanism and a multimodal embedding stack to achieve zero-shot multi-ID image generation.
Our method effectively preserves ID as it utilizes global and local features from a pre-trained face recognition model integrated with text conditions.
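The entry describes a masked cross-attention mechanism for zero-shot multi-ID generation. A rough sketch of the idea, where each identity's conditioning token is attended to only inside that identity's spatial mask; keys and values are collapsed into one token per identity for brevity, and nothing here is InstantFamily's actual implementation:

```python
import torch

def masked_multi_id_attention(q: torch.Tensor,
                              id_kv: torch.Tensor,
                              masks: torch.Tensor) -> torch.Tensor:
    """Cross-attention restricted by per-identity spatial masks.

    q:     (B, N, D) image-token queries
    id_kv: (B, I, D) one conditioning token per identity
    masks: (B, N, I) 1 where spatial token n lies in identity i's region
    """
    scale = q.shape[-1] ** -0.5
    logits = torch.einsum("bnd,bid->bni", q, id_kv) * scale
    logits = logits.masked_fill(masks == 0, float("-inf"))
    attn = logits.softmax(dim=-1)
    # Tokens covered by no identity yield all -inf rows (NaN after
    # softmax); zero them so background tokens receive no ID signal.
    attn = torch.nan_to_num(attn, nan=0.0)
    return torch.einsum("bni,bid->bnd", attn, id_kv)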
arXiv Detail & Related papers (2024-04-30T10:16:21Z) - LCM-Lookahead for Encoder-based Text-to-Image Personalization [82.56471486184252]
We explore the potential of using shortcut-mechanisms to guide the personalization of text-to-image models.
We focus on encoder-based personalization approaches, and demonstrate that by tuning them with a lookahead identity loss, we can achieve higher identity fidelity.
arXiv Detail & Related papers (2024-04-04T17:43:06Z) - Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm [31.06269858216316]
We propose Infinite-ID, an ID-semantics decoupling paradigm for identity-preserved personalization.
We introduce an identity-enhanced training, incorporating an additional image cross-attention module to capture sufficient ID information.
We also introduce a feature interaction mechanism that combines a mixed attention module with an AdaIN-mean operation to seamlessly merge the two streams.
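The "AdaIN-mean operation" is not defined in this summary; classic AdaIN matches both the per-channel mean and standard deviation of content features to a style signal, so a mean-only variant plausibly shifts only the means. A hedged sketch under that assumption:

```python
import torch

def adain_mean(content: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
    """Mean-only AdaIN: shift content features so their per-channel mean
    matches the style features. A guess at "AdaIN-mean"; classic AdaIN
    also rescales by the standard deviation.

    content, style: (B, N, D) token features.
    """
    c_mean = content.mean(dim=1, keepdim=True)
    s_mean = style.mean(dim=1, keepdim=True)
    return content - c_mean + s_mean
```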
arXiv Detail & Related papers (2024-03-18T13:39:53Z) - Bridging Generative and Discriminative Models for Unified Visual
Perception with Diffusion Priors [56.82596340418697]
We propose a simple yet effective framework comprising a pre-trained Stable Diffusion (SD) model containing rich generative priors, a unified head (U-head) capable of integrating hierarchical representations, and an adapted expert providing discriminative priors.
Comprehensive investigations reveal characteristics of Vermouth, such as the varying granularity of perception concealed in latent variables at distinct time steps and various U-Net stages.
The promising results demonstrate the potential of diffusion models as formidable learners, establishing their significance in furnishing informative and robust visual representations.
arXiv Detail & Related papers (2024-01-29T10:36:57Z) - PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved
Personalization [92.90392834835751]
PortraitBooth is designed for high efficiency, robust identity preservation, and expression-editable text-to-image generation.
PortraitBooth eliminates computational overhead and mitigates identity distortion.
It incorporates emotion-aware cross-attention control for diverse facial expressions in generated images.
arXiv Detail & Related papers (2023-12-11T13:03:29Z) - DisenBooth: Identity-Preserving Disentangled Tuning for Subject-Driven
Text-to-Image Generation [50.39533637201273]
We propose DisenBooth, an identity-preserving disentangled tuning framework for subject-driven text-to-image generation.
By combining the identity-preserved embedding and identity-irrelevant embedding, DisenBooth demonstrates more generation flexibility and controllability.
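DisenBooth's summary hinges on combining an identity-preserved embedding with an identity-irrelevant one before conditioning the diffusion model. A schematic sketch of that idea, with module and attribute names invented for illustration rather than taken from DisenBooth's released code:

```python
import torch
import torch.nn as nn

class DisentangledCondition(nn.Module):
    """Combine a learned identity embedding with an image-specific,
    identity-irrelevant embedding to form the conditioning signal.
    A schematic sketch of the idea, not DisenBooth's implementation.
    """
    def __init__(self, dim: int = 768):
        super().__init__()
        # Shared across all images of the subject: captures identity.
        self.identity_emb = nn.Parameter(torch.randn(1, dim) * 0.02)
        # Per-image branch: captures identity-irrelevant content.
        self.irrelevant_proj = nn.Linear(dim, dim)

    def forward(self, image_feat: torch.Tensor) -> torch.Tensor:
        # image_feat: (B, dim) features of the current training image
        irrelevant = self.irrelevant_proj(image_feat)
        return self.identity_emb + irrelevant  # (B, dim) condition token
```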
arXiv Detail & Related papers (2023-05-05T09:08:25Z)