Related papers: DiffFAE: Advancing High-fidelity One-shot Facial Appearance Editing with Space-sensitive Customization and Semantic Preservation

DiffFAE: Advancing High-fidelity One-shot Facial Appearance Editing with Space-sensitive Customization and Semantic Preservation

URL: http://arxiv.org/abs/2403.17664v1
Date: Tue, 26 Mar 2024 12:53:10 GMT
Title: DiffFAE: Advancing High-fidelity One-shot Facial Appearance Editing with Space-sensitive Customization and Semantic Preservation
Authors: Qilin Wang, Jiangning Zhang, Chengming Xu, Weijian Cao, Ying Tai, Yue Han, Yanhao Ge, Hong Gu, Chengjie Wang, Yanwei Fu,
Abstract summary: This paper presents DiffFAE, a one-stage and highly-efficient diffusion-based framework tailored for high-fidelity Facial Appearance Editing. For high-fidelity query attributes transfer, we adopt Space-sensitive Physical Customization (SPC), which ensures the fidelity and generalization ability. In order to preserve source attributes, we introduce the Region-responsive Semantic Composition (RSC) This module is guided to learn decoupled source-regarding features, thereby better preserving the identity and alleviating artifacts from non-facial attributes such as hair, clothes, and background.
Score: 84.0586749616249
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Facial Appearance Editing (FAE) aims to modify physical attributes, such as pose, expression and lighting, of human facial images while preserving attributes like identity and background, showing great importance in photograph. In spite of the great progress in this area, current researches generally meet three challenges: low generation fidelity, poor attribute preservation, and inefficient inference. To overcome above challenges, this paper presents DiffFAE, a one-stage and highly-efficient diffusion-based framework tailored for high-fidelity FAE. For high-fidelity query attributes transfer, we adopt Space-sensitive Physical Customization (SPC), which ensures the fidelity and generalization ability by utilizing rendering texture derived from 3D Morphable Model (3DMM). In order to preserve source attributes, we introduce the Region-responsive Semantic Composition (RSC). This module is guided to learn decoupled source-regarding features, thereby better preserving the identity and alleviating artifacts from non-facial attributes such as hair, clothes, and background. We further introduce a consistency regularization for our pipeline to enhance editing controllability by leveraging prior knowledge in the attention matrices of diffusion model. Extensive experiments demonstrate the superiority of DiffFAE over existing methods, achieving state-of-the-art performance in facial appearance editing.

Related papers

DiffSwap++: 3D Latent-Controlled Diffusion for Identity-Preserving Face Swapping [16.846179110602737]
We propose DiffSwap++, a novel diffusion-based face-swapping pipeline that incorporates 3D facial latent features during training.<n>Our method enhances geometric consistency and improves the disentanglement of facial identity from appearance attributes.<n>Experiments on CelebA, FFHQ, and CelebV-Text demonstrate that DiffSwap++ outperforms prior methods in preserving source identity while maintaining target pose and expression.
arXiv Detail & Related papers (2025-11-04T18:56:49Z)
Multi-focal Conditioned Latent Diffusion for Person Image Synthesis [59.113899155476005]
The Latent Diffusion Model (LDM) has demonstrated strong capabilities in high-resolution image generation. We propose a Multi-focal Conditioned Latent Diffusion (MCLD) method to address these limitations. Our approach utilizes a multi-focal condition aggregation module, which effectively integrates facial identity and texture-specific information.
arXiv Detail & Related papers (2025-03-19T20:50:10Z)
InstaFace: Identity-Preserving Facial Editing with Single Image Inference [13.067402877443902]
We introduce a novel diffusion-based framework, InstaFace, to generate realistic images while preserving identity using only a single image. InstaFace harnesses 3D perspectives by integrating multiple 3DMM-based conditionals without introducing additional trainable parameters. Our method outperforms several state-of-the-art approaches in terms of identity preservation, photorealism, and effective control of pose, expression, and lighting.
arXiv Detail & Related papers (2025-02-27T22:37:09Z)
Towards Consistent and Controllable Image Synthesis for Face Editing [18.646961062736207]
RigFace is a novel approach to control the lighting, facial expression and head pose of a portrait photo. Our model achieves comparable or even superior performance in both identity preservation and photorealism compared to existing face editing models.
arXiv Detail & Related papers (2025-02-04T16:36:07Z)
HiFiVFS: High Fidelity Video Face Swapping [35.49571526968986]
Face swapping aims to generate results that combine the identity from the source with attributes from the target. We propose a high fidelity video face swapping framework, which leverages the strong generative capability and temporal prior of Stable Video Diffusion. Our method achieves state-of-the-art (SOTA) in video face swapping, both qualitatively and quantitatively.
arXiv Detail & Related papers (2024-11-27T12:30:24Z)
ID$^3$: Identity-Preserving-yet-Diversified Diffusion Models for Synthetic Face Recognition [60.15830516741776]
Synthetic face recognition (SFR) aims to generate datasets that mimic the distribution of real face data. We introduce a diffusion-fueled SFR model termed $textID3$. $textID3$ employs an ID-preserving loss to generate diverse yet identity-consistent facial appearances.
arXiv Detail & Related papers (2024-09-26T06:46:40Z)
A Reference-Based 3D Semantic-Aware Framework for Accurate Local Facial Attribute Editing [19.21301510545666]
We introduce a novel framework that merges latent-based and reference-based editing methods. Our approach employs a 3D GAN inversion technique to embed attributes from the reference image into a tri-plane space. A coarse-to-fine inpainting strategy is then applied to preserve the integrity of untargeted areas.
arXiv Detail & Related papers (2024-07-25T20:55:23Z)
Fiducial Focus Augmentation for Facial Landmark Detection [4.433764381081446]
We propose a novel image augmentation technique to enhance the model's understanding of facial structures. We employ a Siamese architecture-based training mechanism with a Deep Canonical Correlation Analysis (DCCA)-based loss. Our approach outperforms multiple state-of-the-art approaches across various benchmark datasets.
arXiv Detail & Related papers (2024-02-23T01:34:00Z)
FaceDancer: Pose- and Occlusion-Aware High Fidelity Face Swapping [62.38898610210771]
We present a new single-stage method for subject face swapping and identity transfer, named FaceDancer. We have two major contributions: Adaptive Feature Fusion Attention (AFFA) and Interpreted Feature Similarity Regularization (IFSR)
arXiv Detail & Related papers (2022-10-19T11:31:38Z)
Training and Tuning Generative Neural Radiance Fields for Attribute-Conditional 3D-Aware Face Generation [66.21121745446345]
We propose a conditional GNeRF model that integrates specific attribute labels as input, thus amplifying the controllability and disentanglement capabilities of 3D-aware generative models. Our approach builds upon a pre-trained 3D-aware face model, and we introduce a Training as Init and fidelity for Tuning (TRIOT) method to train a conditional normalized flow module. Our experiments substantiate the efficacy of our model, showcasing its ability to generate high-quality edits with enhanced view consistency.
arXiv Detail & Related papers (2022-08-26T10:05:39Z)
Deep Collaborative Multi-Modal Learning for Unsupervised Kinship Estimation [53.62256887837659]
Kinship verification is a long-standing research challenge in computer vision. We propose a novel deep collaborative multi-modal learning (DCML) to integrate the underlying information presented in facial properties. Our DCML method is always superior to some state-of-the-art kinship verification methods.
arXiv Detail & Related papers (2021-09-07T01:34:51Z)
DotFAN: A Domain-transferred Face Augmentation Network for Pose and Illumination Invariant Face Recognition [94.96686189033869]
We propose a 3D model-assisted domain-transferred face augmentation network (DotFAN) DotFAN can generate a series of variants of an input face based on the knowledge distilled from existing rich face datasets collected from other domains. Experiments show that DotFAN is beneficial for augmenting small face datasets to improve their within-class diversity.
arXiv Detail & Related papers (2020-02-23T08:16:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.