Towards Consistent and Controllable Image Synthesis for Face Editing
- URL: http://arxiv.org/abs/2502.02465v2
- Date: Sun, 09 Feb 2025 14:09:32 GMT
- Title: Towards Consistent and Controllable Image Synthesis for Face Editing
- Authors: Mengting Wei, Tuomas Varanka, Yante Li, Xingxun Jiang, Huai-Qian Khor, Guoying Zhao
- Abstract summary: RigFace is a novel approach to control the lighting, facial expression and head pose of a portrait photo. Our model achieves comparable or even superior performance in both identity preservation and photorealism compared to existing face editing models.
- Score: 18.646961062736207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Face editing methods, essential for tasks like virtual avatars, digital human synthesis and identity preservation, have traditionally been built upon GAN-based techniques, while recent focus has shifted to diffusion-based models due to their success in image reconstruction. However, diffusion models still face challenges in controlling specific attributes and preserving the consistency of unchanged attributes, especially identity characteristics. To address these issues and facilitate more convenient editing of face images, we propose a novel approach that leverages the power of Stable-Diffusion (SD) models and crude 3D face models to control the lighting, facial expression and head pose of a portrait photo. We observe that this task essentially involves combining the target background, the identity, and the face attributes to be edited. We strive to sufficiently disentangle the control of these factors to enable consistent face editing. Specifically, our method, coined RigFace, contains: 1) a Spatial Attribute Encoder that provides precise and decoupled conditions for background, pose, expression and lighting; 2) a high-consistency FaceFusion method that transfers identity features from the Identity Encoder to the denoising UNet of a pre-trained SD model; 3) an Attribute Rigger that injects those conditions into the denoising UNet. Our model achieves comparable or even superior performance in both identity preservation and photorealism compared to existing face editing models. Code is publicly available at https://github.com/weimengting/RigFace.
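The three components described in the abstract can be sketched as a simple data-flow pipeline. The sketch below is purely illustrative: the function names, feature representations, and the additive way conditions are "injected" are all assumptions for readability, not the authors' actual implementation (which conditions a denoising UNet inside Stable Diffusion).

```python
# Illustrative sketch of the RigFace data flow described in the abstract.
# All names, shapes, and the toy arithmetic are assumptions, not the real model.

def spatial_attribute_encoder(background, pose, expression, lighting):
    # Provides precise, decoupled condition maps (here: kept as separate lists).
    return {"background": background, "pose": pose,
            "expression": expression, "lighting": lighting}

def identity_encoder(face_image):
    # Stand-in for an identity feature extractor (e.g. a recognition backbone).
    return [sum(face_image) / len(face_image)]  # toy one-dimensional ID feature

def attribute_rigger(unet_features, conditions):
    # Injects the conditions into the denoising UNet's features
    # (modelled here as a simple additive offset for self-containment).
    offset = sum(len(v) for v in conditions.values())
    return [f + offset for f in unet_features]

def rigface_step(face_image, background, pose, expression, lighting, unet_features):
    conditions = spatial_attribute_encoder(background, pose, expression, lighting)
    identity = identity_encoder(face_image)
    rigged = attribute_rigger(unet_features, conditions)
    # FaceFusion would merge identity features into the UNet's attention layers;
    # here we simply append them so the sketch stays runnable.
    return rigged + identity
```

The point of the sketch is the disentanglement the abstract emphasizes: background, pose, expression and lighting enter through one encoder, identity through another, and only the Attribute Rigger touches the generator's features.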
Related papers
- InstaFace: Identity-Preserving Facial Editing with Single Image Inference [13.067402877443902]
We introduce a novel diffusion-based framework, InstaFace, to generate realistic images while preserving identity using only a single image.
InstaFace harnesses 3D perspectives by integrating multiple 3DMM-based conditionals without introducing additional trainable parameters.
Our method outperforms several state-of-the-art approaches in terms of identity preservation, photorealism, and effective control of pose, expression, and lighting.
arXiv Detail & Related papers (2025-02-27T22:37:09Z) - MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control [17.86535640560411]
We address the problem of facial expression editing by controlling the relative variation of facial action units (AUs) of the same person.
This enables us to edit this specific person's expression in a fine-grained, continuous and interpretable manner.
Key to our model, which we dub MagicFace, is a diffusion model conditioned on AU variations and an ID encoder.
arXiv Detail & Related papers (2025-01-04T11:28:49Z) - HiFiVFS: High Fidelity Video Face Swapping [35.49571526968986]
Face swapping aims to generate results that combine the identity from the source with attributes from the target.
We propose a high fidelity video face swapping framework, which leverages the strong generative capability and temporal prior of Stable Video Diffusion.
Our method achieves state-of-the-art (SOTA) in video face swapping, both qualitatively and quantitatively.
arXiv Detail & Related papers (2024-11-27T12:30:24Z) - OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration.
We propose OSDFace, a novel one-step diffusion model for face restoration.
Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z) - Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models [69.50286698375386]
We propose a novel approach that better harnesses diffusion models for face-swapping.
We introduce a mask shuffling technique during inpainting training, which allows us to create a so-called universal model for swapping.
Our approach is relatively unified, which makes it resilient to errors in other off-the-shelf models.
arXiv Detail & Related papers (2024-09-11T13:43:53Z) - Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control [59.954322727683746]
Face-Adapter is designed for high-precision and high-fidelity face editing for pre-trained diffusion models.
Face-Adapter achieves comparable or even superior performance in terms of motion control precision, ID retention capability, and generation quality.
arXiv Detail & Related papers (2024-05-21T17:50:12Z) - Effective Adapter for Face Recognition in the Wild [72.75516495170199]
We tackle the challenge of face recognition in the wild, where images often suffer from low quality and real-world distortions.
Traditional approaches, whether training models directly on degraded images or on counterparts enhanced with face restoration techniques, have proven ineffective.
We propose an effective adapter for augmenting existing face recognition models trained on high-quality facial datasets.
arXiv Detail & Related papers (2023-12-04T08:55:46Z) - HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces [47.27033282706179]
We present our method for neural face reenactment, called HyperReenact, that aims to generate realistic talking head images of a source identity.
Our method operates under the one-shot setting (i.e., using a single source frame) and allows for cross-subject reenactment, without requiring subject-specific fine-tuning.
We compare our method both quantitatively and qualitatively against several state-of-the-art techniques on the standard benchmarks of VoxCeleb1 and VoxCeleb2.
arXiv Detail & Related papers (2023-07-20T11:59:42Z) - DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation [69.16517915592063]
We propose a novel face-identity encoder to learn an accurate representation of human faces.
We also propose self-augmented editability learning to enhance the editability of models.
Our methods can generate identity-preserved images under different scenes at a much faster speed.
arXiv Detail & Related papers (2023-07-01T11:01:17Z) - Controllable Inversion of Black-Box Face Recognition Models via Diffusion [8.620807177029892]
We tackle the task of inverting the latent space of pre-trained face recognition models without full model access.
We show that the conditional diffusion model loss naturally emerges and that we can effectively sample from the inverse distribution.
Our method is the first black-box face recognition model inversion method that offers intuitive control over the generation process.
arXiv Detail & Related papers (2023-03-23T03:02:09Z) - DiffFace: Diffusion-based Face Swapping with Facial Guidance [24.50570533781642]
We propose a diffusion-based face swapping framework for the first time, called DiffFace.
It is composed of training ID conditional DDPM, sampling with facial guidance, and a target-preserving blending.
DiffFace achieves better benefits such as training stability, high fidelity, diversity of the samples, and controllability.
arXiv Detail & Related papers (2022-12-27T02:51:46Z) - Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection [65.92058628082322]
Non-parametric face modeling aims to reconstruct 3D face only from images without shape assumptions.
This paper presents a novel Learning to Aggregate and Personalize framework for unsupervised robust 3D face modeling.
arXiv Detail & Related papers (2021-06-15T03:10:17Z) - Pixel Sampling for Style Preserving Face Pose Editing [53.14006941396712]
We present a novel two-stage approach to solve the dilemma, where the task of face pose manipulation is cast into face inpainting.
By selectively sampling pixels from the input face and slightly adjusting their relative locations, the editing result faithfully preserves both the identity information and the image style.
With the 3D facial landmarks as guidance, our method is able to manipulate face pose in three degrees of freedom, i.e., yaw, pitch, and roll, resulting in more flexible face pose editing.
arXiv Detail & Related papers (2021-06-14T11:29:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.