MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control
- URL: http://arxiv.org/abs/2501.02260v2
- Date: Thu, 09 Jan 2025 06:14:09 GMT
- Title: MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control
- Authors: Mengting Wei, Tuomas Varanka, Xingxun Jiang, Huai-Qian Khor, Guoying Zhao
- Abstract summary: We address the problem of facial expression editing by controlling the relative variation of facial action units (AUs) of the same person.
This enables us to edit this specific person's expression in a fine-grained, continuous and interpretable manner.
Key to our model, which we dub MagicFace, is a diffusion model conditioned on AU variations and an ID encoder.
- Score: 17.86535640560411
- License:
- Abstract: We address the problem of facial expression editing by controlling the relative variation of facial action units (AUs) of the same person. This enables us to edit this specific person's expression in a fine-grained, continuous and interpretable manner, while preserving their identity, pose, background and detailed facial attributes. Key to our model, which we dub MagicFace, is a diffusion model conditioned on AU variations and an ID encoder that preserves facial details with high consistency. Specifically, to preserve the facial details of the input identity, we leverage the power of pretrained Stable-Diffusion models and design an ID encoder to merge appearance features through self-attention. To keep the background and pose consistent, we introduce an efficient Attribute Controller that explicitly informs the model of the current background and pose of the target. By injecting AU variations into a denoising UNet, our model can animate arbitrary identities with various AU combinations, yielding superior results in high-fidelity expression editing compared to other facial expression editing works. Code is publicly available at https://github.com/weimengting/MagicFace.
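The conditioning idea lends itself to a short sketch. Below is a minimal, hypothetical PyTorch block showing how an AU-variation vector can modulate denoising features while tokens from an ID encoder are merged through self-attention; all module names, dimensions, and the AU count are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch, not the authors' code: an AU-variation vector modulates
# denoising features while reference (ID) tokens are merged via self-attention.
import torch
import torch.nn as nn

class AUConditionedBlock(nn.Module):
    def __init__(self, dim: int = 320, n_aus: int = 12, heads: int = 8):
        super().__init__()
        # Embed the relative AU change (e.g. delta intensities of n_aus AUs).
        self.au_mlp = nn.Sequential(nn.Linear(n_aus, dim), nn.SiLU(),
                                    nn.Linear(dim, dim))
        # Attention over target tokens concatenated with ID-encoder tokens,
        # so appearance features are merged into the denoising stream.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, ref_tokens, d_au):
        # x: (B, N, dim) target features; ref_tokens: (B, M, dim); d_au: (B, n_aus)
        x = x + self.au_mlp(d_au).unsqueeze(1)      # inject the AU variation
        ctx = torch.cat([x, ref_tokens], dim=1)     # merge ID features
        out, _ = self.attn(self.norm(x), ctx, ctx)
        return x + out

block = AUConditionedBlock()
out = block(torch.randn(2, 64, 320),   # target latent tokens
            torch.randn(2, 77, 320),   # tokens from the ID encoder
            torch.randn(2, 12))        # relative AU variation
print(out.shape)                       # torch.Size([2, 64, 320])
```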
Related papers
- Towards Consistent and Controllable Image Synthesis for Face Editing [18.646961062736207]
RigFace is a novel approach to control the lighting, facial expression and head pose of a portrait photo.
Our model achieves comparable or even superior performance in both identity preservation and photorealism compared to existing face editing models.
arXiv Detail & Related papers (2025-02-04T16:36:07Z)
- Turn That Frown Upside Down: FaceID Customization via Cross-Training Data [49.51940625552275]
CrossFaceID is the first large-scale, high-quality, and publicly available dataset designed to improve the facial modification capabilities of FaceID customization models.
It consists of 40,000 text-image pairs from approximately 2,000 persons, with each person represented by around 20 images showcasing diverse facial attributes.
During the training stage, a specific face of a person is used as input, and the FaceID customization model is forced to generate another image of the same person but with altered facial features.
Experiments show that models fine-tuned on the CrossFaceID dataset preserve FaceID fidelity while significantly improving their facial modification capabilities. A pair sampler in the spirit of this cross-training recipe is sketched below.
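The following sketch illustrates the cross-training data setup described above: two different images of the same person serve as input and target. The on-disk layout (one folder per identity) and file names are assumptions, not the dataset's actual format.

```python
# Illustrative pair sampler for the cross-training recipe; the on-disk
# layout (one folder per identity) and file names are assumptions.
import random
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset

class CrossTrainingPairs(Dataset):
    """Yields (reference, target) images of the SAME person, forcing the
    model to alter facial attributes instead of copying the input."""

    def __init__(self, root: str):
        people = [sorted(d.glob("*.jpg")) for d in Path(root).iterdir() if d.is_dir()]
        self.people = [imgs for imgs in people if len(imgs) >= 2]

    def __len__(self):
        return len(self.people)

    def __getitem__(self, idx):
        ref_path, tgt_path = random.sample(self.people[idx], 2)
        return Image.open(ref_path), Image.open(tgt_path)
```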
arXiv Detail & Related papers (2025-01-26T05:27:38Z)
- WEM-GAN: Wavelet transform based facial expression manipulation [2.0918868193463207]
We propose WEM-GAN, short for wavelet-based expression manipulation GAN.
We take advantage of the wavelet transform technique, combining it with a generator built on a U-net autoencoder backbone.
Our model performs better in identity preservation, editing capability, and image generation quality on the AffectNet dataset. A toy wavelet decomposition showing the frequency split this builds on appears below.
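As a concrete illustration of the wavelet idea, the sketch below uses PyWavelets to split an image into one low-frequency and three high-frequency sub-bands; the detail bands carry the edge and texture cues relevant to identity. This is a generic Haar DWT, not the authors' exact pipeline.

```python
# Toy Haar decomposition with PyWavelets, showing the low/high-frequency split
# a wavelet-based generator can exploit; not the authors' exact pipeline.
import numpy as np
import pywt

img = np.random.rand(128, 128).astype(np.float32)    # stand-in face channel
cA, (cH, cV, cD) = pywt.dwt2(img, "haar")            # approximation + details

# The three detail bands carry the high-frequency cues (edges, wrinkles)
# that help preserve identity; stack all four as generator input channels.
gen_input = np.stack([cA, cH, cV, cD])               # shape (4, 64, 64)
recon = pywt.idwt2((cA, (cH, cV, cD)), "haar")       # lossless inverse
assert np.allclose(recon, img, atol=1e-5)
```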
arXiv Detail & Related papers (2024-12-03T16:23:02Z)
- Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models [69.50286698375386]
We propose a novel approach that better harnesses diffusion models for face-swapping.
We introduce a mask shuffling technique during inpainting training, which allows us to create a so-called universal model for swapping.
Our approach is relatively unified, which makes it resilient to errors in other off-the-shelf models. The mask-shuffling idea is sketched below.
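A minimal sketch of mask shuffling during inpainting training follows: face masks are permuted across the batch so no image keeps its own mask, discouraging the model from memorizing mask-identity pairings. Function names and shapes are hypothetical.

```python
# Hedged sketch of mask shuffling for inpainting training: masks are permuted
# across the batch so no image keeps its own face mask. Names are hypothetical.
import torch

def shuffle_masks(images: torch.Tensor, masks: torch.Tensor):
    """images: (B, 3, H, W); masks: (B, 1, H, W), 1 marks the region to fill."""
    perm = torch.randperm(images.size(0))
    shuffled = masks[perm]                    # each image gets another's mask
    masked = images * (1.0 - shuffled)        # erase the region to inpaint
    return masked, shuffled

masked, m = shuffle_masks(torch.rand(4, 3, 256, 256),
                          (torch.rand(4, 1, 256, 256) > 0.5).float())
```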
arXiv Detail & Related papers (2024-09-11T13:43:53Z)
- Arc2Face: A Foundation Model for ID-Consistent Human Faces [95.00331107591859]
Arc2Face is an identity-conditioned face foundation model.
It can generate diverse photo-realistic images with a greater degree of face similarity than existing models; a sketch of this style of identity conditioning follows.
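In the Arc2Face spirit, a face-recognition embedding is projected into pseudo-tokens that a diffusion model cross-attends to. The projector below is a hypothetical stand-in, not the released model.

```python
# Sketch in the Arc2Face spirit: project a 512-d ArcFace identity embedding
# into pseudo-tokens a diffusion model can cross-attend to. The projector is
# a hypothetical stand-in, not the released model.
import torch
import torch.nn as nn

class IDProjector(nn.Module):
    def __init__(self, id_dim=512, token_dim=768, n_tokens=4):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(id_dim, token_dim * n_tokens), nn.GELU(),
            nn.Linear(token_dim * n_tokens, token_dim * n_tokens))
        self.n_tokens, self.token_dim = n_tokens, token_dim

    def forward(self, id_emb):
        # (B, 512) identity embedding -> (B, n_tokens, token_dim) tokens.
        return self.proj(id_emb).view(-1, self.n_tokens, self.token_dim)

print(IDProjector()(torch.randn(2, 512)).shape)  # torch.Size([2, 4, 768])
```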
arXiv Detail & Related papers (2024-03-18T10:32:51Z)
- Beyond Inserting: Learning Identity Embedding for Semantic-Fidelity Personalized Diffusion Generation [21.739328335601716]
This paper focuses on inserting an accurate and interactive ID embedding into the Stable Diffusion Model for personalized generation.
We propose a face-wise attention loss to fit the face region instead of entangling ID-unrelated information, such as face layout and background.
Our results exhibit superior ID accuracy, text-based manipulation ability, and generalization compared to previous methods; a toy version of a face-wise attention loss is sketched below.
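The following is a toy face-wise attention penalty: it concentrates the ID token's cross-attention mass inside a face mask so that layout and background stay untangled from identity. Shapes and the exact penalty are assumptions for illustration, not the paper's loss.

```python
# Toy face-wise attention loss: concentrate the ID token's cross-attention
# inside the face region so layout/background stay disentangled. Shapes and
# the exact penalty are assumptions for illustration.
import torch

def face_attention_loss(attn: torch.Tensor, face_mask: torch.Tensor):
    """attn: (B, N) ID-token attention over N spatial positions;
    face_mask: (B, N) binary, 1 inside the face region."""
    attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    # Penalize attention mass that leaks outside the face.
    return (attn * (1.0 - face_mask)).sum(dim=-1).mean()

attn = torch.rand(2, 64 * 64)
mask = (torch.rand(2, 64 * 64) > 0.7).float()
print(face_attention_loss(attn, mask))
```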
arXiv Detail & Related papers (2024-01-31T11:52:33Z)
- DisControlFace: Adding Disentangled Control to Diffusion Autoencoder for One-shot Explicit Facial Image Editing [14.537856326925178]
We focus on exploring explicit fine-grained control of generative facial image editing.
We propose a novel diffusion-based editing framework, named DisControlFace.
Our model can be trained using 2D in-the-wild portrait images without requiring 3D or video data.
arXiv Detail & Related papers (2023-12-11T08:16:55Z)
- When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $\mathcal{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions; a minimal adapter sketch follows.
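The sketch below maps a StyleGAN2 W+ latent (18 layers of 512 dimensions at 1024px resolution) into conditioning tokens for a diffusion UNet; the two-layer MLP and dimensions are assumptions, not the paper's architecture.

```python
# Minimal adapter sketch: map a StyleGAN2 W+ latent (18 layers x 512 dims)
# into conditioning tokens for a diffusion UNet. The two-layer MLP and
# dimensions are assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class WPlusAdapter(nn.Module):
    def __init__(self, w_dim=512, ctx_dim=768):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(w_dim, ctx_dim), nn.GELU(),
                                 nn.Linear(ctx_dim, ctx_dim))

    def forward(self, w_plus):
        # (B, 18, 512) -> (B, 18, 768): one token per StyleGAN layer, to be
        # appended to the text tokens before cross-attention.
        return self.mlp(w_plus)

w_plus = torch.randn(1, 18, 512)
edited = w_plus + 3.0 * torch.randn(1, 18, 512)  # a StyleGAN editing offset
print(WPlusAdapter()(edited).shape)              # torch.Size([1, 18, 768])
```

Because edits remain plain offsets in W+, known StyleGAN directions (age, smile, pose) can be applied before the adapter, which is what makes the outputs amenable to those editing directions.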
arXiv Detail & Related papers (2023-11-29T09:05:14Z)
- DiffFace: Diffusion-based Face Swapping with Facial Guidance [24.50570533781642]
We propose DiffFace, the first diffusion-based face swapping framework.
It is composed of an ID-conditional DDPM, sampling with facial guidance, and target-preserving blending.
DiffFace offers benefits such as training stability, high fidelity, sample diversity, and controllability; a sketch of gradient-based facial guidance is given below.
arXiv Detail & Related papers (2022-12-27T02:51:46Z)
- Learning Disentangled Representation for One-shot Progressive Face Swapping [92.09538942684539]
We present FaceSwapper, a simple yet efficient method for one-shot face swapping based on Generative Adversarial Networks.
Our method consists of a disentangled representation module and a semantic-guided fusion module.
Our method achieves state-of-the-art results on benchmark datasets with fewer training samples.
arXiv Detail & Related papers (2022-03-24T11:19:04Z)
- FaceController: Controllable Attribute Editing for Face in the Wild [74.56117807309576]
We propose a simple feed-forward network to generate high-fidelity manipulated faces.
By simply employing existing and easily obtainable prior information, our method can control, transfer, and edit diverse attributes of faces in the wild.
In our method, we decouple identity, expression, pose, and illumination using 3D priors, and separate texture and colors using region-wise style codes; a toy coefficient-recombination sketch follows.
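Decoupling via 3D priors can be pictured as recombining 3DMM-style coefficient vectors: take identity from one face and expression, pose, and illumination from another. The split sizes below follow common 3DMM conventions but are assumptions, not FaceController's layout.

```python
# Toy recombination of 3DMM-style coefficients: take identity from one face
# and expression/pose/illumination from another. The split sizes follow common
# 3DMM conventions but are assumptions here, not FaceController's layout.
import numpy as np

SPLITS = {"identity": 80, "expression": 64, "texture": 80,
          "pose": 6, "illumination": 27}

def recombine(src, tgt, take_from_src=("identity", "texture")):
    """Coefficient vector with `take_from_src` factors from src, rest from tgt."""
    out, start = tgt.copy(), 0
    for name, size in SPLITS.items():
        if name in take_from_src:
            out[start:start + size] = src[start:start + size]
        start += size
    return out

dim = sum(SPLITS.values())
coeffs = recombine(np.random.randn(dim), np.random.randn(dim))
```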
arXiv Detail & Related papers (2021-02-23T02:47:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.