DisControlFace: Adding Disentangled Control to Diffusion Autoencoder for One-shot Explicit Facial Image Editing
- URL: http://arxiv.org/abs/2312.06193v2
- Date: Wed, 24 Jul 2024 07:21:35 GMT
- Title: DisControlFace: Adding Disentangled Control to Diffusion Autoencoder for One-shot Explicit Facial Image Editing
- Authors: Haozhe Jia, Yan Li, Hengfei Cui, Di Xu, Yuwang Wang, Tao Yu
- Abstract summary: We focus on exploring explicit fine-grained control of generative facial image editing.
We propose a novel diffusion-based editing framework, named DisControlFace.
Our model can be trained using 2D in-the-wild portrait images without requiring 3D or video data.
- Score: 14.537856326925178
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we focus on exploring explicit fine-grained control of generative facial image editing, all while generating faithful facial appearances and consistent semantic details, which, however, is quite challenging and has not been extensively explored, especially under a one-shot scenario. We identify the key challenge as the exploration of disentangled conditional control between high-level semantics and explicit parameters (e.g., 3DMM) in the generation process, and accordingly propose a novel diffusion-based editing framework, named DisControlFace. Specifically, we leverage a Diffusion Autoencoder (Diff-AE) as the semantic reconstruction backbone. To enable explicit face editing, we construct an Exp-FaceNet that is compatible with Diff-AE to generate spatial-wise explicit control conditions based on estimated 3DMM parameters. Different from current diffusion-based editing methods that train the whole conditional generative model from scratch, we freeze the pre-trained weights of the Diff-AE to maintain its semantically deterministic conditioning capability, and accordingly propose a random semantic masking (RSM) strategy to effectively achieve independent training of Exp-FaceNet. This setting endows the model with disentangled face control while reducing semantic information shift in editing. Our model can be trained using 2D in-the-wild portrait images without requiring 3D or video data, and it can perform robust editing on any new facial image through simple one-shot fine-tuning. Comprehensive experiments demonstrate that DisControlFace can generate realistic facial images with better editing accuracy and identity preservation than state-of-the-art methods. Project page: https://discontrolface.github.io/
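The training recipe in the abstract (a frozen Diff-AE backbone, a trainable Exp-FaceNet that maps estimated 3DMM parameters to spatial control maps, and random semantic masking) can be pictured with a minimal sketch. All names below (DiffAE, ExpFaceNet, estimate_3dmm) are hypothetical stand-ins, not the authors' released code, and the diffusion math is reduced to a placeholder:

```python
# Schematic sketch of the DisControlFace training setup, under assumed APIs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffAE(nn.Module):
    """Stand-in for a pre-trained Diffusion Autoencoder backbone."""
    def __init__(self, z_dim=512, size=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * size * size, z_dim))
        self.denoiser = nn.Conv2d(3, 3, 3, padding=1)

    def encode(self, x):
        return self.encoder(x)  # high-level semantic code z_sem

    def denoise(self, x_t, t, z_sem, control=None):
        # A real Diff-AE U-Net is conditioned on the timestep t and z_sem;
        # this stub only illustrates the data flow.
        h = self.denoiser(x_t) + z_sem.mean(dim=1)[:, None, None, None]
        if control is not None:
            h = h + control  # spatial-wise explicit condition
        return h

class ExpFaceNet(nn.Module):
    """Trainable branch mapping 3DMM parameters to spatial control maps."""
    def __init__(self, p_dim=64, size=64):
        super().__init__()
        self.size = size
        self.mlp = nn.Linear(p_dim, 3 * size * size)

    def forward(self, p):
        return self.mlp(p).view(-1, 3, self.size, self.size)

def estimate_3dmm(x):
    """Placeholder for an off-the-shelf 3DMM estimator (pose/expression)."""
    return torch.randn(x.shape[0], 64)

diff_ae, exp_facenet = DiffAE(), ExpFaceNet()
for param in diff_ae.parameters():
    param.requires_grad_(False)  # freeze the pre-trained backbone

opt = torch.optim.Adam(exp_facenet.parameters(), lr=1e-4)
x0 = torch.randn(8, 3, 64, 64)  # a batch of 2D in-the-wild portraits

z_sem = diff_ae.encode(x0)
# Random semantic masking (RSM): randomly drop the semantic condition so
# Exp-FaceNet must learn explicit control independently of the backbone.
keep = (torch.rand(z_sem.shape[0], 1) > 0.5).float()
z_sem = z_sem * keep

t = torch.randint(0, 1000, (x0.shape[0],))
noise = torch.randn_like(x0)
x_t = x0 + noise  # schematic; real code scales x0 and noise by alphas
control = exp_facenet(estimate_3dmm(x0))

opt.zero_grad()
loss = F.mse_loss(diff_ae.denoise(x_t, t, z_sem, control), noise)
loss.backward()
opt.step()  # only Exp-FaceNet receives gradient updates
```

At edit time, per the abstract, one would fine-tune on the single target image and then feed in edited 3DMM parameters: the frozen backbone preserves identity and semantic details while Exp-FaceNet steers the explicit factors.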
Related papers
- DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models [79.0135981840682]
We introduce DICE (Discrete Inversion for Controllable Editing), the first approach to enable precise inversion for discrete diffusion models.
By recording noise sequences and masking patterns during the reverse diffusion process, DICE enables accurate reconstruction and flexible editing of discrete data.
Our results show that DICE preserves high data fidelity while enhancing editing capabilities, offering new opportunities for fine-grained content manipulation in discrete spaces.
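The record-and-replay idea in this summary can be pictured with a toy sketch: during inversion, the per-step masking pattern (and, in the real method, the noise choices) is logged, so the trajectory can be replayed exactly for reconstruction, or replayed over locally edited tokens. This is purely illustrative, not the DICE algorithm or code:

```python
# Toy record-and-replay inversion for a masked token sequence (assumed setup).
import random

MASK = -1  # sentinel for a masked token

def invert(tokens, steps=4, seed=0):
    """Mask tokens step by step, recording which positions go at each step."""
    rng = random.Random(seed)
    positions = list(range(len(tokens)))
    rng.shuffle(positions)
    chunk = max(1, len(tokens) // steps)
    return [positions[i:i + chunk] for i in range(0, len(positions), chunk)]

def reconstruct(tokens, trace):
    """Replay the recorded trace: reveal positions in reverse masking order."""
    out = [MASK] * len(tokens)
    for step in reversed(trace):
        for pos in step:
            # A real masked generative model would sample the token here,
            # constrained by the recorded noise/choices; we just copy it.
            out[pos] = tokens[pos]
    return out

seq = [5, 2, 9, 7, 1, 3]
trace = invert(seq)                    # the recorded "inversion trace"
assert reconstruct(seq, trace) == seq  # exact reconstruction
edited = list(seq); edited[2] = 0      # local edit, same trace
print(reconstruct(edited, trace))      # edit applied, rest preserved
```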
arXiv Detail & Related papers (2024-10-10T17:59:48Z)
- Revealing Directions for Text-guided 3D Face Editing [52.85632020601518]
3D face editing is a significant task in multimedia, aimed at manipulating 3D face models via various control signals.
We present Face Clan, a text-general approach for generating and manipulating 3D faces based on arbitrary attribute descriptions.
Our method offers a precisely controllable manipulation method, allowing users to intuitively customize regions of interest with the text description.
arXiv Detail & Related papers (2024-10-07T12:04:39Z)
- PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control [24.569528214869113]
StyleGAN models learn a rich face prior and enable smooth control for fine-grained attribute editing via latent manipulation.
This work uses the disentangled $\mathcal{W}+$ space of StyleGANs to condition the T2I model.
We perform extensive experiments to validate our method for face personalization and fine-grained attribute editing.
arXiv Detail & Related papers (2024-07-24T07:10:25Z)
- Controllable Talking Face Generation by Implicit Facial Keypoints Editing [6.036277153327655]
We present ControlTalk, a talking face generation method that controls facial expression deformation based on driving audio.
Our experiments show that our method surpasses state-of-the-art performance on widely used benchmarks, including HDTF and MEAD.
arXiv Detail & Related papers (2024-06-05T02:54:46Z)
- Controllable Face Manipulation and UV Map Generation by Self-supervised Learning [20.10160338724354]
Recent methods achieve explicit control over 2D images by combining a 2D generative model with a 3DMM.
Due to the lack of realism and clarity in 3DMM texture reconstruction, there is a domain gap between synthesized images and images rendered from the 3DMM.
In this study, we propose to explicitly edit the latent space of the pretrained StyleGAN by controlling the parameters of the 3DMM.
arXiv Detail & Related papers (2022-09-24T16:49:25Z)
- IA-FaceS: A Bidirectional Method for Semantic Face Editing [8.19063619210761]
This paper proposes a bidirectional method for disentangled face attribute manipulation as well as flexible, controllable component editing.
IA-FaceS is the first such method developed without any input visual guidance, such as segmentation masks or sketches.
Both quantitative and qualitative results indicate that the proposed method outperforms the other techniques in reconstruction, face attribute manipulation, and component transfer.
arXiv Detail & Related papers (2022-03-24T14:44:56Z)
- PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering [56.762094966235566]
A Portrait Image Neural Renderer is proposed to control face motions with the parameters of three-dimensional morphable face models.
The proposed model can generate photo-realistic portrait images with accurate movements according to intuitive modifications.
Our model can generate coherent videos with convincing movements from only a single reference image and a driving audio stream.
arXiv Detail & Related papers (2021-09-17T07:24:16Z)
- Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection [65.92058628082322]
Non-parametric face modeling aims to reconstruct 3D faces from images alone, without shape assumptions.
This paper presents a novel Learning to Aggregate and Personalize framework for unsupervised robust 3D face modeling.
arXiv Detail & Related papers (2021-06-15T03:10:17Z)
- FaceController: Controllable Attribute Editing for Face in the Wild [74.56117807309576]
We propose a simple feed-forward network to generate high-fidelity manipulated faces.
By simply employing existing and easily obtainable prior information, our method can control, transfer, and edit diverse attributes of faces in the wild.
In our method, we decouple identity, expression, pose, and illumination using 3D priors; separate texture and colors by using region-wise style codes.
arXiv Detail & Related papers (2021-02-23T02:47:28Z)
- PIE: Portrait Image Embedding for Semantic Control [82.69061225574774]
We present the first approach for embedding real portrait images in the latent space of StyleGAN.
We use StyleRig, a pretrained neural network that maps the control space of a 3D morphable face model to the latent space of the GAN.
An identity preservation energy term allows spatially coherent edits while maintaining facial integrity.
arXiv Detail & Related papers (2020-09-20T17:53:51Z)