Revealing Directions for Text-guided 3D Face Editing
- URL: http://arxiv.org/abs/2410.04965v1
- Date: Mon, 7 Oct 2024 12:04:39 GMT
- Title: Revealing Directions for Text-guided 3D Face Editing
- Authors: Zhuo Chen, Yichao Yan, Shengqi Liu, Yuhao Cheng, Weiming Zhao, Lincheng Li, Mengxiao Bi, Xiaokang Yang
- Abstract summary: 3D face editing is a significant task in multimedia, aimed at manipulating 3D face models under various control signals.
We present Face Clan, a fast and text-general approach for generating and manipulating 3D faces based on arbitrary attribute descriptions.
Our method offers precisely controllable manipulation, allowing users to intuitively customize regions of interest with a text description.
- Score: 52.85632020601518
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D face editing is a significant task in multimedia, aimed at manipulating 3D face models under various control signals. The success of 3D-aware GANs provides expressive 3D models learned from 2D single-view images only, encouraging researchers to discover semantic editing directions in their latent spaces. However, previous methods face challenges in balancing quality, efficiency, and generalization. To solve this problem, we explore the possibility of introducing the strengths of diffusion models into 3D-aware GANs. In this paper, we present Face Clan, a fast and text-general approach for generating and manipulating 3D faces based on arbitrary attribute descriptions. To achieve disentangled editing, we propose to diffuse on the latent space under a pair of opposite prompts to estimate a mask indicating the region of interest on the latent codes. Based on the mask, we then apply denoising to the masked latent codes to reveal the editing direction. Our method offers precisely controllable manipulation, allowing users to intuitively customize regions of interest with a text description. Experiments demonstrate the effectiveness and generalization of Face Clan for various pre-trained GANs. It offers an intuitive and widely applicable approach to text-guided face editing, contributing to the landscape of multimedia content creation.
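Read as pseudocode, the abstract describes a two-stage recipe: contrast diffusion predictions under opposite prompts to mask the relevant latent channels, then denoise toward the target prompt and keep the masked displacement. The PyTorch sketch below is a minimal paraphrase of that reading; `diffuse`, `denoise`, and the thresholding rule are hypothetical stand-ins, not the paper's actual networks.

```python
import torch

def reveal_direction(w, diffuse, denoise, pos_emb, neg_emb, tau=0.7):
    """Minimal sketch of the Face Clan recipe from the abstract.

    w       : (N, D) batch of GAN latent codes
    diffuse : hypothetical noise predictor conditioned on a text embedding
    denoise : hypothetical denoiser that pulls latents toward a prompt
    """
    # 1) Diffuse under a pair of opposite prompts; latent channels where the
    #    predicted noise disagrees most are assumed to carry the attribute.
    eps_pos = diffuse(w, pos_emb)   # e.g. "a face with eyeglasses"
    eps_neg = diffuse(w, neg_emb)   # e.g. "a face without eyeglasses"
    gap = (eps_pos - eps_neg).abs().mean(dim=0)      # (D,)
    mask = (gap > gap.quantile(tau)).float()         # region of interest

    # 2) Denoise toward the positive prompt and keep only the masked part of
    #    the displacement: that masked displacement is the editing direction.
    w_edit = denoise(w, pos_emb)
    direction = (w_edit - w) * mask
    return direction / direction.norm(dim=-1, keepdim=True).clamp_min(1e-8)
```

Editing then amounts to `w + s * direction` for a user-chosen strength `s`, leaving unmasked channels, and hence unrelated attributes, untouched.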
Related papers
- Reference-Based 3D-Aware Image Editing with Triplanes [15.222454412573455]
Generative Adversarial Networks (GANs) have emerged as powerful tools for high-quality image generation and real image editing by manipulating their latent spaces.
Recent advancements in GANs include 3D-aware models such as EG3D, which feature efficient triplane-based architectures capable of reconstructing 3D geometry from single images.
This study addresses the lack of reference-based editing in such models by exploring and demonstrating the effectiveness of the triplane space for advanced reference-based edits.
arXiv Detail & Related papers (2024-04-04T17:53:33Z)
- SERF: Fine-Grained Interactive 3D Segmentation and Editing with Radiance Fields [92.14328581392633]
We introduce a novel fine-grained interactive 3D segmentation and editing algorithm with radiance fields, which we refer to as SERF.
Our method entails creating a neural mesh representation by integrating multi-view algorithms with pre-trained 2D models.
Building upon this representation, we introduce a novel surface rendering technique that preserves local information and is robust to deformation.
arXiv Detail & Related papers (2023-12-26T02:50:42Z)
- MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing [61.014328598895524]
We propose MaTe3D: mask-guided text-based 3D-aware portrait editing.
A new SDF-based 3D generator learns local and global representations with the proposed SDF and density consistency losses.
Conditional Distillation on Geometry and Texture (CDGT) mitigates visual ambiguity and avoids mismatch between texture and geometry.
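The summary names the two regularizers without formulas. One plausible reading, stated here as an assumption rather than the paper's definition, is that local part SDFs must compose into the global SDF, and that the density head must stay tied to the SDF through a VolSDF-style Laplace transform:

```python
import torch

def sdf_consistency(sdf_global, sdf_parts):
    # Assumption: the scene SDF is the minimum over local part SDFs, so we
    # penalise disagreement between the global head and that composition.
    composed = torch.min(sdf_parts, dim=0).values
    return torch.mean((sdf_global - composed) ** 2)

def density_consistency(sdf, density, alpha=1.0, beta=0.1):
    # Assumption: density follows a Laplace CDF of the negated SDF (as in
    # VolSDF); the density head is penalised for drifting away from it.
    laplace_cdf = torch.where(sdf >= 0,
                              0.5 * torch.exp(-sdf / beta),
                              1.0 - 0.5 * torch.exp(sdf / beta))
    return torch.mean((density - alpha * laplace_cdf) ** 2)
```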
arXiv Detail & Related papers (2023-12-12T03:04:08Z)
- Control3D: Towards Controllable Text-to-3D Generation [107.81136630589263]
We present Control3D, a text-to-3D generation framework conditioned on an additional hand-drawn sketch.
A 2D conditioned diffusion model (ControlNet) is remoulded to guide the learning of a 3D scene parameterized as a NeRF.
We exploit a pre-trained differentiable photo-to-sketch model to directly estimate the sketch of the image rendered from the synthetic 3D scene.
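That pipeline is essentially a score-distillation loop in which the sketch condition is re-estimated from every render. Below is a minimal sketch of one such step; `nerf`, `controlnet_eps`, and `photo_to_sketch` are hypothetical stand-ins for the paper's networks, and the schedule is a generic DDPM one.

```python
import torch

alphas = torch.cumprod(1 - torch.linspace(1e-4, 2e-2, 1000), dim=0)  # DDPM schedule

def control3d_step(nerf, camera, controlnet_eps, photo_to_sketch, text_emb, opt):
    img = nerf.render(camera)            # differentiable render of the scene
    sketch = photo_to_sketch(img)        # sketch estimated from the render
    t = torch.randint(20, 980, (1,)).item()
    noise = torch.randn_like(img)
    noisy = alphas[t].sqrt() * img + (1 - alphas[t]).sqrt() * noise
    eps_hat = controlnet_eps(noisy, t, text_emb, sketch)  # sketch-conditioned
    # Score-distillation-style surrogate: its gradient w.r.t. the render is
    # (eps_hat - noise), nudging the scene toward images the sketch- and
    # text-conditioned diffusion model finds likely.
    loss = ((eps_hat - noise).detach() * img).sum()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```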
arXiv Detail & Related papers (2023-11-09T15:50:32Z)
- InstructPix2NeRF: Instructed 3D Portrait Editing from a Single Image [25.076270175205593]
InstructPix2NeRF enables instructed 3D-aware portrait editing from a single open-world image with human instructions.
At its core lies a conditional latent 3D diffusion process that lifts 2D editing to 3D space by learning the correlation between the paired images' difference and the instructions via triplet data.
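That conditioning scheme can be written down as a standard denoising objective over triplets. The sketch below is an assumed paraphrase; `denoiser` and the latent shapes are not the paper's actual architecture.

```python
import torch
import torch.nn.functional as F

def triplet_step(denoiser, z_src, z_edit, instr_emb, alphas, opt):
    """One step on a (source latent, edited latent, instruction) triplet:
    conditioning on z_src and the instruction makes the model learn the
    difference the instruction induces in 3D latent space."""
    b = z_edit.shape[0]
    t = torch.randint(0, len(alphas), (b,))
    noise = torch.randn_like(z_edit)
    a = alphas[t].view(b, *([1] * (z_edit.dim() - 1)))
    z_t = a.sqrt() * z_edit + (1 - a).sqrt() * noise   # forward diffusion
    eps_hat = denoiser(z_t, t, z_src, instr_emb)       # conditional prediction
    loss = F.mse_loss(eps_hat, noise)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```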
arXiv Detail & Related papers (2023-11-06T02:21:11Z)
- FaceCLIPNeRF: Text-driven 3D Face Manipulation using Deformable Neural Radiance Fields [39.57313951313061]
Existing manipulation methods require extensive human labor.
Our approach requires only a single text prompt to manipulate a face reconstructed with NeRF, and is the first to address this text-driven setting.
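The summary does not spell out the objective. A common pattern for letting a single text drive such a manipulation, which is only an assumption here, is to backpropagate a CLIP distance between renders and the prompt into the deformation parameters:

```python
import torch
import clip  # OpenAI CLIP, https://github.com/openai/CLIP

def clip_text_loss(render, prompt, model, device="cpu"):
    """Generic CLIP-similarity loss; the paper's actual objective and its
    deformable-NeRF parameterisation are not given in the summary above.
    `render` must already be a CLIP-normalised (1, 3, 224, 224) image."""
    tokens = clip.tokenize([prompt]).to(device)
    txt = model.encode_text(tokens)
    img = model.encode_image(render)
    return 1.0 - torch.nn.functional.cosine_similarity(img, txt).mean()
```

A usage sketch: `model, _ = clip.load("ViT-B/32")`, render the NeRF, resize and normalise, then step the deformation-field optimiser on this loss.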
arXiv Detail & Related papers (2023-07-21T08:22:14Z)
- Single-Shot Implicit Morphable Faces with Consistent Texture Parameterization [91.52882218901627]
We propose a novel method for constructing implicit 3D morphable face models that are both generalizable and intuitive for editing.
Our method improves upon photo-realism, geometry, and expression accuracy compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-04T17:58:40Z)
- 3D-FM GAN: Towards 3D-Controllable Face Manipulation [43.99393180444706]
3D-FM GAN is a novel conditional GAN framework designed specifically for 3D-controllable face manipulation.
By carefully encoding both the input face image and a physically-based rendering of the 3D edits into StyleGAN's latent spaces, our image generator provides high-quality, identity-preserving, 3D-controllable face manipulation.
We show that our method outperforms the prior arts on various tasks, with better editability, stronger identity preservation, and higher photo-realism.
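The key sentence is the dual encoding: the photo supplies identity while a physically-based render of the edited 3D face supplies pose, expression, and lighting. A toy sketch of that split follows; the architecture is purely illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class DualEncoder(nn.Module):
    """Toy version of the 3D-FM GAN idea: fuse a photo branch (identity)
    with a render branch (the physically-based 3D edit) into W+ codes
    for a pre-trained StyleGAN generator. Layer sizes are illustrative."""
    def __init__(self, n_layers=14, dim=512):
        super().__init__()
        self.photo = nn.Sequential(nn.Flatten(), nn.LazyLinear(n_layers * dim))
        self.render = nn.Sequential(nn.Flatten(), nn.LazyLinear(n_layers * dim))
        self.n_layers, self.dim = n_layers, dim

    def forward(self, photo, render):
        w = self.photo(photo) + self.render(render)  # identity + 3D edit
        return w.view(-1, self.n_layers, self.dim)   # W+ for StyleGAN synthesis
```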
arXiv Detail & Related papers (2022-08-24T01:33:13Z)
- Video2StyleGAN: Encoding Video in Latent Space for Manipulation [63.03250800510085]
We propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation.
Our approach can significantly outperform existing single-image methods, while achieving real-time (66 fps) speed.
arXiv Detail & Related papers (2022-06-27T06:48:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.