MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing
- URL: http://arxiv.org/abs/2312.06947v4
- Date: Fri, 5 Jul 2024 13:08:10 GMT
- Title: MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing
- Authors: Kangneng Zhou, Daiheng Gao, Xuan Wang, Jie Zhang, Peng Zhang, Xusen Sun, Longhao Zhang, Shiqi Yang, Bang Zhang, Liefeng Bo, Yaxing Wang, Ming-Ming Cheng
- Abstract summary: We propose MaTe3D: mask-guided text-based 3D-aware portrait editing.
A new SDF-based 3D generator learns local and global representations with the proposed SDF and density consistency losses.
Conditional Distillation on Geometry and Texture (CDGT) mitigates visual ambiguity and avoids mismatch between texture and geometry.
- Score: 61.014328598895524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D-aware portrait editing has a wide range of applications across multiple fields. However, current approaches are limited in that they can perform only mask-guided or text-based editing. Even when the two procedures are fused into one model, editing quality and stability cannot be ensured. To address this limitation, we propose MaTe3D: mask-guided text-based 3D-aware portrait editing. In this framework, we first introduce a new SDF-based 3D generator that learns local and global representations with proposed SDF and density consistency losses; this enhances mask-based editing in local areas. Second, we present a novel distillation strategy: Conditional Distillation on Geometry and Texture (CDGT). Compared to existing distillation strategies, it mitigates visual ambiguity and avoids mismatch between texture and geometry, thereby producing stable texture and convincing geometry while editing. Additionally, we create the CatMask-HQ dataset, a large-scale, high-resolution cat-face annotation dataset for exploring model generalization and expansion. We perform extensive experiments on both the FFHQ and CatMask-HQ datasets to demonstrate the editing quality and stability of the proposed method. Our method faithfully generates a 3D-aware edited face image from a modified mask and a text prompt. Our code and models will be publicly released.
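As a rough illustration of the two consistency losses named in the abstract, the sketch below penalizes disagreement between a fused local SDF and the global SDF, both directly and after mapping signed distance to volume density. The min-fusion rule and the VolSDF-style Laplace-CDF density mapping are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def sdf_to_density(sdf: torch.Tensor, alpha: float = 100.0, beta: float = 0.1) -> torch.Tensor:
    # VolSDF-style Laplace-CDF mapping from signed distance to density
    # (an assumed choice; the paper's mapping may differ).
    x = -sdf
    psi = torch.where(x <= 0, 0.5 * torch.exp(x / beta), 1.0 - 0.5 * torch.exp(-x / beta))
    return alpha * psi

def consistency_losses(local_sdfs: torch.Tensor, global_sdf: torch.Tensor):
    # local_sdfs: (K, N) signed distances from K local (per-part) fields;
    # global_sdf: (N,) signed distances from the global field,
    # both evaluated at the same N sample points.
    fused_sdf = local_sdfs.min(dim=0).values           # union of parts (assumed fusion rule)
    loss_sdf = F.l1_loss(fused_sdf, global_sdf)        # SDF consistency
    loss_density = F.l1_loss(sdf_to_density(fused_sdf),
                             sdf_to_density(global_sdf))  # density consistency
    return loss_sdf, loss_density
```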
Related papers
- Revealing Directions for Text-guided 3D Face Editing [52.85632020601518]
3D face editing is a significant task in multimedia, aimed at the manipulation of 3D face models across various control signals.
We present Face Clan, a text-general approach for generating and manipulating 3D faces based on arbitrary attribute descriptions.
Our method offers a precisely controllable manipulation method, allowing users to intuitively customize regions of interest with the text description.
arXiv Detail & Related papers (2024-10-07T12:04:39Z)
- DragTex: Generative Point-Based Texture Editing on 3D Mesh [11.163205302136625]
We propose a generative point-based 3D mesh texture editing method called DragTex.
This method utilizes a diffusion model to blend locally inconsistent textures in the region near the deformed silhouette between different views.
We train LoRA using multi-view images instead of training each view individually, which significantly shortens the training time.
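A minimal sketch of the shared-adapter idea: one low-rank update is trained on batches drawn from all views jointly, instead of fitting a separate adapter per view. The wrapper below is generic LoRA in plain PyTorch, not DragTex's actual code; `rank` and `alpha` are illustrative defaults.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # the pretrained weight stays frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)         # the low-rank update starts at zero
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))

# One adapter, all views: each training batch mixes images rendered from
# every viewpoint, so the single low-rank update must explain them all,
# which is what makes joint training cheaper than per-view adapters.
```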
arXiv Detail & Related papers (2024-03-04T17:05:01Z)
- LatentEditor: Text Driven Local Editing of 3D Scenes [8.966537479017951]
We introduce LatentEditor, a framework for precise and locally controlled editing of neural fields using text prompts.
We successfully embed real-world scenes into the latent space, resulting in a faster and more adaptable NeRF backbone for editing.
Our approach achieves faster editing speeds and superior output quality compared to existing 3D editing models.
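One way to read "embedding the scene into the latent space": the training views are encoded once by a frozen autoencoder, and the field is then supervised on low-resolution latent features rather than full-resolution RGB, which is where the speed-up would come from. The step below is a hypothetical sketch; `field` maps rays to latent features and `target_latents` are per-ray values taken from the encoded views.

```python
import torch
import torch.nn.functional as F

def latent_field_step(field, optimizer, rays, target_latents):
    # rays: (N, 6) ray origins and directions; target_latents: (N, C)
    # per-ray features sampled from VAE-encoded training views.
    pred = field(rays)                 # the field predicts latent features, not RGB
    loss = F.mse_loss(pred, target_latents)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```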
arXiv Detail & Related papers (2023-12-14T19:38:06Z)
- Text-Guided 3D Face Synthesis -- From Generation to Editing [53.86765812392627]
We propose a unified text-guided framework from face generation to editing.
We employ a fine-tuned texture diffusion model to enhance texture quality in both RGB and YUV space.
We propose a self-guided consistency weight strategy to improve editing efficacy while preserving consistency.
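For the "both RGB and YUV space" part, a plausible reading is a reconstruction objective evaluated in the two color spaces at once, as sketched below with the standard BT.601 conversion; the equal weighting and L1 form are assumptions, not the paper's stated loss.

```python
import torch
import torch.nn.functional as F

# BT.601 RGB -> YUV matrix (a standard conversion; the paper's exact
# color transform is not specified in the abstract).
_RGB2YUV = torch.tensor([[ 0.299,  0.587,  0.114],
                         [-0.147, -0.289,  0.436],
                         [ 0.615, -0.515, -0.100]])

def rgb_and_yuv_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # pred, target: (B, 3, H, W) images in [0, 1].
    loss_rgb = F.l1_loss(pred, target)
    m = _RGB2YUV.to(pred)
    pred_yuv = torch.einsum('ij,bjhw->bihw', m, pred)
    target_yuv = torch.einsum('ij,bjhw->bihw', m, target)
    return loss_rgb + F.l1_loss(pred_yuv, target_yuv)
```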
arXiv Detail & Related papers (2023-12-01T06:36:23Z)
- Directional Texture Editing for 3D Models [51.31499400557996]
ITEM3D is designed for automatic 3D object editing according to text instructions.
Leveraging diffusion models and differentiable rendering, ITEM3D takes rendered images as the bridge between text and the 3D representation.
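The "rendered images as the bridge" pattern typically looks like the loop below: render the textured asset differentiably, obtain an image-space gradient from a text-conditioned diffusion prior (score-distillation style), and push it back into the texture. All names here (`renderer`, `sds_grad_fn`) are hypothetical placeholders, not ITEM3D's API.

```python
import torch

def texture_edit_step(renderer, texture, camera, sds_grad_fn, optimizer):
    # `texture` is assumed to be a learnable tensor registered in `optimizer`.
    image = renderer(texture, camera)    # differentiable rendering of the asset
    grad = sds_grad_fn(image)            # image-space gradient from the diffusion prior
    optimizer.zero_grad()
    image.backward(gradient=grad)        # chain rule carries the gradient into `texture`
    optimizer.step()
```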
arXiv Detail & Related papers (2023-09-26T12:01:13Z)
- SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field [37.8162035179377]
We present a novel semantic-driven NeRF editing approach, which enables users to edit a neural radiance field with a single image.
To achieve this goal, we propose a prior-guided editing field to encode fine-grained geometric and texture editing in 3D space.
Our method achieves photo-realistic 3D editing using only a single edited image, pushing the bound of semantic-driven editing in 3D real-world scenes.
arXiv Detail & Related papers (2023-03-23T13:58:11Z)
- Edit-A-Video: Single Video Editing with Object-Aware Consistency [49.43316939996227]
We propose a video editing framework given only a pretrained text-to-image (TTI) model and a single <text, video> pair, which we term Edit-A-Video.
The framework consists of two stages: (1) inflating the 2D model into a 3D model by appending temporal modules and tuning on the source video; (2) inverting the source video into noise and editing with the target text prompt and attention map injection.
We present extensive experimental results over various types of text and videos, and demonstrate the superiority of the proposed method compared to baselines in terms of background consistency, text alignment, and video editing quality.
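Stage (1), the 2D-to-3D inflation, usually means interleaving new temporal layers with the frozen spatial ones. The sketch below shows one common form: a residual attention over the frame axis whose output projection is zero-initialized, so the inflated model starts out identical to the 2D one. This is a generic convention, not necessarily Edit-A-Video's exact module.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)
        nn.init.zeros_(self.proj.weight)   # zero init: residual branch starts
        nn.init.zeros_(self.proj.bias)     # as identity, preserving 2D behavior

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, height*width, dim) -> attend across the frame axis
        # independently at each spatial location.
        b, f, hw, d = x.shape
        t = x.permute(0, 2, 1, 3).reshape(b * hw, f, d)
        out, _ = self.attn(t, t, t)
        out = self.proj(out).reshape(b, hw, f, d).permute(0, 2, 1, 3)
        return x + out
```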
arXiv Detail & Related papers (2023-03-14T14:35:59Z)
- Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion [115.82306502822412]
StyleGAN has enabled great progress in 2D face reconstruction and semantic editing via image inversion and latent editing.
A corresponding generic 3D GAN inversion framework is still missing, limiting the applications of 3D face reconstruction and semantic editing.
We study the challenging problem of 3D GAN inversion where a latent code is predicted given a single face image to faithfully recover its 3D shapes and detailed textures.
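In code, encoder-based inversion reduces to the loop below: predict a latent code from a single image, re-render through the frozen 3D-aware generator, and minimize reconstruction error. Perceptual, identity, and the paper's geometry-aware losses are omitted; this is a bare-bones sketch with hypothetical interfaces.

```python
import torch
import torch.nn.functional as F

def inversion_step(encoder, generator, images, cams, optimizer):
    # encoder: trainable network predicting latent codes from images;
    # generator: frozen 3D-aware GAN rendering a latent code at camera `cams`.
    w = encoder(images)                  # predicted latent codes
    recon = generator(w, cams)           # re-rendered faces
    loss = F.l1_loss(recon, images)      # pixel reconstruction only, for brevity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```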
arXiv Detail & Related papers (2022-12-14T18:49:50Z)