Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning
- URL: http://arxiv.org/abs/2406.04413v2
- Date: Wed, 24 Jul 2024 10:16:33 GMT
- Title: Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning
- Authors: Amandeep Kumar, Muhammad Awais, Sanath Narayan, Hisham Cholakkal, Salman Khan, Rao Muhammad Anwer
- Abstract summary: We propose an efficient, plug-and-play, 3D-aware face editing framework based on attribute-specific prompt learning.
Our proposed framework generates high-quality images with 3D awareness and view consistency while maintaining attribute-specific features.
- Score: 40.6806832534633
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Drawing upon StyleGAN's expressivity and disentangled latent space, existing 2D approaches employ textual prompting to edit facial images with different attributes. In contrast, 3D-aware approaches that generate faces at different target poses require attribute-specific classifiers, learning separate model weights for each attribute, and are not scalable for novel attributes. In this work, we propose an efficient, plug-and-play, 3D-aware face editing framework based on attribute-specific prompt learning, enabling the generation of facial images with controllable attributes across various target poses. To this end, we introduce a text-driven learnable style token-based latent attribute editor (LAE). The LAE harnesses a pre-trained vision-language model to find text-guided attribute-specific editing direction in the latent space of any pre-trained 3D-aware GAN. It utilizes learnable style tokens and style mappers to learn and transform this editing direction to 3D latent space. To train LAE with multiple attributes, we use directional contrastive loss and style token loss. Furthermore, to ensure view consistency and identity preservation across different poses and attributes, we employ several 3D-aware identity and pose preservation losses. Our experiments show that our proposed framework generates high-quality images with 3D awareness and view consistency while maintaining attribute-specific features. We demonstrate the effectiveness of our method on different facial attributes, including hair color and style, expression, and others.
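Below is a minimal PyTorch sketch of the kind of latent attribute editor (LAE) the abstract describes: a learnable per-attribute style token is passed through a style mapper to produce an editing direction in the latent space of a pre-trained 3D-aware GAN, and a CLIP-style directional loss aligns the image-space change with the text-space change. Class and parameter names (LatentAttributeEditor, style_mapper, the 14x512 W+ layout) are illustrative assumptions, not the authors' released implementation.
```python
# Illustrative sketch only: a learnable style-token editor over a StyleGAN-style
# W+ latent space, plus a CLIP-style directional loss. All names and dimensions
# are assumptions made for this example.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentAttributeEditor(nn.Module):
    """Maps a learnable per-attribute style token to an editing direction in W+."""

    def __init__(self, num_attributes: int, token_dim: int = 512,
                 latent_dim: int = 512, num_layers: int = 14):
        super().__init__()
        # One learnable style token per attribute (e.g. hair color, expression).
        self.style_tokens = nn.Parameter(0.02 * torch.randn(num_attributes, token_dim))
        # Style mapper: transforms a token into per-layer offsets of the latent code.
        self.style_mapper = nn.Sequential(
            nn.Linear(token_dim, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, num_layers * latent_dim),
        )
        self.num_layers, self.latent_dim = num_layers, latent_dim

    def forward(self, w_plus: torch.Tensor, attr_idx: int, strength: float = 1.0):
        # w_plus: (B, num_layers, latent_dim) latent codes of a pre-trained 3D-aware GAN.
        token = self.style_tokens[attr_idx]
        delta = self.style_mapper(token).view(1, self.num_layers, self.latent_dim)
        return w_plus + strength * delta  # edited latent; pose is controlled by the renderer


def directional_loss(img_src: torch.Tensor, img_edit: torch.Tensor,
                     txt_src: torch.Tensor, txt_edit: torch.Tensor) -> torch.Tensor:
    """Align the image-feature edit direction with the text-feature direction
    (e.g. "a face" -> "a face with blond hair"), as in CLIP-guided editing."""
    d_img = F.normalize(img_edit - img_src, dim=-1)
    d_txt = F.normalize(txt_edit - txt_src, dim=-1)
    return (1.0 - (d_img * d_txt).sum(dim=-1)).mean()


# Shape check with random tensors (no generator or CLIP weights required):
lae = LatentAttributeEditor(num_attributes=4)
w_edit = lae(torch.randn(2, 14, 512), attr_idx=0, strength=0.8)
loss = directional_loss(torch.randn(2, 512), torch.randn(2, 512),
                        torch.randn(1, 512), torch.randn(1, 512))
```
In training, per the abstract, the source and edited latents would be rendered by the 3D-aware generator at sampled camera poses and encoded with a pre-trained vision-language model such as CLIP to compute the directional term, alongside the style token loss and the 3D-aware identity and pose preservation losses; the exact formulations and loss weights are not reproduced here.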
Related papers
- Revealing Directions for Text-guided 3D Face Editing [52.85632020601518]
3D face editing is a significant task in multimedia, aimed at the manipulation of 3D face models across various control signals.
We present Face Clan, a text-general approach for generating and manipulating 3D faces based on arbitrary attribute descriptions.
Our method offers precisely controllable manipulation, allowing users to intuitively customize regions of interest with text descriptions.
arXiv Detail & Related papers (2024-10-07T12:04:39Z)
- SAT3D: Image-driven Semantic Attribute Transfer in 3D [31.087615253643975]
We propose an image-driven Semantic Attribute Transfer method in 3D (SAT3D) by editing semantic attributes from a reference image.
For guidance, we associate each attribute with a set of phrase-based descriptor groups and develop a Quantitative Measurement Module (QMM).
We present our 3D-aware attribute transfer results across multiple domains and also conduct comparisons with classical 2D image editing methods.
arXiv Detail & Related papers (2024-08-03T04:41:46Z)
- A Reference-Based 3D Semantic-Aware Framework for Accurate Local Facial Attribute Editing [19.21301510545666]
We introduce a novel framework that merges latent-based and reference-based editing methods.
Our approach employs a 3D GAN inversion technique to embed attributes from the reference image into a tri-plane space.
A coarse-to-fine inpainting strategy is then applied to preserve the integrity of untargeted areas.
arXiv Detail & Related papers (2024-07-25T20:55:23Z)
- ID-to-3D: Expressive ID-guided 3D Heads via Score Distillation Sampling [96.87575334960258]
ID-to-3D is a method to generate identity- and text-guided 3D human heads with disentangled expressions.
The results achieve an unprecedented level of identity-consistent, high-quality texture and geometry generation.
arXiv Detail & Related papers (2024-05-26T13:36:45Z)
- DiffFAE: Advancing High-fidelity One-shot Facial Appearance Editing with Space-sensitive Customization and Semantic Preservation [84.0586749616249]
This paper presents DiffFAE, a one-stage and highly-efficient diffusion-based framework tailored for high-fidelity Facial Appearance Editing.
For high-fidelity transfer of query attributes, we adopt Space-sensitive Physical Customization (SPC), which ensures fidelity and generalization ability.
To preserve source attributes, we introduce the Region-responsive Semantic Composition (RSC).
This module is guided to learn decoupled source-regarding features, thereby better preserving the identity and alleviating artifacts from non-facial attributes such as hair, clothes, and background.
arXiv Detail & Related papers (2024-03-26T12:53:10Z)
- AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing [79.38471599977011]
We propose AttriHuman-3D, an editable 3D human generation model.
It generates all attributes in an overall attribute space with six feature planes, which are decomposed and manipulated with different attribute indexes.
Our model provides a strong disentanglement between different attributes, allows fine-grained image editing and generates high-quality 3D human avatars.
arXiv Detail & Related papers (2023-12-03T03:20:10Z)
- Improving Generalization of Image Captioning with Unsupervised Prompt Learning [63.26197177542422]
Generalization of Image Captioning (GeneIC) learns a domain-specific prompt vector for the target domain without requiring annotated data.
GeneIC aligns visual and language modalities with a pre-trained Contrastive Language-Image Pre-Training (CLIP) model.
arXiv Detail & Related papers (2023-08-05T12:27:01Z)
- Disentangling 3D Attributes from a Single 2D Image: Human Pose, Shape and Garment [20.17991487155361]
We focus on the challenging task of extracting disentangled 3D attributes only from 2D image data.
Our method learns an embedding with disentangled latent representations of these three image properties.
We show how an implicit shape loss can benefit the model's ability to recover fine-grained reconstruction details.
arXiv Detail & Related papers (2022-08-05T13:48:43Z)
- Text and Image Guided 3D Avatar Generation and Manipulation [0.0]
We propose a novel 3D manipulation method that can manipulate both the shape and texture of the model using text- or image-based prompts such as 'a young face' or 'a surprised face'.
Our method requires only 5 minutes per manipulation, and we demonstrate the effectiveness of our approach with extensive results and comparisons.
arXiv Detail & Related papers (2022-02-12T14:37:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.