InstructPix2NeRF: Instructed 3D Portrait Editing from a Single Image
- URL: http://arxiv.org/abs/2311.02826v2
- Date: Fri, 2 Feb 2024 11:56:41 GMT
- Title: InstructPix2NeRF: Instructed 3D Portrait Editing from a Single Image
- Authors: Jianhui Li, Shilong Liu, Zidong Liu, Yikai Wang, Kaiwen Zheng, Jinghui
Xu, Jianmin Li, Jun Zhu
- Abstract summary: InstructPix2NeRF enables instructed 3D-aware portrait editing from a single open-world image with human instructions.
At its core lies a conditional latent 3D diffusion process that lifts 2D editing to 3D space by learning the correlation between the paired images' difference and the instructions via triplet data.
- Score: 25.076270175205593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the success of Neural Radiance Field (NeRF) in 3D-aware portrait
editing, a variety of works have achieved promising results regarding both
quality and 3D consistency. However, these methods heavily rely on per-prompt
optimization when handling natural language as editing instructions. Due to the
lack of labeled human face 3D datasets and effective architectures, the area of
human-instructed 3D-aware editing for open-world portraits in an end-to-end
manner remains under-explored. To solve this problem, we propose an end-to-end
diffusion-based framework termed InstructPix2NeRF, which enables instructed
3D-aware portrait editing from a single open-world image with human
instructions. At its core lies a conditional latent 3D diffusion process that
lifts 2D editing to 3D space by learning the correlation between the paired
images' difference and the instructions via triplet data. With the help of our
proposed token position randomization strategy, we can even achieve
multi-semantic editing in a single pass while keeping the portrait identity
well-preserved. In addition, we propose an identity consistency module that
directly modulates the extracted identity signals into our diffusion process,
which increases the multi-view 3D identity consistency. Extensive experiments
verify the effectiveness of our method and show its superiority over strong
baselines, both quantitatively and qualitatively. Source code and pre-trained models
can be found on our project page:
https://mybabyyh.github.io/InstructPix2NeRF
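The abstract identifies three mechanisms: a conditional latent 3D diffusion process trained on triplet data (source image, instruction, edited image), a token position randomization strategy that enables multi-semantic editing in a single pass, and an identity consistency module that modulates identity signals into the diffusion process. The sketch below is only a minimal, hypothetical illustration of how these pieces could fit into one training step; every name in it (IdentityEncoder, randomize_token_positions, denoiser, the toy noise schedule) is a placeholder assumption, not the released InstructPix2NeRF code.

```python
# Hypothetical sketch of a conditional latent diffusion training step on
# triplet data (source image, instruction, edited image).  Names, shapes,
# and the noise schedule are illustrative placeholders.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F


class IdentityEncoder(nn.Module):
    """Stand-in for a face-recognition backbone producing an identity embedding."""

    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.LazyLinear(dim))

    def forward(self, img):
        return self.net(img)


def randomize_token_positions(instructions, tokenizer, max_len=77):
    """Token position randomization as we read it: when several editing
    instructions are combined, shuffle their order before tokenization so no
    semantic is tied to a fixed token position."""
    instructions = list(instructions)
    random.shuffle(instructions)
    # Assumes a Hugging-Face-style tokenizer interface.
    return tokenizer(" and ".join(instructions), padding="max_length",
                     truncation=True, max_length=max_len, return_tensors="pt")


def diffusion_training_step(denoiser, id_encoder, text_encoder, tokenizer,
                            src_latent, edit_latent, src_img, instructions,
                            num_timesteps=1000):
    """One denoising step conditioned on the source latent, the shuffled
    instruction tokens, and the identity embedding of the source image."""
    b = edit_latent.shape[0]
    t = torch.randint(0, num_timesteps, (b,), device=edit_latent.device)
    noise = torch.randn_like(edit_latent)
    # Toy linear noising schedule, for illustration only.
    alpha = (1.0 - t.float() / num_timesteps).view(b, *([1] * (edit_latent.dim() - 1)))
    noisy = alpha * edit_latent + (1.0 - alpha) * noise

    tokens = randomize_token_positions(instructions, tokenizer)
    text_emb = text_encoder(tokens)      # instruction conditioning
    id_emb = id_encoder(src_img)         # identity signal modulated into the denoiser

    pred_noise = denoiser(noisy, t, src_latent, text_emb, id_emb)
    return F.mse_loss(pred_noise, noise)
```

The design points mirrored here are that the instruction order is re-shuffled at each step and that the identity embedding enters the denoiser as an extra conditioning signal rather than only through a post-hoc loss.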
Related papers
- Revealing Directions for Text-guided 3D Face Editing [52.85632020601518]
3D face editing is a significant task in multimedia, aimed at the manipulation of 3D face models across various control signals.
We present Face Clan, a text-general approach for generating and manipulating 3D faces based on arbitrary attribute descriptions.
Our method offers a precisely controllable manipulation method, allowing users to intuitively customize regions of interest with the text description.
arXiv Detail & Related papers (2024-10-07T12:04:39Z) - DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing [72.54566271694654]
We consider the problem of editing 3D objects and scenes based on open-ended language instructions.
A common approach to this problem is to use a 2D image generator or editor to guide the 3D editing process.
This process is often inefficient due to the need for iterative updates of costly 3D representations.
arXiv Detail & Related papers (2024-04-29T17:59:30Z) - Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors [24.478875248825563]
We propose a novel image editing technique that enables 3D manipulations on single images.
Our method directly leverages powerful image diffusion models trained on a broad spectrum of text-image pairs.
Our method can generate high-quality 3D-aware image edits with large viewpoint transformations and high appearance and shape consistency with the input image.
arXiv Detail & Related papers (2024-03-18T06:18:59Z) - Learning Naturally Aggregated Appearance for Efficient 3D Editing [94.47518916521065]
We propose to replace the color field with an explicit 2D appearance aggregation, also called canonical image.
To avoid the distortion effect and facilitate convenient editing, we complement the canonical image with a projection field that maps 3D points onto 2D pixels for texture lookup.
Our representation, dubbed AGAP, well supports various ways of 3D editing (e.g., stylization, interactive drawing, and content extraction) with no need of re-optimization.
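As a rough illustration of the projection-field texture lookup described above, the snippet below maps 3D query points to 2D canonical-image coordinates and reads colors by bilinear sampling; the interface (lookup_appearance, a projection_field module returning coordinates in [-1, 1]) is an assumption for illustration, not the AGAP implementation.

```python
# Illustrative texture lookup through a projection field: 3D points are mapped
# to (u, v) coordinates in a canonical 2D image and colors are read by bilinear
# sampling.  Names and shapes are assumptions, not the AGAP code.
import torch
import torch.nn.functional as F


def lookup_appearance(projection_field, canonical_image, points):
    """
    projection_field: module mapping (N, 3) points -> (N, 2) uv in [-1, 1]
    canonical_image:  (1, 3, H, W) learnable appearance image
    points:           (N, 3) query points in 3D space
    """
    uv = projection_field(points)                 # (N, 2)
    grid = uv.reshape(1, -1, 1, 2)                # grid_sample expects (B, Hout, Wout, 2)
    colors = F.grid_sample(canonical_image, grid, align_corners=True)
    return colors.reshape(3, -1).t()              # (N, 3) RGB per query point
```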
arXiv Detail & Related papers (2023-12-11T18:59:31Z) - IT3D: Improved Text-to-3D Generation with Explicit View Synthesis [71.68595192524843]
This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues.
Our approach uses image-to-image pipelines, powered by LDMs, to generate posed high-quality images.
For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data.
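The real/fake assignment described above maps onto a standard non-saturating GAN objective; the sketch below only illustrates that assignment, with discriminator, synthesized_views, and model_renderings as assumed placeholder names rather than the IT3D code.

```python
# Illustrative discriminator objective: diffusion-synthesized multi-view images
# play the role of real data and renderings of the 3D model being optimized play
# the role of fake data.  Placeholder interface, not the authors' implementation.
import torch
import torch.nn.functional as F


def discriminator_loss(discriminator, synthesized_views, model_renderings):
    real_logits = discriminator(synthesized_views)           # LDM-synthesized posed images -> "real"
    fake_logits = discriminator(model_renderings.detach())   # current 3D renderings -> "fake"
    loss_real = F.softplus(-real_logits).mean()              # non-saturating GAN loss
    loss_fake = F.softplus(fake_logits).mean()
    return loss_real + loss_fake


def generator_adversarial_loss(discriminator, model_renderings):
    # Gradient flows back into the 3D representation through its renderings.
    return F.softplus(-discriminator(model_renderings)).mean()
```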
arXiv Detail & Related papers (2023-08-22T14:39:17Z) - Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z) - PREIM3D: 3D Consistent Precise Image Attribute Editing from a Single Image [23.06474962139909]
We study the 3D-aware image attribute editing problem in this paper.
Recent methods solved the problem by training a shared encoder to map images into a 3D generator's latent space.
We propose two novel methods, an alternating training scheme and a multi-view identity loss, to maintain 3D consistency.
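A multi-view identity loss of the kind mentioned above is commonly realized as a cosine distance between face-recognition embeddings of the input image and of renderings from several camera poses; the sketch below assumes that reading, with id_encoder, renderer, and poses as hypothetical placeholders rather than the PREIM3D implementation.

```python
# Hedged sketch of a multi-view identity loss: penalize the cosine distance
# between the identity embedding of the input image and those of renderings
# from several viewpoints.  Placeholder interface only.
import torch
import torch.nn.functional as F


def multi_view_identity_loss(id_encoder, input_image, renderer, latent, poses):
    target = F.normalize(id_encoder(input_image), dim=-1)
    loss = 0.0
    for pose in poses:
        view = renderer(latent, pose)                   # render the edited face at this pose
        emb = F.normalize(id_encoder(view), dim=-1)
        loss = loss + (1.0 - (emb * target).sum(dim=-1)).mean()
    return loss / len(poses)
```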
arXiv Detail & Related papers (2023-04-20T12:33:56Z) - 3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models [71.25937799010407]
We equip text-guided diffusion models to achieve 3D-consistent generation.
We study 3D local editing and propose a two-step solution.
We extend our model to perform one-shot novel view synthesis.
arXiv Detail & Related papers (2022-11-25T13:50:00Z) - SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields [26.296017756560467]
In 3D, solutions must be consistent across multiple views and geometrically valid.
We propose a novel 3D inpainting method that addresses these challenges.
We first demonstrate the superiority of our approach on multiview segmentation, compared to NeRF-based methods and 2D segmentation approaches.
arXiv Detail & Related papers (2022-11-22T13:14:50Z) - Explicitly Controllable 3D-Aware Portrait Generation [42.30481422714532]
We propose a 3D portrait generation network that produces consistent portraits according to semantic parameters regarding pose, identity, expression and lighting.
Our method outperforms prior arts in extensive experiments, producing realistic portraits with vivid expressions under natural lighting when viewed from free viewpoints.
arXiv Detail & Related papers (2022-09-12T17:40:08Z) - 3D-FM GAN: Towards 3D-Controllable Face Manipulation [43.99393180444706]
3D-FM GAN is a novel conditional GAN framework designed specifically for 3D-controllable face manipulation.
By carefully encoding both the input face image and a physically-based rendering of 3D edits into a StyleGAN's latent spaces, our image generator provides high-quality, identity-preserved, 3D-controllable face manipulation.
We show that our method outperforms the prior arts on various tasks, with better editability, stronger identity preservation, and higher photo-realism.
arXiv Detail & Related papers (2022-08-24T01:33:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.