CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D
Scene Layout
- URL: http://arxiv.org/abs/2303.13843v3
- Date: Sat, 2 Dec 2023 09:01:28 GMT
- Title: CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D
Scene Layout
- Authors: Haotian Bai, Yuanhuiyi Lyu, Lutao Jiang, Sijia Li, Haonan Lu, Xiaodong
Lin, Lin Wang
- Abstract summary: CompoNeRF is a framework that integrates an editable 3D scene layout with object-specific and scene-wide guidance mechanisms.
Our framework achieves up to a 54% improvement in performance, as measured by the multi-view CLIP score metric.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances have shown promise in merging neural radiance fields (NeRFs)
with pre-trained diffusion models for text-to-3D object generation. However,
one enduring challenge is their inadequate capability to accurately parse and
regenerate consistent multi-object environments. Specifically, these models
encounter difficulties in accurately representing quantity and style prompted
by multi-object texts, often resulting in a collapse of the rendering fidelity
that fails to match the semantic intricacies. Moreover, amalgamating these
elements into a coherent 3D scene is a substantial challenge, stemming from
the generic distribution inherent in diffusion models. To tackle the issue of
'guidance collapse' and enhance consistency, we propose a novel framework,
dubbed CompoNeRF, by integrating an editable 3D scene layout with
object-specific and scene-wide guidance mechanisms. It begins by interpreting a
complex text into an editable 3D layout populated with multiple NeRFs, each
paired with a corresponding subtext prompt for precise object depiction. Next,
a tailored composition module seamlessly blends these NeRFs, promoting
consistency, while the dual-level text guidance reduces ambiguity and boosts
accuracy. Notably, the unique modularity of CompoNeRF permits NeRF
decomposition. This enables flexible scene editing and recomposition into new
scenes based on the edited layout or text prompts. Utilizing the open-source
Stable Diffusion model, CompoNeRF not only generates scenes with high fidelity
but also paves the way for innovative multi-object composition using editable
3D layouts. Remarkably, our framework achieves up to a 54% improvement in
performance, as measured by the multi-view CLIP score metric. Code is available
at https://github.com/hbai98/Componerf.
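The pipeline the abstract describes (one NeRF per object, placed in an editable layout, merged by a composition module) lends itself to a compact sketch. The following is a minimal PyTorch illustration, not the paper's actual architecture: the class names, the tiny MLP, and the density-weighted blending rule are our own stand-ins for the composition module and guidance losses.

```python
import torch
import torch.nn as nn

class ObjectNeRF(nn.Module):
    """Tiny per-object radiance field (illustrative stand-in, not the paper's network)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 3 color channels + 1 density
        )

    def forward(self, x):  # x: (N, 3) points in the object's local box frame
        out = self.mlp(x)
        rgb = torch.sigmoid(out[:, :3])
        sigma = torch.relu(out[:, 3:])
        return rgb, sigma

def compose_scene(points, layout, nerfs):
    """Merge per-object fields at world-space sample points.

    layout: list of (center, half_size) axis-aligned boxes, one per object.
    Editing the scene amounts to editing these boxes or swapping NeRFs.
    Density-weighted color mixing is one simple composition rule; the
    paper's composition module may differ.
    """
    rgb_sum = points.new_zeros(points.shape[0], 3)
    sigma_sum = points.new_zeros(points.shape[0], 1)
    for (center, half), nerf in zip(layout, nerfs):
        local = (points - center) / half           # world frame -> unit box frame
        inside = (local.abs() <= 1.0).all(dim=-1)  # only query points inside the box
        if inside.any():
            rgb, sigma = nerf(local[inside])
            rgb_sum[inside] += sigma * rgb         # density-weighted color
            sigma_sum[inside] += sigma
    rgb = rgb_sum / sigma_sum.clamp(min=1e-8)      # normalize occupied points
    return rgb, sigma_sum                          # feed into standard volume rendering
```

Under this structure, each ObjectNeRF would be optimized against its own subtext prompt while the composed render is optimized against the full scene prompt (the dual-level guidance), and editing a layout box moves or swaps one object without retraining the rest.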
Related papers
- SyncNoise: Geometrically Consistent Noise Prediction for Text-based 3D Scene Editing [58.22339174221563]
We propose SyncNoise, a novel geometry-guided multi-view consistent noise editing approach for high-fidelity 3D scene editing.
SyncNoise synchronously edits multiple views with 2D diffusion models while enforcing multi-view noise predictions to be geometrically consistent.
Our method achieves high-quality 3D editing results respecting the textual instructions, especially in scenes with complex textures.
arXiv Detail & Related papers (2024-06-25T09:17:35Z)
- $M^2D$NeRF: Multi-Modal Decomposition NeRF with 3D Feature Fields [33.168225243348786]
We present a single model, Multi-Modal Decomposition NeRF ($M^2D$NeRF), that is capable of both text-based and visual patch-based edits.
Specifically, we use multi-modal feature distillation to integrate teacher features from pretrained visual and language models into 3D semantic feature volumes.
Experiments on various real-world scenes show superior performance in 3D scene decomposition tasks compared to prior NeRF-based methods.
arXiv Detail & Related papers (2024-05-08T12:25:21Z)
- NeRF-Insert: 3D Local Editing with Multimodal Control Signals [97.91172669905578]
NeRF-Insert is a NeRF editing framework that allows users to make high-quality local edits with a flexible level of control.
We cast scene editing as an in-painting problem, which encourages the global structure of the scene to be preserved.
Our results show better visual quality and also maintain stronger consistency with the original NeRF.
arXiv Detail & Related papers (2024-04-30T02:04:49Z)
- TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes [67.5351491691866]
We present a novel framework, dubbed TeMO, to parse multi-object 3D scenes and edit their styles.
Our method can synthesize high-quality stylized content and outperform the existing methods over a wide range of multi-object 3D meshes.
arXiv Detail & Related papers (2023-12-07T12:10:05Z)
- Directional Texture Editing for 3D Models [51.31499400557996]
ITEM3D is designed for automatic 3D object editing according to text instructions.
Leveraging diffusion models and differentiable rendering, ITEM3D uses rendered images as the bridge between text and the 3D representation.
arXiv Detail & Related papers (2023-09-26T12:01:13Z)
- Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields [16.375242125946965]
We propose a novel NeRF-based model, Blending-NeRF, which consists of two NeRF networks: a pretrained NeRF and an editable NeRF.
We introduce new blending operations that allow Blending-NeRF to properly edit target regions which are localized by text.
Our experiments demonstrate that Blending-NeRF produces naturally and locally edited 3D objects from various text prompts.
arXiv Detail & Related papers (2023-08-23T07:46:44Z)
- Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields [26.85599376826124]
We present Blended-NeRF, a framework for editing a specific region of interest in an existing NeRF scene.
We allow local editing by localizing a 3D ROI box in the input scene, and blend the content synthesized inside the ROI with the existing scene.
We show our framework for several 3D editing applications, including adding new objects to a scene, removing/altering existing objects, and texture conversion.
arXiv Detail & Related papers (2023-06-22T09:34:55Z)
- RePaint-NeRF: NeRF Editting via Semantic Masks and Diffusion Models [36.236190350126826]
We propose a novel framework that can take RGB images as input and alter the 3D content in neural scenes.
Specifically, we semantically select the target object and a pre-trained diffusion model will guide the NeRF model to generate new 3D objects.
Experiment results show that our algorithm is effective for editing 3D objects in NeRF under different text prompts.
arXiv Detail & Related papers (2023-06-09T04:49:31Z)
- Compositional 3D Scene Generation using Locally Conditioned Diffusion [49.5784841881488]
We introduce locally conditioned diffusion as an approach to compositional scene diffusion.
We demonstrate a score distillation sampling-based text-to-3D synthesis pipeline that enables compositional 3D scene generation at a higher fidelity than relevant baselines; a sketch of the score distillation step follows this list.
arXiv Detail & Related papers (2023-03-21T22:37:16Z)
- SKED: Sketch-guided Text-based 3D Editing [49.019881133348775]
We present SKED, a technique for editing 3D shapes represented by NeRFs.
Our technique utilizes as few as two guiding sketches from different views to alter an existing neural field.
We propose novel loss functions to generate the desired edits while preserving the density and radiance of the base instance.
arXiv Detail & Related papers (2023-03-19T18:40:44Z)
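Several pipelines above, like CompoNeRF's use of Stable Diffusion, rest on score distillation sampling (SDS), which pushes a frozen diffusion model's denoising error back into the 3D representation's parameters. Below is a minimal PyTorch sketch of one SDS step in the DreamFusion style; `render`, `predict_noise`, and the cosine noise schedule are hypothetical stand-ins, not any specific paper's implementation.

```python
import torch

def sds_step(render, predict_noise, params, t_max=1000):
    """One score distillation sampling (SDS) step, DreamFusion-style.

    render(params)        -> (1, 3, H, W) image, differentiable w.r.t. params.
    predict_noise(x_t, t) -> noise prediction from a frozen, text-conditioned
                             diffusion model. Both are hypothetical stand-ins.
    """
    x = render(params)                     # differentiable render of one view
    t = torch.randint(1, t_max, (1,))      # random diffusion timestep
    alpha_bar = torch.cos(t.float() / t_max * torch.pi / 2) ** 2  # toy schedule
    eps = torch.randn_like(x)
    x_t = alpha_bar.sqrt() * x + (1 - alpha_bar).sqrt() * eps  # forward diffusion
    with torch.no_grad():
        eps_hat = predict_noise(x_t, t)    # frozen score network, no U-Net grads
    w = 1.0                                # weighting w(t); often schedule-dependent
    # SDS trick: treat w * (eps_hat - eps) as the upstream gradient of x,
    # skipping the U-Net Jacobian entirely.
    (w * (eps_hat - eps).detach() * x).sum().backward()
```

In a compositional setting like CompoNeRF's, this update would be applied at both levels: per-object renders against their subtext prompts and the composed render against the global scene prompt.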