CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D
Scene Layout
- URL: http://arxiv.org/abs/2303.13843v3
- Date: Sat, 2 Dec 2023 09:01:28 GMT
- Title: CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D
Scene Layout
- Authors: Haotian Bai, Yuanhuiyi Lyu, Lutao Jiang, Sijia Li, Haonan Lu, Xiaodong
Lin, Lin Wang
- Abstract summary: CompoNeRF is a framework that integrates an editable 3D scene layout with object-specific and scene-wide guidance mechanisms.
Our framework achieves up to a 54% improvement in performance, as measured by the multi-view CLIP score metric.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances have shown promise in merging neural radiance fields (NeRFs)
with pre-trained diffusion models for text-to-3D object generation. However,
one enduring challenge is their inadequate capability to accurately parse and
regenerate consistent multi-object environments. Specifically, these models
encounter difficulties in accurately representing quantity and style prompted
by multi-object texts, often resulting in a collapse of the rendering fidelity
that fails to match the semantic intricacies. Moreover, amalgamating these
elements into a coherent 3D scene is a substantial challenge, stemming from
the generic distribution inherent in diffusion models. To tackle the issue of
'guidance collapse' and enhance consistency, we propose a novel framework,
dubbed CompoNeRF, by integrating an editable 3D scene layout with
object-specific and scene-wide guidance mechanisms. It begins by interpreting a
complex text into an editable 3D layout populated with multiple NeRFs, each
paired with a corresponding subtext prompt for precise object depiction. Next,
a tailored composition module seamlessly blends these NeRFs, promoting
consistency, while the dual-level text guidance reduces ambiguity and boosts
accuracy. Notably, the unique modularity of CompoNeRF permits NeRF
decomposition. This enables flexible scene editing and recomposition into new
scenes based on the edited layout or text prompts. Utilizing the open-source
Stable Diffusion model, CompoNeRF not only generates scenes with high fidelity
but also paves the way for innovative multi-object composition using editable
3D layouts. Remarkably, our framework achieves up to a 54% improvement in
performance, as measured by the multi-view CLIP score metric. Code is available
at https://github.com/hbai98/Componerf.
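The pipeline the abstract describes (one NeRF per object, placed in an editable layout, merged by a composition module) lends itself to a compact sketch. The following is a minimal PyTorch illustration, not the paper's actual architecture: the class names, the tiny MLP, and the density-weighted blending rule are our own stand-ins for the composition module and guidance losses.

```python
import torch
import torch.nn as nn

class ObjectNeRF(nn.Module):
    """Tiny per-object radiance field (illustrative stand-in, not the paper's network)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 3 color channels + 1 density
        )

    def forward(self, x):  # x: (N, 3) points in the object's local box frame
        out = self.mlp(x)
        rgb = torch.sigmoid(out[:, :3])
        sigma = torch.relu(out[:, 3:])
        return rgb, sigma

def compose_scene(points, layout, nerfs):
    """Merge per-object fields at world-space sample points.

    layout: list of (center, half_size) axis-aligned boxes, one per object.
    Editing the scene amounts to editing these boxes or swapping NeRFs.
    Density-weighted color mixing is one simple composition rule; the
    paper's composition module may differ.
    """
    rgb_sum = points.new_zeros(points.shape[0], 3)
    sigma_sum = points.new_zeros(points.shape[0], 1)
    for (center, half), nerf in zip(layout, nerfs):
        local = (points - center) / half           # world frame -> unit box frame
        inside = (local.abs() <= 1.0).all(dim=-1)  # only query points inside the box
        if inside.any():
            rgb, sigma = nerf(local[inside])
            rgb_sum[inside] += sigma * rgb         # density-weighted color
            sigma_sum[inside] += sigma
    rgb = rgb_sum / sigma_sum.clamp(min=1e-8)      # normalize occupied points
    return rgb, sigma_sum                          # feed into standard volume rendering
```

Under this structure, each ObjectNeRF would be optimized against its own subtext prompt while the composed render is optimized against the full scene prompt (the dual-level guidance), and editing a layout box moves or swaps one object without retraining the rest.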
Related papers
- SyncNoise: Geometrically Consistent Noise Prediction for Text-based 3D Scene Editing [58.22339174221563]
We propose SyncNoise, a novel geometry-guided multi-view consistent noise editing approach for high-fidelity 3D scene editing.
SyncNoise synchronously edits multiple views with 2D diffusion models while enforcing multi-view noise predictions to be geometrically consistent.
Our method achieves high-quality 3D editing results respecting the textual instructions, especially in scenes with complex textures.
arXiv Detail & Related papers (2024-06-25T09:17:35Z)
- $M^2D$NeRF: Multi-Modal Decomposition NeRF with 3D Feature Fields [33.168225243348786]
We present a single model, Multi-Modal Decomposition NeRF ($M^2D$NeRF), that is capable of both text-based and visual patch-based edits.
Specifically, we use multi-modal feature distillation to integrate teacher features from pretrained visual and language models into 3D semantic feature volumes.
Experiments on various real-world scenes show superior performance in 3D scene decomposition tasks compared to prior NeRF-based methods.
arXiv Detail & Related papers (2024-05-08T12:25:21Z)
- NeRF-Insert: 3D Local Editing with Multimodal Control Signals [97.91172669905578]
NeRF-Insert is a NeRF editing framework that allows users to make high-quality local edits with a flexible level of control.
We cast scene editing as an in-painting problem, which encourages the global structure of the scene to be preserved.
Our results show better visual quality and also maintain stronger consistency with the original NeRF.
arXiv Detail & Related papers (2024-04-30T02:04:49Z)
- TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes [67.5351491691866]
We present a novel framework, dubbed TeMO, to parse multi-object 3D scenes and edit their styles.
Our method can synthesize high-quality stylized content and outperform the existing methods over a wide range of multi-object 3D meshes.
arXiv Detail & Related papers (2023-12-07T12:10:05Z)
- Directional Texture Editing for 3D Models [51.31499400557996]
ITEM3D is designed for automatic 3D object editing according to text instructions.
Leveraging diffusion models and differentiable rendering, ITEM3D uses rendered images as the bridge between text and the 3D representation.
arXiv Detail & Related papers (2023-09-26T12:01:13Z)
- Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields [16.375242125946965]
We propose a novel NeRF-based model, Blending-NeRF, which consists of two NeRF networks: a pretrained NeRF and an editable NeRF.
We introduce new blending operations that allow Blending-NeRF to properly edit target regions which are localized by text.
Our experiments demonstrate that Blending-NeRF produces naturally and locally edited 3D objects from various text prompts.
arXiv Detail & Related papers (2023-08-23T07:46:44Z)
- Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields [26.85599376826124]
We present Blended-NeRF, a framework for editing a specific region of interest in an existing NeRF scene.
We allow local editing by localizing a 3D ROI box in the input scene, and blend the content synthesized inside the ROI with the existing scene.
We show our framework for several 3D editing applications, including adding new objects to a scene, removing/altering existing objects, and texture conversion.
arXiv Detail & Related papers (2023-06-22T09:34:55Z)
- RePaint-NeRF: NeRF Editting via Semantic Masks and Diffusion Models [36.236190350126826]
We propose a novel framework that can take RGB images as input and alter the 3D content in neural scenes.
Specifically, we semantically select the target object and a pre-trained diffusion model will guide the NeRF model to generate new 3D objects.
Experiment results show that our algorithm is effective for editing 3D objects in NeRF under different text prompts.
arXiv Detail & Related papers (2023-06-09T04:49:31Z)
- Compositional 3D Scene Generation using Locally Conditioned Diffusion [49.5784841881488]
We introduce locally conditioned diffusion as an approach to compositional scene diffusion.
We demonstrate a score distillation sampling-based text-to-3D synthesis pipeline that enables compositional 3D scene generation at a higher fidelity than relevant baselines; a sketch of the score distillation step follows this list.
arXiv Detail & Related papers (2023-03-21T22:37:16Z)
- SKED: Sketch-guided Text-based 3D Editing [49.019881133348775]
We present SKED, a technique for editing 3D shapes represented by NeRFs.
Our technique utilizes as few as two guiding sketches from different views to alter an existing neural field.
We propose novel loss functions to generate the desired edits while preserving the density and radiance of the base instance.
arXiv Detail & Related papers (2023-03-19T18:40:44Z)
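Several pipelines above, like CompoNeRF's use of Stable Diffusion, rest on score distillation sampling (SDS), which pushes a frozen diffusion model's denoising error back into the 3D representation's parameters. Below is a minimal PyTorch sketch of one SDS step in the DreamFusion style; `render`, `predict_noise`, and the cosine noise schedule are hypothetical stand-ins, not any specific paper's implementation.

```python
import torch

def sds_step(render, predict_noise, params, t_max=1000):
    """One score distillation sampling (SDS) step, DreamFusion-style.

    render(params)        -> (1, 3, H, W) image, differentiable w.r.t. params.
    predict_noise(x_t, t) -> noise prediction from a frozen, text-conditioned
                             diffusion model. Both are hypothetical stand-ins.
    """
    x = render(params)                     # differentiable render of one view
    t = torch.randint(1, t_max, (1,))      # random diffusion timestep
    alpha_bar = torch.cos(t.float() / t_max * torch.pi / 2) ** 2  # toy schedule
    eps = torch.randn_like(x)
    x_t = alpha_bar.sqrt() * x + (1 - alpha_bar).sqrt() * eps  # forward diffusion
    with torch.no_grad():
        eps_hat = predict_noise(x_t, t)    # frozen score network, no U-Net grads
    w = 1.0                                # weighting w(t); often schedule-dependent
    # SDS trick: treat w * (eps_hat - eps) as the upstream gradient of x,
    # skipping the U-Net Jacobian entirely.
    (w * (eps_hat - eps).detach() * x).sum().backward()
```

In a compositional setting like CompoNeRF's, this update would be applied at both levels: per-object renders against their subtext prompts and the composed render against the global scene prompt.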