LucidDreaming: Controllable Object-Centric 3D Generation
- URL: http://arxiv.org/abs/2312.00588v2
- Date: Fri, 9 Aug 2024 17:34:04 GMT
- Title: LucidDreaming: Controllable Object-Centric 3D Generation
- Authors: Zhaoning Wang, Ming Li, Chen Chen
- Abstract summary: We present a pipeline capable of spatial and numerical control over 3D generation from only textual prompt commands or 3D bounding boxes.
LucidDreaming achieves superior results in object placement precision and generation fidelity compared to current approaches.
- Score: 10.646855651524387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the recent development of generative models, text-to-3D generation has also seen significant growth, opening a door for the general public to create video-game 3D assets. Nonetheless, people without professional 3D editing experience find it hard to achieve precise control over the 3D generation, especially when the prompt contains multiple objects, as controlling with text alone often leads to missing objects and imprecise locations. In this paper, we present LucidDreaming, an effective pipeline capable of spatial and numerical control over 3D generation from only textual prompt commands or 3D bounding boxes. Specifically, our research demonstrates that Large Language Models (LLMs) possess 3D spatial awareness and can effectively translate textual 3D information into precise 3D bounding boxes. We leverage LLMs to obtain individual object information and their 3D bounding boxes as the initial step of our process. Given the bounding boxes, we further propose clipped ray sampling and an object-centric density blob bias to generate 3D objects aligned with the bounding boxes. We show that our method exhibits remarkable adaptability across a spectrum of mainstream Score Distillation Sampling-based 3D generation frameworks, and our pipeline can even be used to insert objects into an existing NeRF scene. Moreover, we provide a dataset of prompts with 3D bounding boxes for benchmarking 3D spatial controllability. With extensive qualitative and quantitative experiments, we demonstrate that LucidDreaming achieves superior results in object placement precision and generation fidelity compared to current approaches, while maintaining flexibility and ease of use for non-expert users.
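The bounding-box machinery named in the abstract can be pictured with a short sketch. The snippet below is a minimal NumPy illustration, not the authors' implementation: it clips ray samples to an axis-aligned 3D bounding box with the standard slab test, and adds a Gaussian density bump centred in the box as one plausible reading of the "object-centric density blob bias". All function names, the blob shape, and the numeric values are assumptions for illustration only.

```python
# Minimal NumPy sketch (not the authors' code) of two ideas named in the
# abstract: clipping ray samples to a 3D bounding box, and an object-centric
# density blob bias. All names and constants are illustrative assumptions.
import numpy as np

def clip_ray_to_aabb(origin, direction, box_min, box_max):
    """Slab test: return (t_near, t_far) where the ray hits the axis-aligned
    box, or None if it misses. Assumes direction has no exactly-zero entries."""
    inv_d = 1.0 / direction
    t0 = (box_min - origin) * inv_d
    t1 = (box_max - origin) * inv_d
    t_near = np.max(np.minimum(t0, t1))
    t_far = np.min(np.maximum(t0, t1))
    return (t_near, t_far) if t_near < t_far else None

def clipped_ray_samples(origin, direction, box_min, box_max, n_samples=64):
    """Sample points only inside the object's box, so the SDS loss for this
    object's prompt only sees its own region of space."""
    hit = clip_ray_to_aabb(origin, direction, box_min, box_max)
    if hit is None:
        return np.empty((0, 3))
    t = np.linspace(hit[0], hit[1], n_samples)
    return origin + t[:, None] * direction[None, :]

def density_blob_bias(points, box_min, box_max, peak=10.0):
    """One plausible object-centric bias: a Gaussian density bump centred in
    the box, nudging early optimization to place mass inside it."""
    center = 0.5 * (box_min + box_max)
    radius = 0.5 * np.linalg.norm(box_max - box_min)
    dist = np.linalg.norm(points - center, axis=-1)
    return peak * np.exp(-0.5 * (dist / (radius / 3.0)) ** 2)

# Toy usage: samples for one ray through a unit box centred at the origin.
box_min, box_max = np.array([-0.5, -0.5, -0.5]), np.array([0.5, 0.5, 0.5])
origin, direction = np.array([0.0, 0.0, -3.0]), np.array([0.05, 0.02, 1.0])
pts = clipped_ray_samples(origin, direction, box_min, box_max)
sigma_bias = density_blob_bias(pts, box_min, box_max)
```

In an SDS-based framework, a bias of this kind would be added to the raw density predicted at the clipped sample locations during early iterations, then annealed away.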
Related papers
- Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data [57.53523870705433]
We propose a novel open-vocabulary monocular 3D object detection framework, dubbed OVM3D-Det.
OVM3D-Det does not require high-precision LiDAR or 3D sensor data for either input or generating 3D bounding boxes.
It employs open-vocabulary 2D models and pseudo-LiDAR to automatically label 3D objects in RGB images, fostering the learning of open-vocabulary monocular 3D detectors.
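As a rough illustration of the pseudo-LiDAR step described above (not OVM3D-Det's actual code; the intrinsics, names, and toy depth map are placeholders), each pixel can be back-projected with a predicted depth map into a camera-frame point cloud, and the points falling inside an open-vocabulary 2D box can then be fitted with a 3D box to serve as a pseudo-label:

```python
# Illustrative pseudo-LiDAR lifting: back-project pixels with predicted depth
# into a 3D point cloud using pinhole intrinsics. Values are placeholders.
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Convert an HxW depth map (metres) into an (H*W, 3) camera-frame cloud:
    x = (u - cx) * z / fx, y = (v - cy) * z / fy, z = depth."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Toy 4x6 depth map at 5 m; a real pipeline would use a monocular depth model.
cloud = depth_to_pseudo_lidar(np.full((4, 6), 5.0), fx=500.0, fy=500.0, cx=3.0, cy=2.0)
```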
arXiv Detail & Related papers (2024-11-23T21:37:21Z)
- iControl3D: An Interactive System for Controllable 3D Scene Generation [57.048647153684485]
iControl3D is a novel interactive system that empowers users to generate and render customizable 3D scenes with precise control.
We leverage 3D meshes as an intermediary proxy to iteratively merge individual 2D diffusion-generated images into a cohesive and unified 3D scene representation.
Our neural rendering interface enables users to build a radiance field of their scene online and navigate the entire scene.
arXiv Detail & Related papers (2024-08-03T06:35:09Z)
- DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data [50.164670363633704]
We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets from text prompts.
Our model is directly trained on extensive noisy and unaligned 'in-the-wild' 3D assets.
We achieve state-of-the-art performance in both single-class generation and text-to-3D generation.
arXiv Detail & Related papers (2024-06-06T17:58:15Z)
- 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network that synthesizes new content with higher quality by exploiting the natural image prior of a 2D diffusion model and the global 3D information of the current scene.
Our approach supports a wide variety of scene generation and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z)
- Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
First, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, an output-level constraint enforces overlap between the 2D boxes and the projected 3D box estimates (sketched below).
Third, a training-level constraint produces accurate and consistent 3D pseudo-labels that align with the visual data.
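One way to picture that output-level term (a sketch under assumed box formats and intrinsics, not the paper's exact loss): project the eight corners of a predicted camera-frame 3D box into the image, take the tight 2D box around them, and penalize low IoU with the detected 2D box.

```python
# Sketch of an output-level 2D/3D consistency term: compare a detected 2D box
# with the image footprint of a projected 3D box. Formats are assumptions.
import numpy as np

def project_box_corners(corners_3d, K):
    """Project (8, 3) camera-frame box corners with 3x3 intrinsics K to
    (8, 2) pixel coordinates."""
    uvw = corners_3d @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def box_iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def overlap_penalty(corners_3d, box_2d, K):
    """1 - IoU between the detected 2D box and the projected 3D box footprint;
    small when the 3D estimate agrees with the image evidence."""
    uv = project_box_corners(corners_3d, K)
    proj = np.array([uv[:, 0].min(), uv[:, 1].min(), uv[:, 0].max(), uv[:, 1].max()])
    return 1.0 - box_iou_2d(proj, box_2d)
```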
arXiv Detail & Related papers (2023-12-12T18:57:25Z)
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)