LucidDreaming: Controllable Object-Centric 3D Generation
- URL: http://arxiv.org/abs/2312.00588v1
- Date: Thu, 30 Nov 2023 18:55:23 GMT
- Title: LucidDreaming: Controllable Object-Centric 3D Generation
- Authors: Zhaoning Wang, Ming Li, Chen Chen
- Abstract summary: We present LucidDreaming as an effective pipeline capable of fine-grained control over 3D generation.
It requires only minimal input of 3D bounding boxes, which can be deduced from a simple text prompt.
We show that our method exhibits remarkable adaptability across a spectrum of mainstream Score Distillation Sampling-based 3D generation frameworks.
- Score: 11.965998779054079
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the recent development of generative models, text-to-3D generation has
also seen significant growth. Nonetheless, achieving precise control over 3D
generation continues to be an arduous task, as using text to control often
leads to missing objects and imprecise locations. Contemporary strategies for
enhancing controllability in 3D generation often entail the introduction of
additional parameters, such as customized diffusion models. This often makes it
difficult to adapt to different diffusion models or to create distinct objects.
In this paper, we present LucidDreaming as an effective pipeline capable of
fine-grained control over 3D generation. It requires only minimal input of 3D
bounding boxes, which can be deduced from a simple text prompt using a Large
Language Model. Specifically, we propose clipped ray sampling to separately
render and optimize objects with user specifications. We also introduce
object-centric density blob bias, fostering the separation of generated
objects. By rendering and optimizing each object individually, our method excels
not only in controlled content generation from scratch but also within
pre-trained NeRF scenes. In such scenarios, existing generative approaches
often disrupt the integrity of the original scene, and current editing methods
struggle to synthesize new content in empty spaces. We show that our method
exhibits remarkable adaptability across a spectrum of mainstream Score
Distillation Sampling-based 3D generation frameworks, and achieves superior
alignment of 3D content when compared to baseline approaches. We also provide a
dataset of prompts with 3D bounding boxes, benchmarking 3D spatial
controllability.
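
To make the two core ideas concrete, below is a minimal, illustrative sketch (not the authors' code) of how clipped ray sampling and an object-centric density bias might be implemented for axis-aligned bounding boxes: each ray is clipped to the interval where it intersects a user-specified box, and a Gaussian-shaped bias centered in the box encourages density to form inside it. Function names such as `clip_rays_to_box` and `blob_density_bias`, and the specific Gaussian form of the bias, are assumptions for illustration.

```python
# Hypothetical sketch of per-object ray clipping and a density blob bias.
# Not taken from the LucidDreaming codebase; names and forms are assumptions.
import torch

def clip_rays_to_box(origins, dirs, box_min, box_max):
    """Slab-test intersection of rays with an axis-aligned bounding box.

    origins, dirs: (N, 3) ray origins and (normalized) directions.
    box_min, box_max: (3,) corners of the user-specified box.
    Returns per-ray near/far depths and a mask of rays that hit the box.
    """
    inv_d = 1.0 / (dirs + 1e-9)                  # avoid division by zero
    t0 = (box_min - origins) * inv_d             # per-axis entry distances
    t1 = (box_max - origins) * inv_d             # per-axis exit distances
    t_near = torch.minimum(t0, t1).amax(dim=-1)  # latest entry over axes
    t_far = torch.maximum(t0, t1).amin(dim=-1)   # earliest exit over axes
    hit = t_far > torch.clamp(t_near, min=0.0)
    return torch.clamp(t_near, min=0.0), t_far, hit

def blob_density_bias(points, box_min, box_max, scale=10.0):
    """Gaussian-shaped density bias centered in the box, nudging the
    generated object's density to concentrate inside its own box."""
    center = 0.5 * (box_min + box_max)
    radius = 0.5 * (box_max - box_min)
    d2 = (((points - center) / radius) ** 2).sum(dim=-1)
    return scale * torch.exp(-0.5 * d2)
```

In an SDS-based pipeline, the returned near/far bounds would replace the global sampling range for the rays assigned to each object, and the bias would be added to the predicted density so that each object is optimized separately within its box; the exact integration depends on the underlying framework.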
Related papers
- iControl3D: An Interactive System for Controllable 3D Scene Generation [57.048647153684485]
iControl3D is a novel interactive system that empowers users to generate and render customizable 3D scenes with precise control.
We leverage 3D meshes as an intermediary proxy to iteratively merge individual 2D diffusion-generated images into a cohesive and unified 3D scene representation.
Our neural rendering interface enables users to build a radiance field of their scene online and navigate the entire scene.
arXiv Detail & Related papers (2024-08-03T06:35:09Z)
- General Geometry-aware Weakly Supervised 3D Object Detection [62.26729317523975]
A unified framework is developed for learning 3D object detectors from RGB images and associated 2D boxes.
Experiments on KITTI and SUN-RGBD datasets demonstrate that our method yields surprisingly high-quality 3D bounding boxes with only 2D annotation.
arXiv Detail & Related papers (2024-07-18T17:52:08Z)
- DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data [50.164670363633704]
We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets from text prompts.
Our model is directly trained on extensive noisy and unaligned 'in-the-wild' 3D assets.
We achieve state-of-the-art performance in both single-class generation and text-to-3D generation.
arXiv Detail & Related papers (2024-06-06T17:58:15Z)
- 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network to synthesize new content with higher quality by exploiting the natural image prior of the 2D diffusion model and the global 3D information of the current scene.
Our approach supports a wide variety of scene generation and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z)
- Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
First, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, an output-level constraint is developed to enforce the overlap between 2D and projected 3D box estimations.
Third, a training-level constraint is applied by producing accurate and consistent 3D pseudo-labels that align with the visual data.
arXiv Detail & Related papers (2023-12-12T18:57:25Z)
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.