COMOGen: A Controllable Text-to-3D Multi-object Generation Framework
- URL: http://arxiv.org/abs/2409.00590v1
- Date: Sun, 1 Sep 2024 02:50:38 GMT
- Title: COMOGen: A Controllable Text-to-3D Multi-object Generation Framework
- Authors: Shaorong Sun, Shuchao Pang, Yazhou Yao, Xiaoshui Huang
- Abstract summary: This paper introduces COMOGen, a text-to-3D Multi-Object Generation framework.
COMOGen enables the simultaneous generation of multiple 3D objects by distilling layout and multi-view prior knowledge.
Comprehensive experiments demonstrate the effectiveness of our approach compared to the state-of-the-art methods.
- Score: 22.05619100307402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The controllability of 3D object generation methods is achieved through the input text. Existing text-to-3D object generation methods primarily focus on generating a single object from a single object description. However, when the input text involves multiple objects, these methods often fail to place the generated objects at the desired positions. To address this controllability issue in multi-object generation, this paper introduces COMOGen, a COntrollable text-to-3D Multi-Object Generation framework. COMOGen enables the simultaneous generation of multiple 3D objects by distilling layout and multi-view prior knowledge. The framework consists of three modules: a layout control module, a multi-view consistency control module, and a 3D content enhancement module. Moreover, to integrate these three modules into a unified framework, we propose Layout Multi-view Score Distillation, which unifies the two kinds of prior knowledge and further enhances the diversity and quality of the generated 3D content. Comprehensive experiments demonstrate the effectiveness of our approach compared to state-of-the-art methods, representing a significant step toward more controlled and versatile text-based 3D content generation.
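The abstract names Layout Multi-view Score Distillation without giving its form. Purely as a hedged illustration of how two diffusion priors could be unified in one score-distillation objective, here is a minimal PyTorch sketch; the callables `layout_prior` and `multiview_prior`, the blending weight `lam`, and the simplified noising step are illustrative assumptions rather than the paper's actual formulation.

```python
import torch

def layout_multiview_sds_step(render, layout_prior, multiview_prior, t, lam=0.5):
    """One hypothetical score-distillation step combining two diffusion priors.

    render          -- differentiably rendered views of the 3D content, (B, C, H, W)
    layout_prior    -- callable predicting noise under layout conditioning (assumed)
    multiview_prior -- callable predicting noise under multi-view conditioning (assumed)
    t               -- diffusion timestep(s) passed to both priors
    lam             -- assumed weight balancing the two priors
    """
    noise = torch.randn_like(render)
    noisy = render + noise  # stand-in for the true forward-diffusion noising q(x_t | x_0)

    with torch.no_grad():
        eps_layout = layout_prior(noisy, t)            # noise predicted by the layout prior
        eps_mv = multiview_prior(noisy, t)             # noise predicted by the multi-view prior
        eps = lam * eps_layout + (1.0 - lam) * eps_mv  # unify the two priors

    # Standard SDS trick: build a surrogate loss whose gradient w.r.t. the
    # rendered views is (predicted noise - injected noise).
    grad = eps - noise
    return (grad * render).sum()

# Usage with dummy priors, just to show the gradient flows back to the renderer:
views = torch.randn(1, 3, 64, 64, requires_grad=True)
dummy = lambda x, t: torch.zeros_like(x)
layout_multiview_sds_step(views, dummy, dummy, t=torch.tensor([500])).backward()
```

In a full pipeline, `views.grad` would be propagated further into the underlying 3D representation rather than into raw pixels.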
Related papers
- Uni3D-LLM: Unifying Point Cloud Perception, Generation and Editing with Large Language Models [71.2931570433261]
We introduce Uni3D-LLM, a unified framework that leverages a Large Language Model (LLM) to integrate tasks of 3D perception, generation, and editing within point cloud scenes.
Uni3D-LLM harnesses the expressive power of natural language to allow for precise command over the generation and editing of 3D objects.
arXiv Detail & Related papers (2024-01-09T06:20:23Z)
- VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder [56.59814904526965]
This paper introduces a pioneering 3D encoder designed for text-to-3D generation.
A lightweight network is developed to efficiently acquire feature volumes from multi-view images.
A diffusion model with a 3D U-Net is then trained on these 3D volumes for text-to-3D generation.
arXiv Detail & Related papers (2023-12-18T18:59:05Z)
- TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes [67.5351491691866]
We present a novel framework, dubbed TeMO, to parse multi-object 3D scenes and edit their styles.
Our method can synthesize high-quality stylized content and outperform the existing methods over a wide range of multi-object 3D meshes.
arXiv Detail & Related papers (2023-12-07T12:10:05Z)
- ControlDreamer: Blending Geometry and Style in Text-to-3D [34.92628800597151]
We introduce multi-view ControlNet, a novel depth-aware multi-view diffusion model trained on datasets from a carefully curated text corpus.
Our multi-view ControlNet is then integrated into our two-stage pipeline, ControlDreamer, enabling text-guided generation of stylized 3D models.
arXiv Detail & Related papers (2023-12-02T13:04:54Z)
- Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding [57.64806066986975]
3D Visual Grounding aims at localizing 3D objects based on textual descriptions.
We propose a novel visual programming approach for zero-shot open-vocabulary 3DVG.
arXiv Detail & Related papers (2023-11-26T19:01:14Z)
- SCA-PVNet: Self-and-Cross Attention Based Aggregation of Point Cloud and Multi-View for 3D Object Retrieval [8.74845857766369]
Multi-modality 3D object retrieval has rarely been developed and analyzed on large-scale datasets.
We propose self-and-cross attention based aggregation of point cloud and multi-view images (SCA-PVNet) for 3D object retrieval.
arXiv Detail & Related papers (2023-07-20T05:46:32Z)
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
- A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.