Uni3D-LLM: Unifying Point Cloud Perception, Generation and Editing with Large Language Models
- URL: http://arxiv.org/abs/2402.03327v1
- Date: Tue, 9 Jan 2024 06:20:23 GMT
- Title: Uni3D-LLM: Unifying Point Cloud Perception, Generation and Editing with Large Language Models
- Authors: Dingning Liu, Xiaoshui Huang, Yuenan Hou, Zhihui Wang, Zhenfei Yin,
Yongshun Gong, Peng Gao, Wanli Ouyang
- Abstract summary: We introduce Uni3D-LLM, a unified framework that leverages a Large Language Model (LLM) to integrate tasks of 3D perception, generation, and editing within point cloud scenes.
Uni3D-LLM harnesses the expressive power of natural language to allow for precise command over the generation and editing of 3D objects.
- Score: 71.2931570433261
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce Uni3D-LLM, a unified framework that leverages a
Large Language Model (LLM) to integrate tasks of 3D perception, generation, and
editing within point cloud scenes. This framework empowers users to
effortlessly generate and modify objects at specified locations within a scene,
guided by the versatility of natural language descriptions. Uni3D-LLM harnesses
the expressive power of natural language to allow for precise command over the
generation and editing of 3D objects, thereby significantly enhancing
operational flexibility and controllability. By mapping point clouds into a
unified representation space, Uni3D-LLM achieves cross-application
functionality, enabling the seamless execution of a wide array of tasks,
ranging from the accurate instantiation of 3D objects to the diverse
requirements of interactive design. Through a comprehensive suite of rigorous
experiments, the efficacy of Uni3D-LLM in the comprehension, generation, and
editing of point clouds has been validated. Additionally, we have assessed the
impact of integrating a point cloud perception module on the generation and
editing processes, confirming the substantial potential of our approach for
practical applications.
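To make the "unified representation space" idea concrete, below is a minimal PyTorch sketch of one plausible way to project point-cloud features into an LLM's token-embedding space. The module names, dimensions, and query-token pooling are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class PointCloudToLLMProjector(nn.Module):
    """Hypothetical sketch: pool a variable-size point cloud into a fixed
    number of "3D tokens" and project them into an LLM's hidden size, so
    perception, generation, and editing can be driven by language."""

    def __init__(self, point_feat_dim: int = 384, llm_hidden_dim: int = 4096,
                 num_query_tokens: int = 32):
        super().__init__()
        # Learnable queries that attend over the point features.
        self.queries = nn.Parameter(torch.randn(num_query_tokens, point_feat_dim))
        self.attn = nn.MultiheadAttention(point_feat_dim, num_heads=8, batch_first=True)
        # Linear projector into the LLM's embedding space.
        self.projector = nn.Linear(point_feat_dim, llm_hidden_dim)

    def forward(self, point_features: torch.Tensor) -> torch.Tensor:
        # point_features: (batch, num_points, point_feat_dim) from any 3D encoder.
        batch = point_features.size(0)
        queries = self.queries.unsqueeze(0).expand(batch, -1, -1)
        pooled, _ = self.attn(queries, point_features, point_features)
        # (batch, num_query_tokens, llm_hidden_dim): prepend these to the
        # text-token embeddings before running the LLM.
        return self.projector(pooled)
```

In such a design, the projected 3D tokens would simply be concatenated with the text-token embeddings, so a single instruction (e.g. "add a red chair next to the table") can drive the perception, generation, or editing heads.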
Related papers
- PAVLM: Advancing Point Cloud based Affordance Understanding Via Vision-Language Model [4.079327215055764]
Affordance understanding, the task of identifying actionable regions on 3D objects, plays a vital role in allowing robotic systems to engage with and operate within the physical world.
Visual Language Models (VLMs) have excelled in high-level reasoning but fall short in grasping the nuanced physical properties required for effective human-robot interaction.
We introduce PAVLM, an innovative framework that utilizes the extensive multimodal knowledge embedded in pre-trained language models to enhance 3D affordance understanding of point clouds.
arXiv Detail & Related papers (2024-10-15T12:53:42Z)
- COMOGen: A Controllable Text-to-3D Multi-object Generation Framework [22.05619100307402]
This paper introduces COMOGen, a text-to-3D Multi-Object Generation framework.
COMOGen enables the simultaneous generation of multiple 3D objects by the distillation of layout and multi-view prior knowledge.
Comprehensive experiments demonstrate the effectiveness of our approach compared to the state-of-the-art methods.
arXiv Detail & Related papers (2024-09-01T02:50:38Z)
- Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning [52.81032340916171]
Coin3D allows users to control the 3D generation using a coarse geometry proxy assembled from basic shapes.
Our method achieves superior controllability and flexibility in the 3D assets generation task.
arXiv Detail & Related papers (2024-05-13T17:56:13Z)
- Interactive3D: Create What You Want by Interactive 3D Generation [13.003964182554572]
We introduce Interactive3D, an innovative framework for interactive 3D generation that grants users precise control over the generative process.
Our experiments demonstrate that Interactive3D markedly improves the controllability and quality of 3D generation.
arXiv Detail & Related papers (2024-04-25T11:06:57Z)
- VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder [56.59814904526965]
This paper introduces a pioneering 3D encoder designed for text-to-3D generation.
A lightweight network is developed to efficiently acquire feature volumes from multi-view images.
A diffusion model with a 3D U-Net is then trained on these feature volumes for text-to-3D generation (a simplified training-step sketch follows this entry).
arXiv Detail & Related papers (2023-12-18T18:59:05Z)
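The entry above describes a two-stage pipeline: a lightweight encoder lifts multi-view images into feature volumes, then a diffusion model with a 3D U-Net is trained on those volumes. The sketch below shows only the second stage in heavily simplified form; the tiny Conv3d denoiser, the crude noise schedule, and the omitted timestep embedding are placeholders, not VolumeDiffusion's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVolumeDenoiser(nn.Module):
    """Stand-in for a 3D U-Net: a small Conv3d stack that predicts the noise
    added to a feature volume, conditioned on a text embedding."""

    def __init__(self, channels: int = 8, text_dim: int = 64):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, channels)
        self.net = nn.Sequential(
            nn.Conv3d(channels, 32, 3, padding=1), nn.SiLU(),
            nn.Conv3d(32, 32, 3, padding=1), nn.SiLU(),
            nn.Conv3d(32, channels, 3, padding=1),
        )

    def forward(self, noisy_volume, text_embedding):
        # Inject the text condition by adding it to every voxel's features.
        cond = self.text_proj(text_embedding)[:, :, None, None, None]
        return self.net(noisy_volume + cond)

def diffusion_training_step(denoiser, volume, text_embedding, optimizer):
    """One simplified DDPM-style step: add Gaussian noise at a random level
    and train the denoiser to predict that noise (timestep embedding omitted)."""
    alpha = torch.rand(volume.size(0), 1, 1, 1, 1)      # crude noise schedule
    noise = torch.randn_like(volume)
    noisy = alpha.sqrt() * volume + (1 - alpha).sqrt() * noise
    loss = F.mse_loss(denoiser(noisy, text_embedding), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```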
- GPT4Point: A Unified Framework for Point-Language Understanding and Generation [76.61439685940272]
GPT4Point is a groundbreaking point-language multimodal model for unified 3D object understanding and generation within the MLLM framework.
As a powerful 3D MLLM, GPT4Point can seamlessly execute a variety of point-text reference tasks such as point-cloud captioning and Q&A.
It can produce high-quality results from a low-quality point-text feature while maintaining geometric shapes and colors.
arXiv Detail & Related papers (2023-12-05T18:59:55Z)
- Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding [57.64806066986975]
3D Visual Grounding (3DVG) aims at localizing 3D objects based on textual descriptions.
We propose a novel visual programming approach for zero-shot open-vocabulary 3DVG.
arXiv Detail & Related papers (2023-11-26T19:01:14Z)
- 3D-GPT: Procedural 3D Modeling with Large Language Models [47.72968643115063]
We introduce 3D-GPT, a framework utilizing large language models (LLMs) for instruction-driven 3D modeling.
3D-GPT positions LLMs as proficient problem solvers, dissecting the procedural 3D modeling tasks into accessible segments and appointing the apt agent for each task.
Our empirical investigations confirm that 3D-GPT not only interprets and executes instructions, delivering reliable results, but also collaborates effectively with human designers (see the dispatch sketch after this entry).
arXiv Detail & Related papers (2023-10-19T17:41:48Z)
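As a rough illustration of the agent-dispatch idea described above, the sketch below asks an LLM to split an instruction into per-agent subtasks and routes each to a specialist. The `call_llm` helper, the agent names, and the line-based plan format are hypothetical placeholders, not 3D-GPT's actual interfaces.

```python
from typing import Callable, Dict, List

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion API; returns the model's text reply."""
    raise NotImplementedError("wire up your own LLM client here")

# Hypothetical specialist agents, each wrapping a procedural-modeling tool.
AGENTS: Dict[str, Callable[[str], dict]] = {
    "terrain": lambda spec: {"tool": "terrain_generator", "params": spec},
    "vegetation": lambda spec: {"tool": "plant_scatter", "params": spec},
    "lighting": lambda spec: {"tool": "sky_and_sun", "params": spec},
}

def plan_and_dispatch(instruction: str) -> List[dict]:
    """Ask the LLM to split an instruction into per-agent subtasks
    (one '<agent>: <subtask>' pair per line), then route each to its agent."""
    plan = call_llm(
        "Split this 3D scene request into subtasks, one per line, "
        f"formatted as '<agent>: <subtask>'. Agents: {list(AGENTS)}.\n"
        f"Request: {instruction}"
    )
    results = []
    for line in plan.splitlines():
        if ":" not in line:
            continue
        agent_name, subtask = (part.strip() for part in line.split(":", 1))
        if agent_name in AGENTS:
            results.append(AGENTS[agent_name](subtask))
    return results
```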
- PointLLM: Empowering Large Language Models to Understand Point Clouds [63.39876878899682]
PointLLM understands colored object point clouds with human instructions.
It generates contextually appropriate responses, illustrating its grasp of point clouds and common sense.
arXiv Detail & Related papers (2023-08-31T17:59:46Z)
- Towards Language-guided Interactive 3D Generation: LLMs as Layout Interpreter with Generative Feedback [20.151147653552155]
Large Language Models (LLMs) have demonstrated impressive reasoning, conversational, and zero-shot generation abilities.
We propose a novel language-guided interactive 3D generation system, dubbed LI3D, that integrates LLMs as a 3D layout interpreter.
Our system also incorporates LLaVA, a large language and vision assistant, to provide generative feedback from the visual side, improving the visual quality of the generated content (see the loop sketch after this entry).
arXiv Detail & Related papers (2023-05-25T07:43:39Z)
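As a rough illustration of the LI3D-style loop described in the last entry, the snippet below alternates layout interpretation, generation, rendering, and a vision-language critique until the critic accepts the result. Every helper function is a hypothetical placeholder (to be wired to an LLM, a 3D generator, and an LLaVA-style critic); none of this is LI3D's actual code.

```python
from typing import Dict, List, Optional

def interpret_layout(instruction: str, feedback: Optional[Dict] = None) -> Dict:
    """Placeholder: prompt an LLM to emit a structured layout
    (object names, positions, sizes) for the requested scene."""
    raise NotImplementedError("call your LLM here")

def generate_scene(layout: Dict) -> Dict:
    """Placeholder: hand the layout to any 3D generator; assume it also
    returns rendered views under the 'renders' key."""
    raise NotImplementedError

def critique_with_vlm(images: List, instruction: str) -> Dict:
    """Placeholder: ask a vision-language critic (LLaVA-style) whether the
    renders match the instruction; return {'acceptable': bool, 'notes': str}."""
    raise NotImplementedError

def language_guided_generation(instruction: str, max_rounds: int = 3) -> Dict:
    """Interpret -> generate -> critique loop, repeated until the visual
    critic accepts the result or the round budget runs out."""
    feedback = None
    scene: Dict = {}
    for _ in range(max_rounds):
        layout = interpret_layout(instruction, feedback=feedback)
        scene = generate_scene(layout)
        feedback = critique_with_vlm(scene.get("renders", []), instruction)
        if feedback.get("acceptable"):
            break
    return scene
```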
This list is automatically generated from the titles and abstracts of the papers on this site.