PointLLM: Empowering Large Language Models to Understand Point Clouds
- URL: http://arxiv.org/abs/2308.16911v2
- Date: Fri, 1 Dec 2023 07:55:16 GMT
- Title: PointLLM: Empowering Large Language Models to Understand Point Clouds
- Authors: Runsen Xu, Xiaolong Wang, Tai Wang, Yilun Chen, Jiangmiao Pang, Dahua Lin
- Abstract summary: PointLLM understands colored object point clouds with human instructions.
It generates contextually appropriate responses, illustrating its grasp of point clouds and common sense.
- Score: 67.1783384610417
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The unprecedented advancements in Large Language Models (LLMs) have shown a
profound impact on natural language processing but are yet to fully embrace the
realm of 3D understanding. This paper introduces PointLLM, a preliminary effort
to fill this gap, enabling LLMs to understand point clouds and offering a new
avenue beyond 2D visual data. PointLLM understands colored object point clouds
with human instructions and generates contextually appropriate responses,
illustrating its grasp of point clouds and common sense. Specifically, it
leverages a point cloud encoder with a powerful LLM to effectively fuse
geometric, appearance, and linguistic information. We collect a novel dataset
comprising 660K simple and 70K complex point-text instruction pairs to enable a
two-stage training strategy: aligning latent spaces and subsequently
instruction-tuning the unified model. To rigorously evaluate the perceptual and
generalization capabilities of PointLLM, we establish two benchmarks:
Generative 3D Object Classification and 3D Object Captioning, assessed through
three different methods, including human evaluation, GPT-4/ChatGPT evaluation,
and traditional metrics. Experimental results reveal PointLLM's superior
performance over existing 2D and 3D baselines, with a notable achievement in
human-evaluated object captioning tasks where it surpasses human annotators in
over 50% of the samples. Codes, datasets, and benchmarks are available at
https://github.com/OpenRobotLab/PointLLM .
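The abstract says PointLLM pairs a point cloud encoder with an LLM and trains in two stages: first aligning the latent spaces, then instruction-tuning the unified model. The toy sketch below illustrates that fusion under stated assumptions. It treats the encoder as a black box that has already turned the colored point cloud into a short sequence of feature vectors ("point tokens"). It then shows two steps: a projector that maps those features into the LLM's embedding space, and concatenation with the text token embeddings. The function names, the single linear layer, the plain-list math, and the stage table are hypothetical and are not taken from the released code. The table follows the common LLaVA-style recipe, where stage 1 trains only the projector and stage 2 also updates the LLM while the encoder stays frozen.

```python
from typing import Dict, List, Set

Vector = List[float]
Matrix = List[List[float]]


def project(features: List[Vector], weight: Matrix, bias: Vector) -> List[Vector]:
    """Hypothetical linear projector: maps each point-token feature (dim d_p)
    to the LLM embedding size (dim d_llm). `weight` is d_llm rows x d_p columns."""
    return [
        [sum(w * x for w, x in zip(row, feat)) + b for row, b in zip(weight, bias)]
        for feat in features
    ]


def fuse_sequence(point_tokens: List[Vector], text_embeddings: List[Vector]) -> List[Vector]:
    """Place the projected point tokens before the text token embeddings so the
    LLM sees geometric/appearance tokens and language tokens in one sequence."""
    return point_tokens + text_embeddings


# Assumed two-stage schedule (encoder frozen throughout):
# stage 1 aligns latent spaces by training only the projector;
# stage 2 instruction-tunes the projector together with the LLM.
TRAINABLE: Dict[int, Set[str]] = {
    1: {"projector"},
    2: {"projector", "llm"},
}


def trainable_modules(stage: int) -> Set[str]:
    """Return the names of the modules updated in the given training stage."""
    if stage not in TRAINABLE:
        raise ValueError(f"unknown stage: {stage}")
    return TRAINABLE[stage]


if __name__ == "__main__":
    point_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 point tokens, d_p = 2
    W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]            # d_llm = 3
    b = [0.0, 0.0, 0.5]
    tokens = project(point_feats, W, b)
    print(tokens)  # [[1.0, 0.0, 1.5], [0.0, 1.0, 1.5], [1.0, 1.0, 2.5]]
    seq = fuse_sequence(tokens, [[0.1, 0.2, 0.3]])
    print(len(seq), trainable_modules(1), sorted(trainable_modules(2)))
```

Keeping the projector as the only newly initialized component means stage 1 can align modalities cheaply on the large set of simple point-text pairs before the more expensive joint tuning on complex instructions.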
Related papers
- GPT4Point: A Unified Framework for Point-Language Understanding and Generation [76.61439685940272]
GPT4Point is a groundbreaking point-language multimodal model for unified 3D object understanding and generation within the MLLM framework.
As a powerful 3D MLLM, GPT4Point can seamlessly execute a variety of point-text reference tasks, such as point-cloud captioning and Q&A.
For controllable 3D generation, it can produce high-quality results from a low-quality point-text feature while preserving geometric shapes and colors.
(2023-12-05T18:59:55Z)
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
(2023-11-03T15:41:15Z)
- Explore In-Context Learning for 3D Point Cloud Understanding [71.20912026561484]
We introduce a novel framework, named Point-In-Context, designed especially for in-context learning in 3D point clouds.
We propose the Joint Sampling module, carefully designed to work in tandem with the general point sampling operator.
We conduct extensive experiments to validate the versatility and adaptability of our proposed methods in handling a wide range of tasks.
(2023-06-14T17:53:21Z)
- PointCLIMB: An Exemplar-Free Point Cloud Class Incremental Benchmark [11.992472563628283]
We pioneer the use of exemplar-free class-incremental learning on point clouds.
We set up a benchmark for 3D exemplar-free class-incremental learning.
We investigate the performance of various backbones in the 3D exemplar-free class-incremental learning framework.
(2023-04-13T18:47:29Z)
- Point2Vec for Self-Supervised Representation Learning on Point Clouds [66.53955515020053]
We extend data2vec to the point cloud domain and report encouraging results on several downstream tasks.
We propose point2vec, which unleashes the full potential of data2vec-like pre-training on point clouds.
(2023-03-29T10:08:29Z)
- Joint Representation Learning for Text and 3D Point Cloud [35.67281936143821]
We propose a novel Text4Point framework to construct language-guided 3D point cloud models.
The proposed Text4Point follows the pre-training and fine-tuning paradigm.
Our model shows consistent improvement on various downstream tasks, such as point cloud semantic segmentation, instance segmentation, and object detection.
(2023-01-18T15:02:07Z)
- CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding [2.8661021832561757]
CrossPoint is a simple cross-modal contrastive learning approach to learn transferable 3D point cloud representations.
Our approach outperforms the previous unsupervised learning methods on a diverse range of downstream tasks including 3D object classification and segmentation.
(2022-03-01T18:59:01Z)
- Campus3D: A Photogrammetry Point Cloud Benchmark for Hierarchical Understanding of Outdoor Scene [76.4183572058063]
We present a richly-annotated 3D point cloud dataset for multiple outdoor scene understanding tasks.
Every point in the dataset is annotated with both hierarchical and instance-based labels.
We formulate a hierarchical learning problem for 3D point cloud segmentation and propose a measurement evaluating consistency across various hierarchies.
(2020-08-11T19:10:32Z)
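The "Leveraging Large-Scale Pretrained Vision Foundation Models" entry combines predictions from several large vision models into 3D pseudo labels by voting. The sketch below is a minimal per-point majority vote that illustrates the idea. It assumes each model's 2D semantic masks have already been projected onto the same ordered set of 3D points. The function name, the `IGNORE` sentinel for "no prediction", and the tie rule are assumptions, not the authors' fusion strategy. The tie rule gives the win to the label that one of the earliest models voted for.

```python
from collections import Counter
from typing import List, Sequence

IGNORE = -1  # hypothetical sentinel: "this model gave no label for this point"


def fuse_by_voting(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Per-point majority vote over label lists from several models.

    predictions[k][i] is model k's label for point i (already lifted from 2D
    masks to 3D points). Counter.most_common orders equal counts by first
    occurrence, so ties go to the label seen first in model order. Points
    with no valid vote keep IGNORE.
    """
    if not predictions:
        return []
    num_points = len(predictions[0])
    if any(len(p) != num_points for p in predictions):
        raise ValueError("all models must label the same points")
    fused: List[int] = []
    for i in range(num_points):
        votes = Counter(p[i] for p in predictions if p[i] != IGNORE)
        fused.append(votes.most_common(1)[0][0] if votes else IGNORE)
    return fused


if __name__ == "__main__":
    preds = [
        [0, 1, 2, IGNORE],  # model A
        [0, 2, 2, IGNORE],  # model B
        [1, 2, 3, IGNORE],  # model C
    ]
    print(fuse_by_voting(preds))  # [0, 2, 2, -1]
```

A plain majority vote treats every model equally. A weighted vote, such as one that scales each model's vote by its confidence, would be a natural variation, but it is not described in the summary above.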