Segment Any Point Cloud Sequences by Distilling Vision Foundation Models
- URL: http://arxiv.org/abs/2306.09347v2
- Date: Tue, 24 Oct 2023 09:51:00 GMT
- Title: Segment Any Point Cloud Sequences by Distilling Vision Foundation Models
- Authors: Youquan Liu, Lingdong Kong, Jun Cen, Runnan Chen, Wenwei Zhang, Liang Pan, Kai Chen, and Ziwei Liu
- Abstract summary: Seal is a framework that harnesses vision foundation models (VFMs) for segmenting diverse automotive point cloud sequences.
Seal exhibits three appealing properties: scalability, consistency, and generalizability.
- Score: 55.12618600523729
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advancements in vision foundation models (VFMs) have opened up new
possibilities for versatile and efficient visual perception. In this work, we
introduce Seal, a novel framework that harnesses VFMs for segmenting diverse
automotive point cloud sequences. Seal exhibits three appealing properties: i)
Scalability: VFMs are directly distilled into point clouds, obviating the need
for annotations in either 2D or 3D during pretraining. ii) Consistency: Spatial
and temporal relationships are enforced at both the camera-to-LiDAR and
point-to-segment regularization stages, facilitating cross-modal representation
learning. iii) Generalizability: Seal enables knowledge transfer in an
off-the-shelf manner to downstream tasks involving diverse point clouds,
including those from real/synthetic, low/high-resolution, large/small-scale,
and clean/corrupted datasets. Extensive experiments conducted on eleven
different point cloud datasets showcase the effectiveness and superiority of
Seal. Notably, Seal achieves a remarkable 45.0% mIoU on nuScenes after linear
probing, surpassing random initialization by 36.9% mIoU and outperforming prior
art by 6.1% mIoU. Moreover, Seal demonstrates significant performance gains
over existing methods across 20 different few-shot fine-tuning tasks on all
eleven tested point cloud datasets.
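The camera-to-LiDAR stage described above follows the general superpixel-to-superpoint contrastive recipe used in this line of work: pool image features and point features over the same VFM-generated segments, then pull matching pairs together. The sketch below is a minimal illustration of that recipe under assumed shapes and names (`segment_pool`, `camera_to_lidar_loss`, and the feature tensors are hypothetical, not Seal's actual code); the point-to-segment stage analogously regularizes points against their temporal segments.

```python
import torch
import torch.nn.functional as F

def segment_pool(feats, seg_ids, num_segs):
    """Average features that share a VFM-generated segment id."""
    pooled = torch.zeros(num_segs, feats.size(1), device=feats.device)
    pooled.index_add_(0, seg_ids, feats)
    counts = torch.bincount(seg_ids, minlength=num_segs).clamp(min=1)
    return pooled / counts.unsqueeze(1)

def camera_to_lidar_loss(point_feats, pixel_feats, seg_ids, num_segs, tau=0.07):
    """InfoNCE between segment-pooled 3D and 2D embeddings.

    point_feats: (N, D) features of the N points that project into the image.
    pixel_feats: (N, D) image features sampled at those projected locations.
    seg_ids:     (N,)  segment id shared by each point/pixel pair.
    """
    p3d = F.normalize(segment_pool(point_feats, seg_ids, num_segs), dim=1)
    p2d = F.normalize(segment_pool(pixel_feats, seg_ids, num_segs), dim=1)
    logits = p3d @ p2d.t() / tau                      # (S, S) similarities
    targets = torch.arange(num_segs, device=logits.device)
    return F.cross_entropy(logits, targets)           # diagonal = positives
```

The linear-probing figure quoted above corresponds to freezing a backbone pretrained this way and training only a pointwise linear classifier on nuScenes.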
Related papers
- GS-PT: Exploiting 3D Gaussian Splatting for Comprehensive Point Cloud Understanding via Self-supervised Learning [15.559369116540097]
Self-supervised learning on point clouds aims to leverage unlabeled 3D data to learn meaningful representations without reliance on manual annotations.
We propose GS-PT, which integrates 3D Gaussian Splatting (3DGS) into point cloud self-supervised learning for the first time.
Our pipeline utilizes transformers as the backbone for self-supervised pre-training and introduces novel contrastive learning tasks through 3DGS.
arXiv Detail & Related papers (2024-09-08T03:46:47Z)
- 4D Contrastive Superflows are Dense 3D Representation Learners [62.433137130087445]
We introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing pretraining objectives.
To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances alignment of the knowledge distilled from camera views.
arXiv Detail & Related papers (2024-07-08T17:59:54Z)
- 3D Unsupervised Learning by Distilling 2D Open-Vocabulary Segmentation Models for Autonomous Driving [17.42913935045091]
We propose UOV, a novel 3D Unsupervised framework assisted by 2D Open-Vocabulary segmentation models.
In the first stage, we innovatively integrate high-quality textual and image features of 2D open-vocabulary models.
In the second stage, spatial mapping between point clouds and images is used to generate pseudo-labels, as sketched below.
arXiv Detail & Related papers (2024-05-24T07:18:09Z)
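The pseudo-labeling step in the UOV entry above reduces to projecting every LiDAR point through the camera calibration and reading the label of the pixel it lands on. Below is a minimal sketch assuming a standard pinhole model; the function name and argument layout are illustrative, not UOV's actual API:

```python
import numpy as np

def project_points_to_labels(points, T_cam_lidar, K, mask):
    """Assign each 3D point the label of the 2D mask pixel it projects to.

    points:      (N, 3) LiDAR points.
    T_cam_lidar: (4, 4) extrinsics taking the LiDAR frame to the camera frame.
    K:           (3, 3) camera intrinsics.
    mask:        (H, W) semantic mask from a 2D open-vocabulary model.
    Returns (N,) pseudo-labels, -1 for points outside or behind the image.
    """
    H, W = mask.shape
    pts_h = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    cam = (T_cam_lidar @ pts_h.T).T[:, :3]           # camera-frame xyz
    uvw = (K @ cam.T).T
    uv = uvw[:, :2] / np.maximum(uvw[:, 2:3], 1e-6)  # perspective divide
    u, v = uv[:, 0].round().astype(int), uv[:, 1].round().astype(int)
    valid = (cam[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    labels = np.full(len(points), -1, dtype=np.int64)
    labels[valid] = mask[v[valid], u[valid]]
    return labels
```

- Zero-shot Point Cloud Completion Via 2D Priors [52.72867922938023]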
3D point cloud completion is designed to recover complete shapes from partially observed point clouds.
We propose a zero-shot framework aimed at completing partially observed point clouds across any unseen categories.
arXiv Detail & Related papers (2024-04-10T08:02:17Z)
- Point Cloud Pre-training with Diffusion Models [62.12279263217138]
We propose a novel pre-training method called Point cloud Diffusion pre-training (PointDif); a denoising sketch follows this entry.
PointDif achieves substantial improvement across various real-world datasets for diverse downstream tasks such as classification, segmentation and detection.
arXiv Detail & Related papers (2023-11-25T08:10:05Z)
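The denoising objective behind PointDif-style pretraining can be pictured as a DDPM step on raw coordinates: corrupt the points with scheduled Gaussian noise and train the network to predict that noise. A toy sketch follows; `model`, the linear beta schedule, and the omitted conditioning are assumptions, not PointDif's implementation:

```python
import torch
import torch.nn.functional as F

def diffusion_pretrain_step(model, points, num_steps=1000):
    """One DDPM-style denoising step on point coordinates.

    model(noisy_points, t) is assumed to predict the added noise.
    points: (B, N, 3) clean point clouds.
    """
    B = points.size(0)
    t = torch.randint(0, num_steps, (B,), device=points.device)
    betas = torch.linspace(1e-4, 0.02, num_steps, device=points.device)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t].view(B, 1, 1)
    noise = torch.randn_like(points)
    noisy = alpha_bar.sqrt() * points + (1.0 - alpha_bar).sqrt() * noise
    return F.mse_loss(model(noisy, t), noise)   # predict the injected noise
```

- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]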
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo-labels, we introduce a semantic label fusion strategy that combines all the results via voting (see the sketch after this entry).
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
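The fusion step referenced above amounts to a per-point majority vote over the pseudo-labels produced by several 2D models. A hedged sketch; the array names and ignore convention are illustrative, not the paper's exact strategy:

```python
import numpy as np

def fuse_labels_by_voting(label_sets, num_classes, ignore=-1):
    """Majority-vote fusion of per-point labels from multiple 2D models.

    label_sets: (M, N) labels for N points from M models; `ignore` marks
    points a model could not label (e.g. outside its camera view).
    Returns (N,) fused labels; points no model saw remain `ignore`.
    """
    M, N = label_sets.shape
    votes = np.zeros((N, num_classes), dtype=np.int64)
    for labels in label_sets:
        valid = labels != ignore
        np.add.at(votes, (np.nonzero(valid)[0], labels[valid]), 1)
    fused = votes.argmax(axis=1)
    fused[votes.sum(axis=1) == 0] = ignore          # nobody voted
    return fused
```

- Dual Adaptive Transformations for Weakly Supervised Point Cloud Segmentation [78.6612285236938]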
We propose a novel DAT (Dual Adaptive Transformations) model for weakly supervised point cloud segmentation.
We evaluate our proposed DAT model with two popular backbones on the large-scale S3DIS and ScanNet-V2 datasets.
arXiv Detail & Related papers (2022-07-19T05:43:14Z)
- Efficient Urban-scale Point Clouds Segmentation with BEV Projection [0.0]
Most deep point cloud models learn directly on raw 3D points.
We propose to project the 3D point clouds onto a dense bird's-eye-view (BEV) map instead, as sketched below.
arXiv Detail & Related papers (2021-09-19T06:49:59Z)
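The BEV idea in the last entry is to scatter points into a top-down grid so an ordinary 2D segmentation network can process the result. A minimal rasterization sketch; the grid resolution, extent, and height/density channels are assumptions rather than the paper's exact projection:

```python
import numpy as np

def points_to_bev(points, grid=(512, 512), extent=50.0):
    """Rasterize points into a bird's-eye-view height/density image.

    points: (N, 3) with x, y in metres; cells cover [-extent, extent].
    Returns (H, W, 2): max height and point count per cell.
    """
    H, W = grid
    ix = ((points[:, 0] + extent) / (2 * extent) * W).astype(int)
    iy = ((points[:, 1] + extent) / (2 * extent) * H).astype(int)
    keep = (ix >= 0) & (ix < W) & (iy >= 0) & (iy < H)
    ix, iy, z = ix[keep], iy[keep], points[keep, 2]
    bev = np.zeros((H, W, 2), dtype=np.float32)
    bev[:, :, 0] = -np.inf                      # running max height
    np.maximum.at(bev[:, :, 0], (iy, ix), z)
    np.add.at(bev[:, :, 1], (iy, ix), 1.0)      # point density
    bev[bev[:, :, 1] == 0, 0] = 0.0             # empty cells get 0 height
    return bev
```

Per-cell predictions from the 2D network can then be mapped back to every point that fell in that cell.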
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences of their use.