Self-Supervised Pretraining of 3D Features on any Point-Cloud
- URL: http://arxiv.org/abs/2101.02691v1
- Date: Thu, 7 Jan 2021 18:55:21 GMT
- Title: Self-Supervised Pretraining of 3D Features on any Point-Cloud
- Authors: Zaiwei Zhang, Rohit Girdhar, Armand Joulin, Ishan Misra
- Abstract summary: We present a simple self-supervised pretraining method that can work with any 3D data without 3D registration.
We evaluate our models on 9 benchmarks for object detection, semantic segmentation, and object classification, where they achieve state-of-the-art results and can outperform supervised pretraining.
- Score: 40.26575888582241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretraining on large labeled datasets is a prerequisite to achieve good
performance in many computer vision tasks like 2D object recognition, video
classification etc. However, pretraining is not widely used for 3D recognition
tasks where state-of-the-art methods train models from scratch. A primary
reason is the lack of large annotated datasets because 3D data is both
difficult to acquire and time consuming to label. We present a simple
self-supervised pretraining method that can work with any 3D data - single or
multiview, indoor or outdoor, acquired by varied sensors, without 3D
registration. We pretrain standard point cloud and voxel based model
architectures, and show that joint pretraining further improves performance. We
evaluate our models on 9 benchmarks for object detection, semantic
segmentation, and object classification, where they achieve state-of-the-art
results and can outperform supervised pretraining. We set a new
state-of-the-art for object detection on ScanNet (69.0% mAP) and SUNRGBD (63.5%
mAP). Our pretrained models are label efficient and improve performance for
classes with few examples.
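The abstract does not spell out the pretraining objective, but self-supervised pretraining on unlabeled 3D data of this kind is typically driven by a contrastive loss that pulls features of two views of the same scene (e.g. a point-cloud encoding and a voxel encoding) together while pushing apart features from different scenes. The sketch below is a minimal NumPy illustration of an InfoNCE-style objective under that assumption; `info_nce_loss` and its inputs are hypothetical names, not the paper's actual implementation.

```python
import numpy as np

def info_nce_loss(view_a, view_b, temperature=0.1):
    """InfoNCE-style contrastive loss between two feature views.

    view_a, view_b: (N, D) arrays where row i of each array encodes the
    same underlying 3D scene (a positive pair); all other rows serve as
    negatives.
    """
    # L2-normalize so dot products become cosine similarities.
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature              # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    # Log-softmax over each row; positives sit on the diagonal.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 32))
# Matching views: diagonal similarities dominate, so the loss is low.
low = info_nce_loss(feats, feats)
# Unrelated views: similarities are roughly uniform, so the loss is higher.
high = info_nce_loss(feats, rng.normal(size=(8, 32)))
```

Minimizing such a loss requires no labels at all, which is why this family of objectives suits 3D data that is "difficult to acquire and time consuming to label."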
Related papers
- Shelf-Supervised Cross-Modal Pre-Training for 3D Object Detection [52.66283064389691]
State-of-the-art 3D object detectors are often trained on massive labeled datasets.
Recent works demonstrate that self-supervised pre-training with unlabeled data can improve detection accuracy with limited labels.
We propose a shelf-supervised approach for generating zero-shot 3D bounding boxes from paired RGB and LiDAR data.
arXiv Detail & Related papers (2024-06-14T15:21:57Z)
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
- Video Pretraining Advances 3D Deep Learning on Chest CT Tasks [63.879848037679224]
Pretraining on large natural image classification datasets has aided model development on data-scarce 2D medical tasks.
These 2D models have been surpassed by 3D models on 3D computer vision benchmarks.
We show video pretraining for 3D models can enable higher performance on smaller datasets for 3D medical tasks.
arXiv Detail & Related papers (2023-04-02T14:46:58Z)
- Self-Supervised Learning with Multi-View Rendering for 3D Point Cloud Analysis [33.31864436614945]
We propose a novel pre-training method for 3D point cloud models.
Our pre-training is self-supervised by a local pixel/point level correspondence loss and a global image/point cloud level loss.
These improved models outperform existing state-of-the-art methods on various datasets and downstream tasks.
arXiv Detail & Related papers (2022-10-28T05:23:03Z)
- Weakly Supervised 3D Object Detection from Lidar Point Cloud [182.67704224113862]
It is laborious to manually label point cloud data for training high-quality 3D object detectors.
This work proposes a weakly supervised approach for 3D object detection, only requiring a small set of weakly annotated scenes.
Using only 500 weakly annotated scenes and 534 precisely labeled vehicle instances, our method achieves 85-95% of the performance of current top-leading, fully supervised detectors.
arXiv Detail & Related papers (2020-07-23T10:12:46Z)
- PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding [107.02479689909164]
In this work, we aim at facilitating research on 3D representation learning.
We measure the effect of unsupervised pre-training on a large source set of 3D scenes.
arXiv Detail & Related papers (2020-07-21T17:59:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.