PillarNeSt: Embracing Backbone Scaling and Pretraining for Pillar-based 3D Object Detection
- URL: http://arxiv.org/abs/2311.17770v1
- Date: Wed, 29 Nov 2023 16:11:33 GMT
- Title: PillarNeSt: Embracing Backbone Scaling and Pretraining for Pillar-based 3D Object Detection
- Authors: Weixin Mao, Tiancai Wang, Diankun Zhang, Junjie Yan, Osamu Yoshie
- Abstract summary: We show the effectiveness of 2D backbone scaling and pretraining for pillar-based 3D object detectors.
Our proposed pillar-based detector, PillarNeSt, outperforms existing 3D object detectors by a large margin on the nuScenes and Argoverse v2 datasets.
- Score: 33.00510927880774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper shows the effectiveness of 2D backbone scaling and pretraining for
pillar-based 3D object detectors. Pillar-based methods mainly employ a randomly
initialized 2D convolutional neural network (ConvNet) for feature extraction and
fail to enjoy the benefits from the backbone scaling and pretraining in the
image domain. To show the scaling-up capacity in point clouds, we introduce the
dense ConvNet pretrained on large-scale image datasets (e.g., ImageNet) as the
2D backbone of pillar-based detectors. The ConvNets are adaptively designed
based on the model size according to the specific features of point clouds,
such as sparsity and irregularity. Equipped with the pretrained ConvNets, our
proposed pillar-based detector, termed PillarNeSt, outperforms the existing 3D
object detectors by a large margin on the nuScenes and Argoverse v2 datasets.
Our code shall be released upon acceptance.
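The pillar representation at the heart of such detectors can be illustrated with a minimal sketch (hypothetical code, not the paper's implementation): points are binned into vertical columns on a bird's-eye-view (BEV) grid, each pillar is pooled into a feature, and the result is scattered into a dense 2D pseudo-image that a standard 2D ConvNet backbone can consume. Here the per-pillar feature is a simple max over point heights; real detectors learn it with a small PointNet.

```python
import numpy as np

def pillarize(points, grid=(32, 32), x_range=(0.0, 64.0), y_range=(0.0, 64.0)):
    """Scatter a LiDAR point cloud of shape (N, 3) into a dense BEV pseudo-image.

    Each pillar (one BEV grid cell) is summarized by the max point height
    falling inside it; cells without points stay zero, reflecting sparsity.
    """
    H, W = grid
    bev = np.zeros((H, W), dtype=np.float32)
    # Map continuous x/y coordinates to integer pillar indices on the BEV grid.
    ix = ((points[:, 0] - x_range[0]) / (x_range[1] - x_range[0]) * W).astype(int)
    iy = ((points[:, 1] - y_range[0]) / (y_range[1] - y_range[0]) * H).astype(int)
    # Discard points outside the detection range.
    mask = (ix >= 0) & (ix < W) & (iy >= 0) & (iy < H)
    for x, y, z in zip(ix[mask], iy[mask], points[mask, 2]):
        bev[y, x] = max(bev[y, x], z)  # max-height pooling per pillar
    return bev

# Three points: two land in the same pillar, one far away in another.
cloud = np.array([[1.0, 1.0, 0.5], [1.2, 1.1, 1.5], [50.0, 50.0, 2.0]])
img = pillarize(cloud)
print(img.shape)  # (32, 32)
```

The resulting dense (H, W) map is exactly the kind of image-like input on which a pretrained 2D backbone, as advocated by PillarNeSt, can be applied.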
Related papers
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z) - Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene
Understanding [40.68012530554327]
We introduce a pretrained 3D backbone, called SST, for 3D indoor scene understanding.
We design a 3D Swin transformer as our backbone network, which enables efficient self-attention on sparse voxels with linear memory complexity.
A series of extensive ablation studies further validate the scalability, generality, and superior performance enabled by our approach.
arXiv Detail & Related papers (2023-04-14T02:49:08Z) - Pillar R-CNN for Point Cloud 3D Object Detection [4.169126928311421]
We devise a conceptually simple yet effective two-stage 3D detection architecture, named Pillar R-CNN.
Our Pillar R-CNN performs favorably against state-of-the-art 3D detectors on the large-scale Waymo Open dataset.
It should be highlighted that further exploration into BEV perception for applications involving autonomous driving is now possible thanks to the effective and elegant Pillar R-CNN architecture.
arXiv Detail & Related papers (2023-02-26T12:07:25Z) - PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained
Image-Language Models [56.324516906160234]
Generalizable 3D part segmentation is important but challenging in vision and robotics.
This paper explores an alternative way for low-shot part segmentation of 3D point clouds by leveraging a pretrained image-language model, GLIP.
We transfer the rich knowledge from 2D to 3D through GLIP-based part detection on point cloud rendering and a novel 2D-to-3D label lifting algorithm.
arXiv Detail & Related papers (2022-12-03T06:59:01Z) - PillarNet: Real-Time and High-Performance Pillar-based 3D Object
Detection [4.169126928311421]
Real-time and high-performance 3D object detection is of critical importance for autonomous driving.
Recent top-performing 3D object detectors mainly rely on point-based or 3D voxel-based convolutions.
We develop a real-time and high-performance pillar-based detector, dubbed PillarNet.
arXiv Detail & Related papers (2022-05-16T00:14:50Z) - RBGNet: Ray-based Grouping for 3D Object Detection [104.98776095895641]
We propose the RBGNet framework, a voting-based 3D detector for accurate 3D object detection from point clouds.
We propose a ray-based feature grouping module, which aggregates the point-wise features on object surfaces using a group of determined rays.
Our model achieves state-of-the-art 3D detection performance on ScanNet V2 and SUN RGB-D with remarkable performance gains.
arXiv Detail & Related papers (2022-04-05T14:42:57Z) - CG-SSD: Corner Guided Single Stage 3D Object Detection from LiDAR Point
Cloud [4.110053032708927]
In a real-world scene, the LiDAR acquires only a limited set of points on the object surface, while the object's center point itself is not captured.
We propose a corner-guided anchor-free single-stage 3D object detection model (CG-SSD).
CG-SSD achieves state-of-the-art performance on the ONCE benchmark for supervised 3D object detection using single-frame point cloud data.
arXiv Detail & Related papers (2022-02-24T02:30:15Z) - Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based
Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to utilize the 3D voxelization and 3D convolution network.
We propose a new framework for the outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
arXiv Detail & Related papers (2021-09-12T06:25:11Z) - Anchor-free 3D Single Stage Detector with Mask-Guided Attention for
Point Cloud [79.39041453836793]
We develop a novel single-stage 3D detector for point clouds in an anchor-free manner.
We convert the voxel-based sparse 3D feature volumes into sparse 2D feature maps.
We propose an IoU-based detection confidence re-calibration scheme to improve the correlation between the detection confidence score and the accuracy of the bounding box regression.
arXiv Detail & Related papers (2021-08-08T13:42:13Z) - ParaNet: Deep Regular Representation for 3D Point Clouds [62.81379889095186]
ParaNet is a novel end-to-end deep learning framework for representing 3D point clouds.
It converts an irregular 3D point cloud into a regular 2D color image, named point geometry image (PGI).
In contrast to conventional regular representation modalities based on multi-view projection and voxelization, the proposed representation is differentiable and reversible.
arXiv Detail & Related papers (2020-12-05T13:19:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.