PVNAS: 3D Neural Architecture Search with Point-Voxel Convolution
- URL: http://arxiv.org/abs/2204.11797v2
- Date: Tue, 26 Apr 2022 01:25:19 GMT
- Title: PVNAS: 3D Neural Architecture Search with Point-Voxel Convolution
- Authors: Zhijian Liu, Haotian Tang, Shengyu Zhao, Kevin Shao, Song Han
- Abstract summary: We study 3D deep learning from the efficiency perspective.
We propose a novel hardware-efficient 3D primitive, Point-Voxel Convolution (PVConv).
- Score: 26.059213743430192
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D neural networks are widely used in real-world applications (e.g., AR/VR
headsets, self-driving cars). They are required to be fast and accurate;
however, limited hardware resources on edge devices make these requirements
rather challenging. Previous work processes 3D data using either voxel-based or
point-based neural networks, but neither type of 3D model is hardware-efficient,
due to its large memory footprint and random memory access.
In this paper, we study 3D deep learning from the efficiency perspective. We
first systematically analyze the bottlenecks of previous 3D methods. We then
combine the best from point-based and voxel-based models together and propose a
novel hardware-efficient 3D primitive, Point-Voxel Convolution (PVConv). We
further enhance this primitive with sparse convolution to make it more
effective in processing large (outdoor) scenes. Based on our designed 3D
primitive, we introduce 3D Neural Architecture Search (3D-NAS) to explore the
best 3D network architecture given a resource constraint. We evaluate our
proposed method on six representative benchmark datasets, achieving
state-of-the-art performance with 1.8-23.7x measured speedup. Furthermore, our
method has been deployed to the autonomous racing vehicle of MIT Driverless,
achieving a larger detection range, higher accuracy, and lower latency.
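To make the PVConv idea concrete, below is a minimal PyTorch sketch written from the abstract's description alone: a coarse voxel branch aggregates neighborhood context with a dense 3D convolution, while a high-resolution point branch preserves per-point detail with a shared MLP, and the two outputs are fused by addition. The names (`PVConvSketch`, `voxelize`, `devoxelize`) and the average-pooling voxelization are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


def voxelize(points, features, resolution):
    """Average per-point features into a dense voxel grid.

    points:   (B, N, 3) coordinates normalized to [0, 1)
    features: (B, C, N) per-point features
    returns:  (B, C, R, R, R) grid and the (B, N) flat voxel index per point
    """
    B, C, N = features.shape
    R = resolution
    idx = (points.clamp(0, 1 - 1e-6) * R).long()              # (B, N, 3)
    flat = (idx[..., 0] * R + idx[..., 1]) * R + idx[..., 2]  # (B, N)
    grid = features.new_zeros(B, C, R ** 3)
    count = features.new_zeros(B, 1, R ** 3)
    grid.scatter_add_(2, flat.unsqueeze(1).expand(-1, C, -1), features)
    count.scatter_add_(2, flat.unsqueeze(1), torch.ones_like(features[:, :1]))
    return (grid / count.clamp(min=1)).view(B, C, R, R, R), flat


def devoxelize(grid, flat):
    """Look up each point's (nearest) voxel feature from the grid."""
    B, C = grid.shape[:2]
    return grid.view(B, C, -1).gather(2, flat.unsqueeze(1).expand(-1, C, -1))


class PVConvSketch(nn.Module):
    """Point-Voxel Convolution as described in the abstract: a coarse voxel
    branch for neighborhood aggregation plus a high-resolution point-wise
    branch, fused by addition."""

    def __init__(self, in_ch, out_ch, resolution=16):
        super().__init__()
        self.resolution = resolution
        self.voxel_branch = nn.Sequential(        # coarse 3D convolution
            nn.Conv3d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True))
        self.point_branch = nn.Sequential(        # per-point shared MLP
            nn.Conv1d(in_ch, out_ch, 1),
            nn.BatchNorm1d(out_ch), nn.ReLU(inplace=True))

    def forward(self, points, features):
        grid, flat = voxelize(points, features, self.resolution)
        neighborhood = devoxelize(self.voxel_branch(grid), flat)
        return neighborhood + self.point_branch(features)
```

For example, `PVConvSketch(32, 64)(torch.rand(2, 1024, 3), torch.randn(2, 32, 1024))` returns a (2, 64, 1024) feature tensor. The sparse-convolution variant mentioned in the abstract would replace the dense `nn.Conv3d` branch with a sparse 3D convolution so that empty voxels in large outdoor scenes cost nothing.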
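The 3D-NAS component can likewise be sketched as search under a resource constraint. The toy evolutionary loop below samples per-stage depths and widths, rejects candidates that exceed a multiply-accumulate budget, and evolves the survivors. The search space, the MAC estimate, and especially the `fitness` placeholder (which rewards raw capacity so the example runs end to end; the actual search would score each candidate by validation accuracy) are all illustrative assumptions, not the paper's 3D-NAS.

```python
import random

# Hypothetical search space: per-stage (depth, width) choices.
DEPTHS = [1, 2, 3]
WIDTHS = [16, 32, 48, 64]
NUM_STAGES = 4
MAC_BUDGET = 5e8  # illustrative resource constraint (multiply-accumulates)


def sample_arch():
    return [(random.choice(DEPTHS), random.choice(WIDTHS))
            for _ in range(NUM_STAGES)]


def estimate_macs(arch, n_points=10_000):
    """Crude cost proxy: a point-wise layer costs ~ n_points * C_in * C_out."""
    macs, c_in = 0, 3
    for depth, width in arch:
        for _ in range(depth):
            macs += n_points * c_in * width
            c_in = width
    return macs


def fitness(arch):
    """Placeholder for the validation accuracy of the candidate network."""
    return sum(depth * width for depth, width in arch)


def evolutionary_search(population=64, generations=20, mutate_p=0.3):
    # Seed the population with budget-feasible random architectures.
    pool = [a for a in (sample_arch() for _ in range(population * 4))
            if estimate_macs(a) <= MAC_BUDGET][:population]
    for _ in range(generations):
        pool.sort(key=fitness, reverse=True)
        parents = pool[: max(1, population // 4)]
        children = []
        while len(children) < population - len(parents):
            child = [gene if random.random() > mutate_p else
                     (random.choice(DEPTHS), random.choice(WIDTHS))
                     for gene in random.choice(parents)]
            if estimate_macs(child) <= MAC_BUDGET:  # enforce the constraint
                children.append(child)
        pool = parents + children
    return max(pool, key=fitness)


if __name__ == "__main__":
    best = evolutionary_search()
    print("best architecture:", best, "MACs:", estimate_macs(best))
```

Under a tighter budget the constraint check becomes active and the search trades depth against width, which is the kind of exploration the abstract describes.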
Related papers
- PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation.
For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, implying its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)
- FastPillars: A Deployment-friendly Pillar-based 3D Detector [63.0697065653061]
Existing BEV-based (i.e., bird's-eye-view) detectors favor sparse convolutions (known as SPConv) to speed up training and inference.
FastPillars delivers state-of-the-art accuracy on the Waymo Open Dataset with a 1.8X speedup and a 3.8 mAPH/L2 improvement over the SPConv-based CenterPoint.
arXiv Detail & Related papers (2023-02-05T12:13:27Z)
- Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild [32.05421669957098]
Large datasets and scalable solutions have led to unprecedented advances in 2D recognition.
We revisit the task of 3D object detection by introducing a large benchmark, called Omni3D.
We show that Cube R-CNN outperforms prior works on the larger Omni3D and existing benchmarks.
arXiv Detail & Related papers (2022-07-21T17:56:22Z)
- EGFN: Efficient Geometry Feature Network for Fast Stereo 3D Object Detection [51.52496693690059]
Fast stereo-based 3D object detectors lag far behind high-precision-oriented methods in accuracy.
We argue that the main reason is the missing or poor 3D geometry feature representation in fast stereo-based methods.
The proposed EGFN outperforms YOLOStereo3D, the leading fast method, by 5.16% mAP$_{3d}$ at the cost of merely an additional 12 ms.
arXiv Detail & Related papers (2021-11-28T05:25:36Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- Ground-aware Monocular 3D Object Detection for Autonomous Driving [6.5702792909006735]
Estimating the 3D position and orientation of objects in the environment with a single RGB camera is a challenging task for low-cost urban autonomous driving and mobile robots.
Most of the existing algorithms are based on the geometric constraints in 2D-3D correspondence, which stems from generic 6D object pose estimation.
We introduce a novel neural network module to fully utilize such application-specific priors in the framework of deep learning.
arXiv Detail & Related papers (2021-02-01T08:18:24Z)
- Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net [93.51773847125014]
We propose a novel deep neural network that is able to jointly reason about 3D detection, tracking and motion forecasting given data captured by a 3D sensor.
Our approach performs 3D convolutions across space and time over a bird's eye view representation of the 3D world.
arXiv Detail & Related papers (2020-12-22T22:43:35Z)
- Learning to Predict the 3D Layout of a Scene [0.3867363075280544]
We propose a method that only uses a single RGB image, thus enabling applications in devices or vehicles that do not have LiDAR sensors.
We use the KITTI dataset for training, which consists of street traffic scenes with class labels, 2D bounding boxes and 3D annotations with seven degrees of freedom.
We achieve a mean average precision of 47.3% for moderately difficult data, measured at a 3D intersection over union threshold of 70%, as required by the official KITTI benchmark; outperforming previous state-of-the-art single RGB only methods by a large margin.
arXiv Detail & Related papers (2020-11-19T17:23:30Z)
- Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution [34.713667358316286]
Self-driving cars need to understand 3D scenes efficiently and accurately in order to drive safely.
Existing 3D perception models struggle to recognize small instances due to low-resolution voxelization and aggressive downsampling.
We propose Sparse Point-Voxel Convolution (SPVConv), a lightweight 3D module that equips the vanilla Sparse Convolution with the high-resolution point-based branch.
arXiv Detail & Related papers (2020-07-31T14:27:27Z)
- ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module: adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z)