To the Point: Efficient 3D Object Detection in the Range Image with Graph Convolution Kernels
- URL: http://arxiv.org/abs/2106.13381v1
- Date: Fri, 25 Jun 2021 01:27:26 GMT
- Title: To the Point: Efficient 3D Object Detection in the Range Image with Graph Convolution Kernels
- Authors: Yuning Chai, Pei Sun, Jiquan Ngiam, Weiyue Wang, Benjamin Caine, Vijay Vasudevan, Xiao Zhang, Dragomir Anguelov
- Abstract summary: We design a 2D convolutional network architecture that carries the 3D spherical coordinates of each pixel throughout the network.
Our method performs competitively on the Waymo Open Dataset and improves the state-of-the-art AP for pedestrian detection from 69.7% to 75.5%.
It is also efficient: our smallest model, which still outperforms the popular PointPillars in quality, requires 180 times fewer FLOPs and model parameters.
- Score: 30.3378171262436
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: 3D object detection is vital for many robotics applications. For tasks where
a 2D perspective range image exists, we propose to learn a 3D representation
directly from this range image view. To this end, we designed a 2D
convolutional network architecture that carries the 3D spherical coordinates of
each pixel throughout the network. Its layers can consume any arbitrary
convolution kernel in place of the default inner product kernel and exploit the
underlying local geometry around each pixel. We outline four such kernels: a
dense kernel according to the bag-of-words paradigm, and three graph kernels
inspired by recent graph neural network advances: the Transformer, the
PointNet, and the Edge Convolution. We also explore cross-modality fusion with
the camera image, facilitated by operating in the perspective range image view.
Our method performs competitively on the Waymo Open Dataset and improves the
state-of-the-art AP for pedestrian detection from 69.7% to 75.5%. It is also
efficient in that our smallest model, which still outperforms the popular
PointPillars in quality, requires 180 times fewer FLOPs and model parameters.
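
To make the kernel-swapping idea concrete, below is a minimal PyTorch sketch of one of the four kernels named in the abstract: an Edge Convolution-style graph kernel applied over a range image whose per-pixel 3D coordinates are carried alongside the feature map. This is an illustration based only on the abstract, not the authors' implementation; the function and class names (range_image_to_xyz, RangeEdgeConv), the unfold-based neighborhood construction, and the exact edge-feature layout are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def range_image_to_xyz(rng, inclination, azimuth):
    """Convert a range image (B, 1, H, W) to per-pixel Cartesian
    coordinates (B, 3, H, W), given per-row inclination angles (H,)
    and per-column azimuth angles (W,) of the lidar scan pattern."""
    inc = inclination.view(1, 1, -1, 1)
    az = azimuth.view(1, 1, 1, -1)
    x = rng * torch.cos(inc) * torch.cos(az)
    y = rng * torch.cos(inc) * torch.sin(az)
    z = rng * torch.sin(inc)
    return torch.cat([x, y, z], dim=1)


class RangeEdgeConv(nn.Module):
    """EdgeConv-style kernel over a k x k range-image neighborhood.

    Replaces the inner-product kernel of a standard 2D convolution:
    edge features built from neighbor/center feature differences and
    3D coordinate offsets pass through a shared MLP, then max-pool."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.k = k
        # Edge feature = [center feats, neighbor - center feats,
        #                 neighbor xyz - center xyz].
        self.mlp = nn.Sequential(
            nn.Conv2d(2 * in_ch + 3, out_ch, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, feats, xyz):
        # feats: (B, C, H, W) features; xyz: (B, 3, H, W) coordinates.
        B, C, H, W = feats.shape
        k = self.k
        # Gather the k x k neighborhood around every pixel.
        nf = F.unfold(feats, k, padding=k // 2).reshape(B, C, k * k, H, W)
        nx = F.unfold(xyz, k, padding=k // 2).reshape(B, 3, k * k, H, W)
        cf = feats.unsqueeze(2)  # (B, C, 1, H, W)
        cx = xyz.unsqueeze(2)    # (B, 3, 1, H, W)
        edge = torch.cat([cf.expand_as(nf), nf - cf, nx - cx], dim=1)
        # Shared MLP on every edge, then max over the neighborhood.
        edge = edge.reshape(B, 2 * C + 3, k * k * H, W)
        out = self.mlp(edge).reshape(B, -1, k * k, H, W)
        return out.max(dim=2).values  # (B, out_ch, H, W)
```

In this framing, the other kernels from the abstract (PointNet-style or Transformer-style) would amount to swapping the edge MLP and max-pool for a different per-neighborhood aggregation, while the 2D layout of the range image keeps the neighbor lookup as cheap as a standard convolution.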
Related papers
- Bridged Transformer for Vision and Point Cloud 3D Object Detection [92.86856146086316]
Bridged Transformer (BrT) is an end-to-end architecture for 3D object detection.
BrT learns to identify 3D and 2D object bounding boxes from both points and image patches.
We experimentally show that BrT surpasses state-of-the-art methods on SUN RGB-D and ScanNetV2 datasets.
arXiv Detail & Related papers (2022-10-04T05:44:22Z)
- PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal Distillation for 3D Shape Recognition [55.38462937452363]
We propose a unified multi-view cross-modal distillation architecture, including a pretrained deep image encoder as the teacher and a deep point encoder as the student.
By pair-wise aligning multi-view visual and geometric descriptors, we can obtain more powerful deep point encoders without exhaustive and complicated network modifications.
arXiv Detail & Related papers (2022-07-07T07:23:20Z)
- Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation [3.5939555573102853]
Recent works on 3D semantic segmentation propose to exploit the synergy between images and point clouds by processing each modality with a dedicated network.
We propose an end-to-end trainable multi-view aggregation model leveraging the viewing conditions of 3D points to merge features from images taken at arbitrary positions.
Our method can combine standard 2D and 3D networks and outperforms both 3D models operating on colorized point clouds and hybrid 2D/3D networks.
arXiv Detail & Related papers (2022-04-15T17:10:48Z)
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z)
- VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion [62.24001258298076]
VPFNet is a new architecture that cleverly aligns and aggregates the point cloud and image data at the "virtual" points.
Our VPFNet achieves 83.21% moderate 3D AP and 91.86% moderate BEV AP on the KITTI test set, ranking 1st since May 21st, 2021.
arXiv Detail & Related papers (2021-11-29T08:51:20Z)
- Robust 2D/3D Vehicle Parsing in CVIS [54.825777404511605]
We present a novel approach to robustly detect and perceive vehicles in different camera views as part of a cooperative vehicle-infrastructure system (CVIS).
Our formulation is designed for arbitrary camera views and makes no assumptions about intrinsic or extrinsic parameters.
In practice, our approach outperforms SOTA methods on 2D detection, instance segmentation, and 6-DoF pose estimation.
arXiv Detail & Related papers (2021-03-11T03:35:05Z)
- GRF: Learning a General Radiance Field for 3D Representation and Rendering [4.709764624933227]
We present a simple yet powerful neural network that implicitly represents and renders 3D objects and scenes only from 2D observations.
The network models 3D geometries as a general radiance field, which takes a set of 2D images with camera poses and intrinsics as input.
Our method can generate high-quality and realistic novel views for novel objects, unseen categories and challenging real-world scenes.
arXiv Detail & Related papers (2020-10-09T14:21:43Z)
- ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model, which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module: adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.