ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object
Detection
- URL: http://arxiv.org/abs/2003.00529v1
- Date: Sun, 1 Mar 2020 17:18:08 GMT
- Title: ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object
Detection
- Authors: Zhenbo Xu, Wei Zhang, Xiaoqing Ye, Xiao Tan, Wei Yang, Shilei Wen,
Errui Ding, Ajin Meng, Liusheng Huang
- Abstract summary: We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module -- adaptive zooming.
- Score: 69.68263074432224
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detection is an essential task in autonomous driving and robotics.
Though great progress has been made, challenges remain in estimating 3D pose
for distant and occluded objects. In this paper, we present a novel framework
named ZoomNet for stereo imagery-based 3D detection. The pipeline of ZoomNet
begins with an ordinary 2D object detection model which is used to obtain pairs
of left-right bounding boxes. To further exploit the abundant texture cues in
RGB images for more accurate disparity estimation, we introduce a conceptually
straightforward module -- adaptive zooming, which simultaneously resizes 2D
instance bounding boxes to a unified resolution and adjusts the camera
intrinsic parameters accordingly. In this way, we can estimate higher-quality
disparity maps from the resized box images and then construct dense point
clouds for both nearby and distant objects (a sketch of this intrinsic
adjustment and back-projection follows the abstract). Moreover, we propose
learning part locations as complementary features to improve robustness
against occlusion, and put forward a 3D fitting score to better estimate the
3D detection quality. Extensive experiments on the popular KITTI 3D detection
dataset indicate ZoomNet surpasses all previous state-of-the-art methods by
large margins (improved by 9.4% on AP_bv (IoU=0.7) over pseudo-LiDAR). An
ablation study also demonstrates that our adaptive zooming strategy brings an
improvement of over 10% on AP_3d (IoU=0.7). In addition, since the official
KITTI benchmark lacks fine-grained annotations like pixel-wise part locations,
we also present our KFG dataset by augmenting KITTI with detailed instance-wise
annotations, including pixel-wise part locations, pixel-wise disparity, etc.
Both the KFG dataset and our codes will be publicly available at
https://github.com/detectRecog/ZoomNet.
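To make the adaptive-zooming idea concrete, below is a minimal sketch, assuming a pinhole camera model and a crop-then-resize convention; the function names are illustrative and this is not the authors' released code. It shows how resizing a box crop to a unified resolution maps to an adjusted intrinsic matrix, and how a disparity map estimated on the zoomed crop back-projects to a dense instance point cloud:

```python
import numpy as np

def zoom_intrinsics(K, box, out_hw):
    """Adjust a 3x3 pinhole intrinsic matrix for a crop box = (x0, y0, x1, y1)
    resized to out_hw = (H, W): cropping shifts the principal point, and
    resizing scales the focal lengths and the principal point."""
    x0, y0, x1, y1 = box
    H, W = out_hw
    sx, sy = W / (x1 - x0), H / (y1 - y0)
    Kz = K.astype(np.float64).copy()
    Kz[0, 0] *= sx                   # fx' = fx * sx
    Kz[1, 1] *= sy                   # fy' = fy * sy
    Kz[0, 2] = (K[0, 2] - x0) * sx   # cx' = (cx - x0) * sx
    Kz[1, 2] = (K[1, 2] - y0) * sy   # cy' = (cy - y0) * sy
    return Kz

def disparity_to_points(disp, Kz, baseline):
    """Back-project a disparity map (pixels in the zoomed crop) to camera-frame
    3D points: Z = fx*b/d, X = (u - cx)*Z/fx, Y = (v - cy)*Z/fy."""
    fx, fy, cx, cy = Kz[0, 0], Kz[1, 1], Kz[0, 2], Kz[1, 2]
    v, u = np.mgrid[0:disp.shape[0], 0:disp.shape[1]]
    valid = disp > 0
    Z = fx * baseline / disp[valid]
    X = (u[valid] - cx) * Z / fx
    Y = (v[valid] - cy) * Z / fy
    return np.stack([X, Y, Z], axis=1)  # (N, 3) instance point cloud
```

Note that the disparity-to-depth relation only holds if the left and right box images are zoomed with the same horizontal scale, which resizing paired boxes to a unified resolution provides.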
Related papers
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z)
- 3D Small Object Detection with Dynamic Spatial Pruning [62.72638845817799]
We propose an efficient feature pruning strategy for 3D small object detection.
We present a multi-level 3D detector named DSPDet3D which benefits from high spatial resolution.
It takes less than 2 s to directly process a whole building consisting of more than 4500k points while detecting almost all objects.
arXiv Detail & Related papers (2023-05-05T17:57:04Z)
- VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion [62.24001258298076]
VPFNet is a new architecture that cleverly aligns and aggregates the point cloud and image data at the 'virtual' points.
Our VPFNet achieves 83.21% moderate 3D AP and 91.86% moderate BEV AP on the KITTI test set, ranking 1st since May 21st, 2021.
arXiv Detail & Related papers (2021-11-29T08:51:20Z)
- DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries [43.02373021724797]
We introduce a framework for multi-camera 3D object detection.
Our method manipulates predictions directly in 3D space.
We achieve state-of-the-art performance on the nuScenes autonomous driving benchmark.
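A simplified sketch of such a 3D-to-2D query mechanism (the tensor shapes and the single-scale, image-resolution feature assumption are ours, not the authors'): each object query holds a 3D reference point that is projected into every camera view to bilinearly sample image features.

```python
import torch
import torch.nn.functional as F

def gather_multiview_features(ref_points, feats, proj_mats):
    """ref_points: (Q, 3) query reference points in the ego frame;
    feats: (N, C, H, W) one feature map per camera (assumed image-resolution);
    proj_mats: (N, 3, 4) camera projection matrices. Returns (C, Q)."""
    N, C, H, W = feats.shape
    Q = ref_points.shape[0]
    pts = torch.cat([ref_points, ref_points.new_ones(Q, 1)], dim=1)  # homogeneous
    uvw = torch.einsum('nij,qj->nqi', proj_mats, pts)                # (N, Q, 3)
    uv = uvw[..., :2] / uvw[..., 2:3].clamp(min=1e-5)                # pixel coords
    # normalize pixel coordinates to [-1, 1] for grid_sample
    grid = torch.stack([uv[..., 0] / (W - 1) * 2 - 1,
                        uv[..., 1] / (H - 1) * 2 - 1], dim=-1)       # (N, Q, 2)
    sampled = F.grid_sample(feats, grid.unsqueeze(2),
                            align_corners=True).squeeze(-1)          # (N, C, Q)
    # ignore views where the point is behind the camera or outside the image
    inside = ((grid.abs() <= 1).all(-1) & (uvw[..., 2] > 0)).float() # (N, Q)
    pooled = (sampled * inside.unsqueeze(1)).sum(0)
    return pooled / inside.sum(0).clamp(min=1)                       # (C, Q)
```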
arXiv Detail & Related papers (2021-10-13T17:59:35Z)
- Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
- Learning to Predict the 3D Layout of a Scene [0.3867363075280544]
We propose a method that only uses a single RGB image, thus enabling applications in devices or vehicles that do not have LiDAR sensors.
We use the KITTI dataset for training, which consists of street traffic scenes with class labels, 2D bounding boxes and 3D annotations with seven degrees of freedom.
We achieve a mean average precision of 47.3% for moderately difficult data, measured at a 3D intersection-over-union threshold of 70%, as required by the official KITTI benchmark, outperforming previous state-of-the-art single-RGB methods by a large margin.
arXiv Detail & Related papers (2020-11-19T17:23:30Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy that receives a reward only after several steps, so we adopt reinforcement learning to optimize it.
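A schematic of this one-parameter-per-step refinement loop (the policy, step sizes, and reward below are placeholders for illustration; the paper's network and training procedure are more involved):

```python
import numpy as np

PARAMS = ['x', 'y', 'z', 'w', 'h', 'l', 'yaw']                # 7-DoF 3D box
STEP = dict(x=0.1, y=0.1, z=0.1, w=0.05, h=0.05, l=0.05, yaw=np.pi / 36)

def refine(box, policy, feat, n_steps=20):
    """box: dict of the 7 box parameters; policy: (feat, box) -> action in
    0..13, encoding which parameter to move and in which direction."""
    for _ in range(n_steps):
        action = policy(feat, box)
        name = PARAMS[action // 2]
        sign = 1.0 if action % 2 == 0 else -1.0
        box[name] += sign * STEP[name]   # axial move: one parameter per step
    # the reward (e.g. 3D IoU against ground truth) arrives only after the
    # episode, which is why the policy is optimized with reinforcement learning
    return box
```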
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
- SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
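A rough illustration of decoding one 3D box from a single keypoint plus regressed variables (the channel layout and offset encoding below are assumptions, not SMOKE's exact parameterization):

```python
import numpy as np

def decode_3d_box(heatmap, reg, K):
    """heatmap: (H, W) center heatmap for one class; reg: (8, H, W) regressed
    variables per location: depth z, sub-pixel offsets (du, dv), dimensions
    (h, w, l), and orientation as (sin yaw, cos yaw); K: 3x3 intrinsics."""
    v, u = np.unravel_index(np.argmax(heatmap), heatmap.shape)  # keypoint peak
    z, du, dv, h, w, l, s, c = reg[:, v, u]
    # back-project the offset-corrected projected 3D center into camera space
    center = z * (np.linalg.inv(K) @ np.array([u + du, v + dv, 1.0]))
    yaw = np.arctan2(s, c)
    return center, (h, w, l), yaw
```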
arXiv Detail & Related papers (2020-02-24T08:15:36Z)