Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving
- URL: http://arxiv.org/abs/2203.02112v1
- Date: Fri, 4 Mar 2022 03:00:34 GMT
- Title: Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving
- Authors: Yi-Nan Chen and Hang Dai and Yong Ding
- Abstract summary: The gap in image-to-image generation for stereo views is much smaller than that in image-to-LiDAR generation.
Motivated by this, we propose a Pseudo-Stereo 3D detection framework with three novel virtual view generation methods.
Our framework ranks 1st on car, pedestrian, and cyclist among the monocular 3D detectors with publications on the KITTI-3D benchmark.
- Score: 14.582107328849473
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Pseudo-LiDAR 3D detectors have made remarkable progress in monocular 3D
detection by enhancing the capability of perceiving depth with depth estimation
networks, and using LiDAR-based 3D detection architectures. The advanced stereo
3D detectors can also accurately localize 3D objects. The gap in image-to-image
generation for stereo views is much smaller than that in image-to-LiDAR
generation. Motivated by this, we propose a Pseudo-Stereo 3D detection
framework with three novel virtual view generation methods, including
image-level generation, feature-level generation, and feature-clone, for
detecting 3D objects from a single image. Our analysis of depth-aware learning
shows that the depth loss is effective in only feature-level virtual view
generation and the estimated depth map is effective in both image-level and
feature-level in our framework. We propose a disparity-wise dynamic convolution
with dynamic kernels sampled from the disparity feature map to filter the
features adaptively from a single image for generating virtual image features,
which eases the feature degradation caused by the depth estimation errors. Till
submission (November 18, 2021), our Pseudo-Stereo 3D detection framework ranks
1st on car, pedestrian, and cyclist among the monocular 3D detectors with
publications on the KITTI-3D benchmark. The code is released at
https://github.com/revisitq/Pseudo-Stereo-3D.
Related papers
- Weakly Supervised Monocular 3D Detection with a Single-View Image [58.57978772009438]
Monocular 3D detection aims for precise 3D object localization from a single-view image.
We propose SKD-WM3D, a weakly supervised monocular 3D detection framework.
We show that SKD-WM3D surpasses the state-of-the-art clearly and is even on par with many fully supervised methods.
arXiv Detail & Related papers (2024-02-29T13:26:47Z) - 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z) - Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object
Detection [17.526914782562528]
We propose Graph-DETR3D to automatically aggregate multi-view imagery information through graph structure learning (GSL)
Our best model achieves 49.5 NDS on the nuScenes test leaderboard, achieving new state-of-the-art in comparison with various published image-view 3D object detectors.
arXiv Detail & Related papers (2022-04-25T12:10:34Z) - Voxel-based 3D Detection and Reconstruction of Multiple Objects from a
Single Image [22.037472446683765]
We learn a regular grid of 3D voxel features from the input image which is aligned with 3D scene space via a 3D feature lifting operator.
Based on the 3D voxel features, our novel CenterNet-3D detection head formulates the 3D detection as keypoint detection in the 3D space.
We devise an efficient coarse-to-fine reconstruction module, including coarse-level voxelization and a novel local PCA-SDF shape representation.
arXiv Detail & Related papers (2021-11-04T18:30:37Z) - LIGA-Stereo: Learning LiDAR Geometry Aware Representations for
Stereo-based 3D Detector [80.7563981951707]
We propose LIGA-Stereo to learn stereo-based 3D detectors under the guidance of high-level geometry-aware representations of LiDAR-based detection models.
Compared with the state-of-the-art stereo detector, our method has improved the 3D detection performance of cars, pedestrians, cyclists by 10.44%, 5.69%, 5.97% mAP respectively.
arXiv Detail & Related papers (2021-08-18T17:24:40Z) - YOLOStereo3D: A Step Back to 2D for Efficient Stereo 3D Detection [6.5702792909006735]
YOLOStereo3D is trained on one single GPU and runs at more than ten fps.
It demonstrates performance comparable to state-of-the-art stereo 3D detection frameworks without usage of LiDAR data.
arXiv Detail & Related papers (2021-03-17T03:43:54Z) - Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled
Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2d detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z) - ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object
Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straight-forward module -- adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z) - DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
arXiv Detail & Related papers (2020-01-10T11:44:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.