SVDM: Single-View Diffusion Model for Pseudo-Stereo 3D Object Detection
- URL: http://arxiv.org/abs/2307.02270v1
- Date: Wed, 5 Jul 2023 13:10:37 GMT
- Title: SVDM: Single-View Diffusion Model for Pseudo-Stereo 3D Object Detection
- Authors: Yuguang Shi
- Abstract summary: A recently proposed framework for monocular 3D detection based on Pseudo-Stereo has received considerable attention in the community.
In this work, we propose an end-to-end, efficient pseudo-stereo 3D detection framework by introducing a Single-View Diffusion Model.
SVDM allows the entire pseudo-stereo 3D detection pipeline to be trained end-to-end and can benefit from the training of stereo detectors.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the key problems in 3D object detection is to reduce the accuracy gap
between methods based on LiDAR sensors and those based on monocular cameras. A
recently proposed framework for monocular 3D detection based on Pseudo-Stereo
has received considerable attention in the community. However, three problems
have been identified in existing practice: (1) the monocular depth estimator and
the Pseudo-Stereo detector must be trained separately, (2) it is difficult to
make the framework compatible with different stereo detectors, and (3) the
overall computation is heavy, which slows inference. In this work, we
propose an end-to-end, efficient pseudo-stereo 3D detection framework by
introducing a Single-View Diffusion Model (SVDM) that uses a few iterations to
gradually generate the informative right-view pixels from the left image. SVDM allows the
entire pseudo-stereo 3D detection pipeline to be trained end-to-end and can
benefit from the training of stereo detectors. Afterwards, we further explore
the application of SVDM in depth-free stereo 3D detection, and the final
framework is compatible with most stereo detectors. Among multiple benchmarks
on the KITTI dataset, we achieve new state-of-the-art performance.
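The abstract describes the mechanism only at a high level. Below is a minimal, hypothetical PyTorch sketch of a pseudo-stereo pipeline in this spirit: a small network iteratively refines a synthesized right view from the left image over a few steps, and the resulting (left, right) pair is fed to a placeholder stereo detector. All module names, layer sizes, and the step count are illustrative assumptions, not the authors' implementation; the point is that every refinement step is differentiable, so the detector's loss can back-propagate into the view-synthesis model, which is what permits end-to-end training.

```python
# Toy pseudo-stereo sketch (assumptions only, not the authors' code).
import torch
import torch.nn as nn


class TinyDenoiser(nn.Module):
    """Predicts a refinement of the current right-view estimate from
    (left image, current right estimate) -- a stand-in for a diffusion-style network."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, left: torch.Tensor, right_est: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([left, right_est], dim=1))


class StubStereoDetector(nn.Module):
    """Placeholder for any stereo 3D detector consuming a left/right pair."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(6, 16, 3, padding=1)
        self.head = nn.AdaptiveAvgPool2d(1)

    def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
        feat = torch.relu(self.backbone(torch.cat([left, right], dim=1)))
        return self.head(feat).flatten(1)  # toy "detection" feature vector


def pseudo_stereo_forward(left: torch.Tensor, steps: int = 4) -> torch.Tensor:
    """Iteratively synthesize a right view from the left image, then detect.
    Gradients from the detector flow back into the view synthesis (end-to-end)."""
    denoiser = TinyDenoiser()
    detector = StubStereoDetector()
    right = torch.randn_like(left)          # start from noise
    for _ in range(steps):                  # a few refinement iterations
        right = right + denoiser(left, right)
    return detector(left, right)


if __name__ == "__main__":
    left_image = torch.rand(1, 3, 64, 128)  # dummy left view
    print(pseudo_stereo_forward(left_image).shape)
```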
Related papers
- Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector.
The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference.
Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors.
arXiv Detail & Related papers (2022-06-30T01:44:30Z) - Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving [14.582107328849473]
- Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving [14.582107328849473]
The gap in image-to-image generation for stereo views is much smaller than that in image-to-LiDAR generation.
Motivated by this, we propose a Pseudo-Stereo 3D detection framework with three novel virtual view generation methods.
Our framework ranks 1st for car, pedestrian, and cyclist among published monocular 3D detectors on the KITTI-3D benchmark.
arXiv Detail & Related papers (2022-03-04T03:00:34Z) - SGM3D: Stereo Guided Monocular 3D Object Detection [62.11858392862551]
We propose a stereo-guided monocular 3D object detection network, termed SGM3D.
We exploit robust 3D features extracted from stereo images to enhance the features learned from the monocular image.
Our method can be integrated into many other monocular approaches to boost performance without introducing any extra computational cost.
arXiv Detail & Related papers (2021-12-03T13:57:14Z) - LIGA-Stereo: Learning LiDAR Geometry Aware Representations for
Stereo-based 3D Detector [80.7563981951707]
We propose LIGA-Stereo to learn stereo-based 3D detectors under the guidance of high-level geometry-aware representations of LiDAR-based detection models.
Compared with the state-of-the-art stereo detector, our method improves the 3D detection performance for cars, pedestrians, and cyclists by 10.44%, 5.69%, and 5.97% mAP, respectively.
arXiv Detail & Related papers (2021-08-18T17:24:40Z) - Delving into Localization Errors for Monocular 3D Object Detection [85.77319416168362]
Estimating 3D bounding boxes from monocular images is an essential component in autonomous driving.
In this work, we quantify the impact introduced by each sub-task and find that 'localization error' is the vital factor restricting monocular 3D detection.
arXiv Detail & Related papers (2021-03-30T10:38:01Z) - M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than the monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z) - Confidence Guided Stereo 3D Object Detection with Split Depth Estimation [10.64859537162938]
- Confidence Guided Stereo 3D Object Detection with Split Depth Estimation [10.64859537162938]
CG-Stereo is a confidence-guided stereo 3D object detection pipeline.
It uses separate decoders for foreground and background pixels during depth estimation.
Our approach outperforms all state-of-the-art stereo-based 3D detectors on the KITTI benchmark.
arXiv Detail & Related papers (2020-03-11T20:00:11Z) - SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint
Estimation [3.1542695050861544]
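A minimal sketch of the split-decoder idea from the CG-Stereo entry above, under assumptions rather than its actual architecture: one shared encoder, two depth decoders, and a foreground mask (e.g. from 2D detections) that selects which decoder's prediction is used per pixel. Layer sizes and the hard mask fusion are simplifications for illustration.

```python
# Hypothetical foreground/background split depth network.
import torch
import torch.nn as nn


class SplitDepthNet(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.fg_decoder = nn.Conv2d(channels, 1, 3, padding=1)  # foreground depth
        self.bg_decoder = nn.Conv2d(channels, 1, 3, padding=1)  # background depth

    def forward(self, image: torch.Tensor, fg_mask: torch.Tensor) -> torch.Tensor:
        feat = self.encoder(image)
        depth_fg = self.fg_decoder(feat)
        depth_bg = self.bg_decoder(feat)
        # Hard selection per pixel; a confidence-weighted blend is also possible.
        return fg_mask * depth_fg + (1.0 - fg_mask) * depth_bg


net = SplitDepthNet()
img = torch.rand(1, 3, 64, 128)
mask = (torch.rand(1, 1, 64, 128) > 0.8).float()
print(net(img, mask).shape)  # per-pixel depth map: (1, 1, 64, 128)
```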
- SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
arXiv Detail & Related papers (2020-02-24T08:15:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.