DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors
- URL: http://arxiv.org/abs/2204.03039v1
- Date: Wed, 6 Apr 2022 18:43:54 GMT
- Title: DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors
- Authors: Yilun Chen, Shijia Huang, Shu Liu, Bei Yu, Jiaya Jia
- Abstract summary: Camera-based 3D object detectors are attractive because they can be deployed more widely and cost less than LiDAR sensors.
We revisit how the prior stereo model DSGN constructs stereo volumes to represent both 3D geometry and semantics.
We propose our approach, DSGN++, which aims to improve information flow throughout the 2D-to-3D pipeline.
- Score: 60.88824519770208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camera-based 3D object detectors are attractive because they can be
deployed more widely and cost less than LiDAR sensors. We revisit how the prior
stereo model DSGN constructs stereo volumes to represent both 3D geometry and
semantics. We refine this stereo modeling and propose our approach, DSGN++,
which aims to improve information flow throughout the 2D-to-3D pipeline in the
following three main aspects. First, to effectively lift 2D information into
the stereo volume, we propose depth-wise plane sweeping (DPS), which allows
denser connections and extracts depth-guided features. Second, to better
capture features defined in different spaces, we present a novel stereo
volume, the Dual-view Stereo Volume (DSV), which integrates front-view and
top-view features and reconstructs sub-voxel depth in the camera frustum.
Third, because the foreground region becomes less dominant in 3D space, we
propose, for the first time, a multi-modal data editing strategy, Stereo-LiDAR
Copy-Paste, which ensures cross-modal alignment and improves data efficiency.
Without bells and whistles, extensive experiments in various modality setups
on the popular KITTI benchmark show that our method consistently outperforms
other camera-based 3D detectors for all categories. Code will be released at
https://github.com/chenyilun95/DSGN2.
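The abstract only names depth-wise plane sweeping (DPS). As a point of reference, the sketch below shows a generic plane sweep for a rectified stereo pair, assuming a pinhole model in which a point at depth z has disparity d = f * B / z (focal length f in pixels, baseline B in meters). For each candidate depth, left features are concatenated with right features sampled at column u - d, which is one common way to build a stereo volume. The names `depthwise_plane_sweep` and `sample_row`, and the choice of concatenation, are illustrative assumptions; this is not the DPS module of DSGN++, whose denser connections and depth-guided features the abstract does not detail.

```python
import math
from typing import List, Sequence

Feature = List[float]             # one C-channel feature vector
FeatureMap = List[List[Feature]]  # indexed [row v][column u] -> Feature


def sample_row(row: List[Feature], x: float) -> Feature:
    """Linearly interpolate a feature at fractional column x; zeros outside the image."""
    x0 = math.floor(x)
    frac = x - x0
    out = [0.0] * len(row[0])
    for xi, weight in ((x0, 1.0 - frac), (x0 + 1, frac)):
        if 0 <= xi < len(row) and weight > 0.0:
            for c, value in enumerate(row[xi]):
                out[c] += weight * value
    return out


def depthwise_plane_sweep(
    left: FeatureMap,
    right: FeatureMap,
    focal_px: float,
    baseline_m: float,
    depths: Sequence[float],
) -> List[FeatureMap]:
    """Hypothetical helper (not DSGN++'s DPS): build a [D][H][W][2C] volume by
    concatenating left features with right features warped to each depth plane."""
    volume: List[FeatureMap] = []
    for z in depths:
        # Rectified stereo: depth z <-> disparity d = f * B / z, so left
        # column u matches right column u - d (usually fractional).
        disparity = focal_px * baseline_m / z
        plane: FeatureMap = [
            [left_row[u] + sample_row(right_row, u - disparity)  # list concat: 2C channels
             for u in range(len(left_row))]
            for left_row, right_row in zip(left, right)
        ]
        volume.append(plane)
    return volume


left = [[[1.0], [2.0], [3.0], [4.0]]]   # H=1, W=4, C=1
right = [[[1.0], [2.0], [3.0], [4.0]]]
vol = depthwise_plane_sweep(left, right, focal_px=2.0, baseline_m=1.0, depths=[2.0, 1.0])
print(vol[0][0][3])  # [4.0, 3.0]: depth 2 gives disparity 1
```

Sampling the candidate depths uniformly in depth (for example z_k = z_min + k * dz) keeps the planes evenly spaced in 3D, whereas sampling uniformly in disparity concentrates planes near the camera.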
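The abstract states that DSV reconstructs sub-voxel depth in the camera frustum but does not say how. One standard way to read a continuous depth off discrete frustum depth bins is soft-argmin regression, introduced in GC-Net and reused by many stereo networks. The sketch below assumes that technique purely for illustration; `soft_argmin_depth` is a hypothetical name, and whether DSV uses exactly this formulation is not stated in the source.

```python
import math
from typing import Sequence


def soft_argmin_depth(costs: Sequence[float], depths: Sequence[float]) -> float:
    """Hypothetical helper: expected depth under softmax(-cost) along one ray.

    The result is a weighted average of the bin depths, so it can fall
    between bins, i.e. below the volume's depth resolution.
    """
    if not costs or len(costs) != len(depths):
        raise ValueError("costs and depths must be non-empty and the same length")
    scores = [-c for c in costs]   # lower matching cost -> higher score
    peak = max(scores)             # subtract the max before exp() for stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return sum(w * z for w, z in zip(weights, depths)) / total


# Two equally cheap neighboring bins: the estimate lands halfway between them.
print(soft_argmin_depth([0.0, 0.0, 100.0], [10.0, 11.0, 12.0]))  # 10.5
```

Because the estimate is differentiable in the costs, it can be supervised directly with ground-truth depth, which is the usual reason for preferring it over a hard argmax.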
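Stereo-LiDAR Copy-Paste is said to ensure cross-modal alignment, but the abstract does not describe the procedure. Whatever the exact pasting rule, one geometric constraint such a scheme has to respect is that a pasted 3D point must land at consistent positions in the left image, the right image, and the point cloud. The sketch below assumes an ideal rectified stereo rig with the right camera offset by the baseline along +x (roughly KITTI's layout) and projects a point in left-camera coordinates into both views; `RectifiedStereoCamera`, `project_to_stereo`, and the camera values in the usage line are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RectifiedStereoCamera:
    """Hypothetical ideal rectified stereo rig (illustrative, not KITTI's calibration format)."""
    focal_px: float    # focal length shared by both rectified cameras, in pixels
    cx: float          # principal point, in pixels
    cy: float
    baseline_m: float  # right camera is offset by +baseline_m along x


def project_to_stereo(cam: RectifiedStereoCamera, x: float, y: float, z: float) -> Tuple[float, float, float]:
    """Project a point given in left-camera coordinates into both images.

    Returns (u_left, u_right, v); rectification makes the row v identical in both views.
    """
    if z <= 0.0:
        raise ValueError("point must lie in front of the cameras")
    u_left = cam.focal_px * x / z + cam.cx
    u_right = cam.focal_px * (x - cam.baseline_m) / z + cam.cx
    v = cam.focal_px * y / z + cam.cy
    return u_left, u_right, v


# A pasted LiDAR point 10 m ahead, with made-up camera parameters.
cam = RectifiedStereoCamera(focal_px=700.0, cx=600.0, cy=180.0, baseline_m=0.5)
print(project_to_stereo(cam, 1.0, 1.5, 10.0))  # (670.0, 635.0, 285.0)
```

The two image columns differ by exactly f * B / z, the disparity a stereo matcher expects at that depth, so placing an object's pixels at these positions in both views keeps them consistent with its pasted LiDAR points.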
Related papers
- Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving [26.03800936700545]
We propose to regulate intermediate dense 3D features with the help of volume rendering.
Experimental results on the Occ3D and nuScenes datasets demonstrate that Vampire facilitates fine-grained and appropriate extraction of dense 3D features.
Date: 2023-12-19T04:09:05Z
- Unsupervised Multi-view Pedestrian Detection [12.882317991955228]
We propose an Unsupervised Multi-view Pedestrian Detection approach (UMPD) that eliminates the need for annotations when learning a multi-view pedestrian detector via 2D-3D mapping.
SIS is proposed to extract unsupervised representations of multi-view images, which are converted into 2D pedestrian masks as pseudo labels.
GVD encodes multi-view 2D images into a 3D volume to predict voxel-wise density and color via 2D-to-3D geometric projection, trained by 3D-to-2D mapping.
Date: 2023-05-21T13:27:02Z
- SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving [98.74706005223685]
3D scene understanding plays a vital role in vision-based autonomous driving.
We propose SurroundOcc, a method that predicts 3D occupancy from multi-camera images.
Date: 2023-03-16T17:59:08Z
- LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector [80.7563981951707]
We propose LIGA-Stereo to learn stereo-based 3D detectors under the guidance of high-level geometry-aware representations of LiDAR-based detection models.
Compared with the state-of-the-art stereo detector, our method improves the 3D detection performance for cars, pedestrians, and cyclists by 10.44%, 5.69%, and 5.97% mAP, respectively.
Date: 2021-08-18T17:24:40Z
- Stereo Object Matching Network [78.35697025102334]
This paper presents a stereo object matching method that exploits both 2D contextual information from images and 3D object-level information.
We present two novel strategies to handle 3D objectness in the cost volume space: selective sampling (RoISelect) and 2D-3D fusion.
Date: 2021-03-23T12:54:43Z
- YOLOStereo3D: A Step Back to 2D for Efficient Stereo 3D Detection [6.5702792909006735]
YOLOStereo3D is trained on a single GPU and runs at more than ten fps.
It achieves performance comparable to state-of-the-art stereo 3D detection frameworks without using LiDAR data.
Date: 2021-03-17T03:43:54Z
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera viewpoints.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
Date: 2020-04-05T12:52:29Z
- DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
Date: 2020-01-10T11:44:37Z