EGFN: Efficient Geometry Feature Network for Fast Stereo 3D Object
Detection
- URL: http://arxiv.org/abs/2111.14055v1
- Date: Sun, 28 Nov 2021 05:25:36 GMT
- Title: EGFN: Efficient Geometry Feature Network for Fast Stereo 3D Object
Detection
- Authors: Aqi Gao, Yanwei Pang, Jing Nie, Jiale Cao and Yishun Guo
- Abstract summary: Fast stereo-based 3D object detectors lag far behind high-precision-oriented methods in accuracy.
We argue that the main reason is the missing or poor 3D geometry feature representation in fast stereo-based methods.
The proposed EGFN outperforms YOLOStereo3D, the leading fast method, by 5.16% on mAP$_{3d}$ at the cost of merely 12 ms of additional runtime.
- Score: 51.52496693690059
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fast stereo-based 3D object detectors have recently made great
progress in inference time. However, they still lag far behind
high-precision-oriented methods in accuracy. We argue that the main reason is
the missing or poor 3D geometry feature representation in fast stereo-based
methods. To solve this problem, we propose an efficient geometry feature
generation network (EGFN). The key of our EGFN is an efficient and effective
3D geometry feature representation (EGFR) module. In the EGFR module,
light-weight cost volume features are first generated, then efficiently
converted into 3D space, and finally multi-scale feature enhancement is
conducted in both image and 3D spaces to obtain the 3D geometry features:
enhanced light-weight voxel features. In addition, we introduce a novel
multi-scale knowledge distillation strategy to guide multi-scale 3D geometry
feature learning. Experimental results on the public KITTI test set show that
the proposed EGFN outperforms YOLOStereo3D, the leading fast method, by 5.16\%
on mAP$_{3d}$ at the cost of merely 12 ms of additional runtime, and hence
achieves a better trade-off between accuracy and efficiency for stereo 3D
object detection. Our code will be publicly available.
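For concreteness, here is a minimal PyTorch sketch of the three EGFR stages named in the abstract: a light-weight cost volume, a cheap conversion into 3D (voxel) space, and a multi-scale enhancement of the resulting voxel features. The module names, channel sizes, disparity range, concatenation-style cost volume, and pooling-based enhancement are all illustrative assumptions rather than the authors' implementation, and the multi-scale knowledge distillation strategy is omitted.

```python
# Minimal, illustrative sketch of an EGFR-style feature pipeline (not the
# authors' code): light-weight cost volume -> cheap 3D aggregation ->
# multi-scale enhancement of the voxel features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LightweightCostVolume(nn.Module):
    """Concatenation cost volume over a small disparity range at low resolution."""

    def __init__(self, max_disp=16):
        super().__init__()
        self.max_disp = max_disp

    def forward(self, left_feat, right_feat):
        b, c, h, w = left_feat.shape
        volume = left_feat.new_zeros(b, 2 * c, self.max_disp, h, w)
        for d in range(self.max_disp):
            if d == 0:
                volume[:, :c, d] = left_feat
                volume[:, c:, d] = right_feat
            else:
                # Shift the right features by disparity d before concatenation.
                volume[:, :c, d, :, d:] = left_feat[:, :, :, d:]
                volume[:, c:, d, :, d:] = right_feat[:, :, :, :-d]
        return volume  # (B, 2C, D, H, W)


class EGFRSketch(nn.Module):
    """Assumed design: a couple of light 3D convs stand in for the efficient
    conversion into 3D space, followed by a simple coarse-to-fine enhancement."""

    def __init__(self, in_ch=16, max_disp=16):
        super().__init__()
        self.cost = LightweightCostVolume(max_disp)
        self.agg = nn.Sequential(
            nn.Conv3d(2 * in_ch, in_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(in_ch, in_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, left_feat, right_feat):
        voxel = self.agg(self.cost(left_feat, right_feat))
        # Multi-scale enhancement: fuse a coarser copy back into the voxels.
        coarse = F.avg_pool3d(voxel, 2)
        up = F.interpolate(coarse, size=voxel.shape[2:], mode="trilinear",
                           align_corners=False)
        return voxel + up  # "enhanced light-weight voxel features"


if __name__ == "__main__":
    left = torch.randn(1, 16, 24, 80)   # hypothetical low-resolution stereo features
    right = torch.randn(1, 16, 24, 80)
    out = EGFRSketch()(left, right)
    print(out.shape)                    # torch.Size([1, 16, 16, 24, 80])
```

In such a pipeline the enhanced voxel features would then feed a 3D detection head; the depth of the 3D aggregation and the fusion scheme are the knobs that trade accuracy against the latency budget the paper targets.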
Related papers
- Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding [83.63231467746598]
We introduce Any2Point, a parameter-efficient method to empower any-modality large models (vision, language, audio) for 3D understanding.
We propose a 3D-to-any (1D or 2D) virtual projection strategy that correlates the input 3D points to the original 1D or 2D positions within the source modality.
arXiv Detail & Related papers (2024-04-11T17:59:45Z) - NeRF-Det: Learning Geometry-Aware Volumetric Representation for
Multi-View 3D Object Detection [65.02633277884911]
We present NeRF-Det, a novel method for indoor 3D detection with posed RGB images as input.
Our method makes use of NeRF in an end-to-end manner to explicitly estimate 3D geometry, thereby improving 3D detection performance.
arXiv Detail & Related papers (2023-07-27T04:36:16Z) - 3D Small Object Detection with Dynamic Spatial Pruning [62.72638845817799]
We propose an efficient feature pruning strategy for 3D small object detection.
We present a multi-level 3D detector named DSPDet3D which benefits from high spatial resolution.
It takes less than 2s to directly process a whole building consisting of more than 4500k points while detecting almost all objects.
arXiv Detail & Related papers (2023-05-05T17:57:04Z) - Voxel-based 3D Detection and Reconstruction of Multiple Objects from a
Single Image [22.037472446683765]
We learn a regular grid of 3D voxel features from the input image which is aligned with 3D scene space via a 3D feature lifting operator.
Based on the 3D voxel features, our novel CenterNet-3D detection head formulates the 3D detection as keypoint detection in the 3D space.
We devise an efficient coarse-to-fine reconstruction module, including coarse-level voxelization and a novel local PCA-SDF shape representation.
arXiv Detail & Related papers (2021-11-04T18:30:37Z) - Shape Prior Non-Uniform Sampling Guided Real-time Stereo 3D Object
Detection [59.765645791588454]
Recently introduced RTS3D builds an efficient 4D Feature-Consistency Embedding space as the intermediate representation of objects without depth supervision.
We propose a shape prior non-uniform sampling strategy that performs dense sampling in the outer region and sparse sampling in the inner region.
Our proposed method achieves a 2.57% improvement on AP3d with almost no extra network parameters.
arXiv Detail & Related papers (2021-06-18T09:14:55Z) - HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object
Detection [39.64891219500416]
3D object detection methods exploit either voxel-based or point-based features to represent 3D objects in a scene.
We introduce a novel single-stage 3D detection method that combines the merits of voxel-based and point-based features.
arXiv Detail & Related papers (2021-04-02T06:34:49Z) - RTS3D: Real-time Stereo 3D Detection from 4D Feature-Consistency
Embedding Space for Autonomous Driving [3.222802562733787]
We propose an efficient and accurate 3D object detection method from stereo images, named RTS3D.
Experiments on the KITTI benchmark show that RTS3D is the first true real-time system for stereo image 3D detection.
arXiv Detail & Related papers (2020-12-30T07:56:37Z) - DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
arXiv Detail & Related papers (2020-01-10T11:44:37Z) - RTM3D: Real-time Monocular 3D Detection from Object Keypoints for
Autonomous Driving [26.216609821525676]
Most successful 3D detectors take the projection constraint from the 3D bounding box to the 2D box as an important component.
Our method predicts the nine perspective keypoints of a 3D bounding box in image space, and then utilizes the geometric relationship between the 3D and 2D perspectives to recover the dimension, location, and orientation in 3D space (this projection geometry is sketched after this list).
Our method is the first real-time system for monocular image 3D detection while achieving state-of-the-art performance on the KITTI benchmark.
arXiv Detail & Related papers (2020-01-10T08:29:20Z)
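To make the geometric relationship used by keypoint-based monocular detectors such as RTM3D concrete, the sketch below projects the eight corners and the center of a 3D box into the image with the camera intrinsics, yielding the nine perspective keypoints; recovering dimension, location, and orientation from predicted keypoints is the inverse of this mapping. The KITTI-style box parameterization and the example intrinsics are assumptions for illustration.

```python
# Illustrative sketch of the 3D-to-2D geometry behind keypoint-based monocular
# detection: the eight corners plus the center of a 3D box project to nine
# image keypoints. The box parameterization (KITTI-style camera frame, yaw
# about the y-axis) and the example intrinsics K are assumptions.
import numpy as np


def box3d_keypoints(dims, location, yaw, K):
    """Project the 9 keypoints (8 corners + center) of a 3D box into the image.

    dims:     (h, w, l) object size in meters
    location: (x, y, z) box bottom-center in camera coordinates
    yaw:      rotation around the camera y-axis (radians)
    K:        3x3 camera intrinsic matrix
    """
    h, w, l = dims
    # Corners in the object frame (y points down, origin at the bottom center).
    x_c = np.array([ l,  l, -l, -l,  l,  l, -l, -l]) / 2.0
    y_c = np.array([ 0,  0,  0,  0, -h, -h, -h, -h], dtype=float)
    z_c = np.array([ w, -w, -w,  w,  w, -w, -w,  w]) / 2.0
    corners = np.vstack([x_c, y_c, z_c])            # (3, 8)
    center = np.array([[0.0], [-h / 2.0], [0.0]])   # geometric box center
    pts = np.hstack([corners, center])              # (3, 9)

    # Rotate by yaw and translate into camera coordinates.
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    pts_cam = R @ pts + np.asarray(location, dtype=float).reshape(3, 1)

    # Perspective projection with the pinhole model.
    uvw = K @ pts_cam
    return (uvw[:2] / uvw[2:]).T                    # (9, 2) pixel coordinates


if __name__ == "__main__":
    K = np.array([[721.5,   0.0, 609.6],
                  [  0.0, 721.5, 172.9],
                  [  0.0,   0.0,   1.0]])
    kps = box3d_keypoints(dims=(1.5, 1.6, 3.9),
                          location=(2.0, 1.6, 15.0), yaw=0.3, K=K)
    print(kps.round(1))
```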