RTS3D: Real-time Stereo 3D Detection from 4D Feature-Consistency
Embedding Space for Autonomous Driving
- URL: http://arxiv.org/abs/2012.15072v1
- Date: Wed, 30 Dec 2020 07:56:37 GMT
- Authors: Peixuan Li, Shun Su, Huaici Zhao
- Abstract summary: We propose an efficient and accurate 3D object detection method from stereo images, named RTS3D.
Experiments on the KITTI benchmark show that RTS3D is the first true real-time system for stereo image 3D detection.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although recent image-based 3D object detection methods using the
Pseudo-LiDAR representation have shown great capability, a notable gap in
efficiency and accuracy still exists compared with LiDAR-based methods.
Moreover, over-reliance on a stand-alone depth estimator, which requires a
large number of pixel-wise annotations during training and extra computation
at inference, limits large-scale deployment in the real world.
In this paper, we propose an efficient and accurate 3D object detection
method from stereo images, named RTS3D. Unlike the 3D occupancy space used in
Pseudo-LiDAR-style methods, we design a novel 4D feature-consistency
embedding (FCE) space as the intermediate representation of the 3D scene,
without depth supervision. The FCE space encodes the object's structural and
semantic information by exploring multi-scale feature consistency between
features warped from the stereo pair. Furthermore, a semantic-guided RBF
(Radial Basis Function) and a structure-aware attention module are devised to
reduce the influence of noise in the FCE space without instance-mask
supervision. Experiments on the KITTI benchmark show that RTS3D is the first
true real-time system (FPS $>$ 24) for stereo-image 3D detection while
achieving a $10\%$ improvement in average precision over the previous
state-of-the-art method. The code will be made available at
https://github.com/Banconxuan/RTS3D
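The core idea, checking whether features warped from the left and right images agree at a hypothesized 3D location, can be sketched as follows. This is a minimal NumPy illustration of feature-consistency scoring under an idealized rectified pinhole stereo model; the function names, the nearest-neighbor sampling, and the negative-L2 consistency score are simplifications for illustration, not the paper's implementation.

```python
import numpy as np

def project(points, K):
    """Project Nx3 camera-frame points to pixel coordinates (pinhole model)."""
    uv = points @ K.T
    return uv[:, :2] / uv[:, 2:3]

def sample(feat, uv):
    """Nearest-neighbor feature lookup; feat is (H, W, C), uv is (N, 2)."""
    h, w = feat.shape[:2]
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
    return feat[v, u]

def feature_consistency(feat_l, feat_r, points, K, baseline):
    """Score hypothesized 3D points by left/right feature agreement.

    Points on a true object surface project to matching image content in
    both views, so their sampled features agree; points at wrong depths
    land on unrelated content and score poorly.
    """
    uv_l = project(points, K)
    pts_r = points.copy()
    pts_r[:, 0] -= baseline          # shift into the right camera's frame
    uv_r = project(pts_r, K)
    diff = sample(feat_l, uv_l) - sample(feat_r, uv_r)
    return -np.linalg.norm(diff, axis=-1)   # higher = more consistent
```

In the paper, such consistency responses are gathered over a grid of candidate 3D locations inside each detected region, at multiple feature scales, forming the 4D embedding that the detection head consumes.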
Related papers
- EGFN: Efficient Geometry Feature Network for Fast Stereo 3D Object
Detection
Fast stereo-based 3D object detectors lag far behind high-precision methods in accuracy.
We argue that the main reason is the missing or poor 3D geometry feature representation in fast stereo-based methods.
The proposed EGFN outperforms YOLOStereo3D, the advanced fast method, by 5.16% on mAP$_{3d}$ at the cost of merely 12 ms of additional inference time.
arXiv Detail & Related papers (2021-11-28T05:25:36Z) - LIGA-Stereo: Learning LiDAR Geometry Aware Representations for
Stereo-based 3D Detector
We propose LIGA-Stereo to learn stereo-based 3D detectors under the guidance of high-level geometry-aware representations of LiDAR-based detection models.
Compared with the state-of-the-art stereo detector, our method improves the 3D detection performance on cars, pedestrians, and cyclists by 10.44%, 5.69%, and 5.97% mAP, respectively.
arXiv Detail & Related papers (2021-08-18T17:24:40Z) - Shape Prior Non-Uniform Sampling Guided Real-time Stereo 3D Object
Detection
The recently introduced RTS3D builds an efficient 4D feature-consistency embedding space as an intermediate object representation without depth supervision.
We propose a shape-prior non-uniform sampling strategy that performs dense sampling in the outer region and sparse sampling in the inner region.
Our proposed method achieves a 2.57% improvement in AP$_{3d}$ with almost no extra network parameters.
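The dense-outer/sparse-inner sampling idea can be illustrated with a simple grid warp. This is a hypothetical sketch, not the paper's sampler: a power-law warp with exponent `gamma < 1` pushes a uniform grid toward the faces of a 3D box, where object surfaces tend to lie.

```python
import numpy as np

def shape_prior_samples(center, dims, n_per_axis=6, gamma=0.5):
    """Sample a 3D box densely near its surface, sparsely near its center.

    A uniform grid over [-1, 1] per axis is warped by sign(t) * |t|**gamma;
    for gamma < 1 this map stretches spacing near 0 and compresses it near
    +/-1, concentrating samples in the outer shell of the box.
    """
    t = np.linspace(-1.0, 1.0, n_per_axis)
    t = np.sign(t) * np.abs(t) ** gamma
    grid = np.stack(np.meshgrid(t, t, t, indexing="ij"), axis=-1).reshape(-1, 3)
    return np.asarray(center) + grid * (np.asarray(dims) / 2.0)
```

With `gamma = 1` the sampler reduces to a uniform grid, which makes the density shift toward the boundary easy to compare.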
arXiv Detail & Related papers (2021-06-18T09:14:55Z) - Stereo Object Matching Network
This paper presents a stereo object matching method that exploits both 2D contextual information from images and 3D object-level information.
We present two novel strategies to handle 3D objectness in the cost volume space: selective sampling (RoISelect) and 2D-3D fusion.
arXiv Detail & Related papers (2021-03-23T12:54:43Z) - Stereo CenterNet based 3D Object Detection for Autonomous Driving
We propose a 3D object detection method using geometric information in stereo images, called Stereo CenterNet.
Stereo CenterNet predicts the four semantic keypoints of the object's 3D bounding box in image space and uses the 2D left and right boxes, 3D dimensions, orientation, and keypoints to recover the object's bounding box in 3D space.
Experiments conducted on the KITTI dataset show that our method achieves the best speed-accuracy trade-off compared with the state-of-the-art methods based on stereo geometry.
arXiv Detail & Related papers (2021-03-20T02:18:49Z) - YOLOStereo3D: A Step Back to 2D for Efficient Stereo 3D Detection
YOLOStereo3D is trained on a single GPU and runs at more than ten fps.
It demonstrates performance comparable to state-of-the-art stereo 3D detection frameworks without using LiDAR data.
arXiv Detail & Related papers (2021-03-17T03:43:54Z) - PLUME: Efficient 3D Object Detection from Stereo Images
Existing methods tackle the problem in two steps: first, a pseudo-LiDAR point cloud representation is computed from estimated depth; then object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z) - DSGN: Deep Stereo Geometry Network for 3D Object Detection
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
arXiv Detail & Related papers (2020-01-10T11:44:37Z) - RTM3D: Real-time Monocular 3D Detection from Object Keypoints for
Autonomous Driving
Most successful 3D detectors rely on the projection constraint from the 3D bounding box to the 2D box as an important component.
Our method predicts the nine perspective keypoints of a 3D bounding box in image space, and then exploits the geometric relationship between the 3D and 2D perspectives to recover the dimensions, location, and orientation in 3D space.
Our method is the first real-time system for monocular-image 3D detection while achieving state-of-the-art performance on the KITTI benchmark.
arXiv Detail & Related papers (2020-01-10T08:29:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.