SIDE: Center-based Stereo 3D Detector with Structure-aware Instance
Depth Estimation
- URL: http://arxiv.org/abs/2108.09663v2
- Date: Tue, 24 Aug 2021 08:16:19 GMT
- Title: SIDE: Center-based Stereo 3D Detector with Structure-aware Instance
Depth Estimation
- Authors: Xidong Peng, Xinge Zhu, Tai Wang, and Yuexin Ma
- Abstract summary: We propose a stereo-image-based, anchor-free 3D detection method called the structure-aware stereo 3D detector (termed SIDE).
We explore instance-level depth information by constructing a cost volume from the RoIs of each object.
Our method achieves state-of-the-art performance compared to existing methods without depth-map supervision.
- Score: 11.169586369931803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D detection plays an indispensable role in environment perception. Due to
the high cost of the commonly used LiDAR sensor, stereo-vision-based 3D detection,
as an economical yet effective setting, has attracted increasing attention recently. For
approaches based on 2D images, accurate depth information is the key to
achieving 3D detection, and most existing methods resort to a preliminary stage
for depth estimation. They mainly focus on global depth and neglect the
properties of depth information in this specific task, namely sparsity and
locality: exactly accurate depth is needed only within the 3D bounding
boxes. Motivated by this finding, we propose a stereo-image-based anchor-free
3D detection method, called structure-aware stereo 3D detector (termed
SIDE), in which we explore instance-level depth information by constructing
a cost volume from the RoIs of each object. Because the local cost volume is
information-sparse, we further introduce match reweighting and structure-aware
attention to make the depth information more concentrated. Experiments
conducted on the KITTI dataset show that our method achieves
state-of-the-art performance compared to existing methods without depth-map
supervision.
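The core idea in the abstract — correlating left and right RoI features to form a per-instance cost volume, then reweighting it so the depth signal is concentrated — can be sketched in a few lines. This is a hypothetical, minimal NumPy illustration of the general plane-sweep/correlation technique, not the authors' implementation; the function names, the plain correlation score, and the softmax-expectation ("soft-argmin") reweighting are all assumptions for illustration.

```python
import numpy as np

def local_cost_volume(left_roi, right_roi, max_disp):
    """Correlate a left RoI feature map (C, H, W) with horizontally
    shifted right RoI features to build a (max_disp, H, W) cost volume.
    Higher scores mean better left/right agreement at that disparity."""
    C, H, W = left_roi.shape
    cost = np.zeros((max_disp, H, W), dtype=left_roi.dtype)
    for d in range(max_disp):
        if d == 0:
            cost[0] = (left_roi * right_roi).mean(axis=0)
        else:
            # Shift the right features by d pixels; columns without a
            # valid correspondence keep a zero score.
            cost[d, :, d:] = (left_roi[:, :, d:] * right_roi[:, :, :-d]).mean(axis=0)
    return cost

def expected_disparity(cost):
    """Softmax over the disparity axis turns raw scores into a
    distribution; its expectation gives a sub-pixel disparity map.
    Sharpening this distribution is one simple way to 'concentrate'
    the depth information, analogous in spirit to match reweighting."""
    e = np.exp(cost - cost.max(axis=0, keepdims=True))
    p = e / e.sum(axis=0, keepdims=True)
    d = np.arange(cost.shape[0]).reshape(-1, 1, 1)
    return (p * d).sum(axis=0)
```

Given instance depth from disparity (depth ∝ focal length × baseline / disparity), a per-object depth estimate follows from pooling this map inside the detected box.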
Related papers
- OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection [102.0744303467713]
We propose a new multi-view 3D object detector named OPEN.
Our main idea is to effectively inject object-wise depth information into the network through our proposed object-wise position embedding.
OPEN achieves a new state-of-the-art performance with 64.4% NDS and 56.7% mAP on the nuScenes test benchmark.
arXiv Detail & Related papers (2024-07-15T14:29:15Z) - MonoPGC: Monocular 3D Object Detection with Pixel Geometry Contexts [6.639648061168067]
We propose MonoPGC, a novel end-to-end Monocular 3D object detection framework with rich Pixel Geometry Contexts.
We introduce the pixel depth estimation as our auxiliary task and design depth cross-attention pyramid module (DCPM) to inject local and global depth geometry knowledge into visual features.
In addition, we present the depth-space-aware transformer (DSAT) to integrate 3D space position and depth-aware features efficiently.
arXiv Detail & Related papers (2023-02-21T09:21:58Z) - Attention-Based Depth Distillation with 3D-Aware Positional Encoding for
Monocular 3D Object Detection [10.84784828447741]
ADD is an Attention-based Depth knowledge Distillation framework with 3D-aware positional encoding.
Thanks to our teacher design, our framework is seamless, domain-gap free, easily implementable, and compatible with object-wise ground-truth depth.
We implement our framework on three representative monocular detectors, and we achieve state-of-the-art performance with no additional inference computational cost.
arXiv Detail & Related papers (2022-11-30T06:39:25Z) - Boosting Monocular 3D Object Detection with Object-Centric Auxiliary
Depth Supervision [13.593246617391266]
We propose a method to boost the RGB image-based 3D detector by jointly training the detection network with a depth prediction loss analogous to the depth estimation task.
Our novel object-centric depth prediction loss focuses on depth around foreground objects, which is important for 3D object detection.
Our depth regression model is further trained to predict the uncertainty of depth to represent the 3D confidence of objects.
arXiv Detail & Related papers (2022-10-29T11:32:28Z) - MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection [61.89277940084792]
We introduce the first DETR framework for Monocular DEtection with a depth-guided TRansformer, named MonoDETR.
We formulate 3D object candidates as learnable queries and propose a depth-guided decoder to conduct object-scene depth interactions.
On KITTI benchmark with monocular images as input, MonoDETR achieves state-of-the-art performance and requires no extra dense depth annotations.
arXiv Detail & Related papers (2022-03-24T19:28:54Z) - Self-Supervised Depth Completion for Active Stereo [55.79929735390945]
Active stereo systems are widely used in the robotics industry due to their low cost and high-quality depth maps.
However, these depth sensors suffer from stereo artefacts and do not provide dense depth estimates.
We present the first self-supervised depth completion method for active stereo systems that predicts accurate dense depth maps.
arXiv Detail & Related papers (2021-10-07T07:33:52Z) - Shape Prior Non-Uniform Sampling Guided Real-time Stereo 3D Object
Detection [59.765645791588454]
The recently introduced RTS3D builds an efficient 4D Feature-Consistency Embedding space as an intermediate representation of objects without depth supervision.
We propose a shape-prior non-uniform sampling strategy that performs dense sampling in the outer region and sparse sampling in the inner region.
Our proposed method achieves a 2.57% improvement in AP3D with almost no extra network parameters.
arXiv Detail & Related papers (2021-06-18T09:14:55Z) - M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than the monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z) - Confidence Guided Stereo 3D Object Detection with Split Depth Estimation [10.64859537162938]
CG-Stereo is a confidence-guided stereo 3D object detection pipeline.
It uses separate decoders for foreground and background pixels during depth estimation.
Our approach outperforms all state-of-the-art stereo-based 3D detectors on the KITTI benchmark.
arXiv Detail & Related papers (2020-03-11T20:00:11Z) - DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
arXiv Detail & Related papers (2020-01-10T11:44:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.