Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object
Detection
- URL: http://arxiv.org/abs/2204.11582v2
- Date: Tue, 26 Apr 2022 09:52:18 GMT
- Title: Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object
Detection
- Authors: Zehui Chen, Zhenyu Li, Shiquan Zhang, Liangji Fang, Qinhong Jiang,
Feng Zhao
- Abstract summary: We propose Graph-DETR3D to automatically aggregate multi-view imagery information through graph structure learning (GSL).
Our best model achieves 49.5 NDS on the nuScenes test leaderboard, setting a new state of the art among published image-view 3D object detectors.
- Score: 17.526914782562528
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D object detection from multiple image views is a fundamental and
challenging task for visual scene understanding. Due to its low cost and high
efficiency, multi-view 3D object detection has demonstrated promising
application prospects. However, accurately detecting objects through
perspective views in the 3D space is extremely difficult due to the lack of
depth information. Recently, DETR3D introduces a novel 3D-2D query paradigm in
aggregating multi-view images for 3D object detection and achieves
state-of-the-art performance. In this paper, through intensive pilot
experiments, we quantify objects located in different regions and find that
"truncated instances" (i.e., those at the border regions of each image) are the
main bottleneck hindering the performance of DETR3D. Although DETR3D merges
features from two adjacent views in the overlapping regions, its feature
aggregation is still insufficient, and it therefore misses the chance to fully
boost detection performance. To tackle this problem, we propose
Graph-DETR3D to automatically aggregate multi-view imagery information through
graph structure learning (GSL). It constructs a dynamic 3D graph between each
object query and 2D feature maps to enhance the object representations,
especially at the border regions. Besides, Graph-DETR3D benefits from a novel
depth-invariant multi-scale training strategy, which maintains the visual depth
consistency by simultaneously scaling the image size and the object depth.
Extensive experiments on the nuScenes dataset demonstrate the effectiveness and
efficiency of our Graph-DETR3D. Notably, our best model achieves 49.5 NDS on
the nuScenes test leaderboard, setting a new state of the art among published
image-view 3D object detectors.
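To make the 3D-2D aggregation concrete, here is a minimal sketch of the dynamic-graph idea: each object query spawns a handful of 3D graph nodes around its reference point, projects them into every camera view, and fuses the 2D features the visible nodes hit, so a truncated instance straddling two adjacent views gathers evidence from both. All names (`project`, `aggregate_query_feature`), the nearest-neighbour lookup, and the uniform averaging (standing in for the learned edge weights of graph structure learning) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of dynamic 3D graph aggregation (assumed names;
# uniform averaging replaces the paper's learned graph-structure weights).
import numpy as np

def project(points_3d, cam_mat, img_hw):
    """Project (K, 3) ego-frame points with a (3, 4) camera matrix; return
    (K, 2) pixel coordinates and a mask of points landing inside the image."""
    pts_h = np.concatenate([points_3d, np.ones((len(points_3d), 1))], axis=1)
    uvz = pts_h @ cam_mat.T                          # (K, 3) homogeneous pixels
    z = uvz[:, 2:3]
    uv = uvz[:, :2] / np.clip(z, 1e-5, None)
    h, w = img_hw
    valid = (z[:, 0] > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < w) \
            & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv, valid

def aggregate_query_feature(ref_point, offsets, feat_maps, cam_mats, img_hw):
    """Spawn graph nodes around one query's 3D reference point, project them
    into every view, and average the 2D features of all visible nodes."""
    nodes = ref_point[None, :] + offsets             # (K, 3) graph nodes
    h, w = img_hw
    total, count = 0.0, 0
    for feat, cam in zip(feat_maps, cam_mats):       # one (Hf, Wf, C) map per view
        uv, valid = project(nodes, cam, img_hw)
        sy, sx = feat.shape[0] / h, feat.shape[1] / w
        for (u, v), ok in zip(uv, valid):
            if ok:                                   # nearest-neighbour lookup
                total = total + feat[int(v * sy), int(u * sx)]
                count += 1
    return total / max(count, 1)                     # fused query feature
```

For a border instance, some nodes land inside two overlapping views at once, so the returned feature mixes evidence from both; the actual model learns the node offsets and per-edge weights rather than averaging uniformly.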
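The depth-invariant multi-scale training strategy follows from pinhole geometry: resizing an image by a factor r makes an object appear r times larger, which, under a fixed focal length, is exactly how an object at depth d/r would appear. Rescaling the labeled depth by 1/r therefore keeps appearance and depth consistent. Below is a hedged sketch under that assumption; `depth_invariant_rescale` is a hypothetical name, not from the paper.

```python
# Illustrative sketch of depth-invariant multi-scale augmentation.
# Assumption: centers_cam holds (N, 3) object centers in camera coordinates
# (x right, y down, z = depth along the optical axis).
import cv2
import numpy as np

def depth_invariant_rescale(image, centers_cam, r):
    """Resize the image by factor r and move each object center along its
    viewing ray so the labeled depth matches the new apparent size."""
    h, w = image.shape[:2]
    resized = cv2.resize(image, (int(round(w * r)), int(round(h * r))))
    # Scaling a point by 1/r along its ray divides its depth by r while
    # keeping its projected location in the resized image unchanged.
    centers = centers_cam / r
    return resized, centers

# Example: upscaling by r = 1.2 makes objects look closer, so depths shrink.
img = np.zeros((900, 1600, 3), dtype=np.uint8)
centers = np.array([[2.0, 0.5, 30.0]])               # object 30 m ahead
_, new_centers = depth_invariant_rescale(img, centers, 1.2)
print(new_centers)                                   # depth becomes 25 m
```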
Related papers
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z)
- Generating Visual Spatial Description via Holistic 3D Scene Understanding [88.99773815159345]
Visual spatial description (VSD) aims to generate texts that describe the spatial relations of the given objects within images.
With an external 3D scene extractor, we obtain the 3D objects and scene features for input images.
We construct a target object-centered 3D spatial scene graph (Go3D-S2G), such that we model the spatial semantics of target objects within the holistic 3D scenes.
arXiv Detail & Related papers (2023-05-19T15:53:56Z)
- 3D Small Object Detection with Dynamic Spatial Pruning [62.72638845817799]
We propose an efficient feature pruning strategy for 3D small object detection.
We present a multi-level 3D detector named DSPDet3D which benefits from high spatial resolution.
It takes less than 2 s to directly process a whole building consisting of more than 4500k points while detecting almost all objects.
arXiv Detail & Related papers (2023-05-05T17:57:04Z)
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
- Scatter Points in Space: 3D Detection from Multi-view Monocular Images [8.71944437852952]
3D object detection from monocular image(s) is a challenging and long-standing problem in computer vision.
Recent methods tend to aggregate multi-view features by densely sampling a regular 3D grid in space.
We propose a learnable keypoint sampling method that scatters pseudo surface points in 3D space in order to preserve data sparsity.
arXiv Detail & Related papers (2022-08-31T09:38:05Z)
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed Homography Loss, is proposed, which exploits both 2D and 3D information.
Our method outperforms the other state-of-the-art methods by a large margin on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
- Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving [14.582107328849473]
The gap in image-to-image generation for stereo views is much smaller than that in image-to-LiDAR generation.
Motivated by this, we propose a Pseudo-Stereo 3D detection framework with three novel virtual view generation methods.
Our framework ranks 1st for car, pedestrian, and cyclist among published monocular 3D detectors on the KITTI-3D benchmark.
arXiv Detail & Related papers (2022-03-04T03:00:34Z)
- Voxel-based 3D Detection and Reconstruction of Multiple Objects from a Single Image [22.037472446683765]
We learn a regular grid of 3D voxel features from the input image, aligned with the 3D scene space via a 3D feature lifting operator.
Based on the 3D voxel features, our novel CenterNet-3D detection head formulates the 3D detection as keypoint detection in the 3D space.
We devise an efficient coarse-to-fine reconstruction module, including coarse-level voxelization and a novel local PCA-SDF shape representation.
arXiv Detail & Related papers (2021-11-04T18:30:37Z)
- CoCoNets: Continuous Contrastive 3D Scene Representations [21.906643302668716]
This paper explores self-supervised learning of amodal 3D feature representations from RGB and RGB-D posed images and videos.
We show the resulting 3D visual feature representations effectively scale across objects and scenes, imagine information occluded or missing from the input viewpoints, track objects over time, align semantically related objects in 3D, and improve 3D object detection.
arXiv Detail & Related papers (2021-04-08T15:50:47Z)
- DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
arXiv Detail & Related papers (2020-01-10T11:44:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.