Object as Query: Lifting any 2D Object Detector to 3D Detection
- URL: http://arxiv.org/abs/2301.02364v3
- Date: Mon, 6 Nov 2023 04:37:47 GMT
- Title: Object as Query: Lifting any 2D Object Detector to 3D Detection
- Authors: Zitian Wang, Zehao Huang, Jiahui Fu, Naiyan Wang, Si Liu
- Abstract summary: We design Multi-View 2D Objects guided 3D Object Detector (MV2D).
MV2D exploits 2D detectors to generate object queries conditioned on the rich image semantics.
For the generated queries, we design a sparse cross attention module to force them to focus on the features of specific objects.
- Score: 30.393111518104313
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detection from multi-view images has drawn much attention over the
past few years. Existing methods mainly establish 3D representations from
multi-view images and adopt a dense detection head for object detection, or
employ object queries distributed in 3D space to localize objects. In this
paper, we design Multi-View 2D Objects guided 3D Object Detector (MV2D), which
can lift any 2D object detector to multi-view 3D object detection. Since 2D
detections can provide valuable priors for object existence, MV2D exploits 2D
detectors to generate object queries conditioned on the rich image semantics.
These dynamically generated queries help MV2D to recall objects in the field of
view and show a strong capability of localizing 3D objects. For the generated
queries, we design a sparse cross attention module to force them to focus on
the features of specific objects, which suppresses interference from noise.
The evaluation results on the nuScenes dataset demonstrate that the dynamic object
queries and sparse feature aggregation can improve 3D detection capability.
MV2D also achieves state-of-the-art performance among existing methods. We
hope MV2D can serve as a new baseline for future research. Code is available at
https://github.com/tusen-ai/MV2D.
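To make the two mechanisms in the abstract more concrete, here is a minimal sketch in plain Python of (a) turning one 2D detection box into an object query and (b) a sparse cross-attention step in which that query attends only to feature-map cells inside its box. This is a hypothetical simplification, not MV2D's implementation: the query initialization (mean pooling over the box), the single-view, single-head attention with no learned projections, and the helper names `cells_in_box`, `query_from_box`, and `sparse_cross_attention` are all assumptions made for illustration. The authors' code is in the repository linked above.

```python
import math
from typing import List, Tuple

Vector = List[float]
FeatureMap = List[List[Vector]]   # indexed as fmap[row][col] -> C-dim feature
Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2), inclusive, in feature-map cells


def cells_in_box(box: Box) -> List[Tuple[int, int]]:
    """Return the (row, col) cells covered by an inclusive 2D box."""
    x1, y1, x2, y2 = box
    return [(r, c) for r in range(y1, y2 + 1) for c in range(x1, x2 + 1)]


def query_from_box(fmap: FeatureMap, box: Box) -> Vector:
    """Hypothetical query initialization: mean of the features inside the 2D box."""
    cells = cells_in_box(box)
    dim = len(fmap[0][0])
    q = [0.0] * dim
    for r, c in cells:
        for k in range(dim):
            q[k] += fmap[r][c][k]
    return [v / len(cells) for v in q]


def sparse_cross_attention(query: Vector, fmap: FeatureMap, box: Box) -> Vector:
    """Scaled dot-product attention restricted to the cells inside `box`.

    Simplification (assumption): keys double as values and there are no
    learned query/key/value projections.
    """
    keys = [fmap[r][c] for r, c in cells_in_box(box)]
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(logits)  # subtract the max logit for numerical stability
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of the in-box features, one output channel at a time.
    return [sum(w * key[k] for w, key in zip(weights, keys)) for k in range(len(query))]
```

Because the attention only reads cells returned by `cells_in_box`, changing features outside the box leaves the output unchanged, which is the kind of isolation from background noise that the abstract attributes to the sparse module.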
Related papers
- SimPB: A Single Model for 2D and 3D Object Detection from Multiple Cameras [3.648972014796591]
We present a single model termed SimPB, which simultaneously detects 2D objects in the perspective view and 3D objects in the BEV space from multiple cameras.
A hybrid decoder consists of several multi-view 2D decoder layers and several 3D decoder layers, specifically designed for their respective detection tasks.
Date: 2024-03-15T14:39:39Z
- Improving Distant 3D Object Detection Using 2D Box Supervision [97.80225758259147]
We propose LR3D, a framework that learns to recover the missing depth of distant objects.
Our framework is general and could broadly benefit 3D detection methods.
Date: 2024-03-14T09:54:31Z
- Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors [6.3557174349423455]
We present a novel query generating approach termed QAF2D, which infers 3D query anchors from 2D detection results.
QAF2D improves results on the nuScenes validation subset by up to 2.3% NDS and 2.7% mAP.
Date: 2024-03-10T04:38:27Z
- Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection [24.871590175483096]
Point cloud-based open-vocabulary 3D object detection aims to detect 3D categories that do not have ground-truth annotations in the training set.
Previous approaches leverage large-scale richly-annotated image datasets as a bridge between 3D and category semantics.
We propose Object2Scene, the first approach that leverages large-scale large-vocabulary 3D object datasets to augment existing 3D scene datasets for open-vocabulary 3D object detection.
Date: 2023-09-18T03:31:53Z
- Tracking Objects with 3D Representation from Videos [57.641129788552675]
With 3D object representation learning from pseudo 3D object labels in monocular videos, we propose a new 2D Multiple Object Tracking (MOT) paradigm, called P3DTrack.
Date: 2023-06-08T17:58:45Z
- OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
Date: 2023-01-13T06:02:31Z
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
Date: 2022-09-13T05:26:09Z
- M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than existing monocular 3D object detection methods on the KITTI dataset.
Date: 2021-03-24T13:09:11Z
- Multi-Task Multi-Sensor Fusion for 3D Object Detection [93.68864606959251]
We present an end-to-end learnable architecture that reasons about 2D and 3D object detection as well as ground estimation and depth completion.
Our experiments show that all these tasks are complementary and help the network learn better representations by fusing information at various levels.
Date: 2020-12-22T22:49:15Z
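The QAF2D and OA-BEV entries above report results in nuScenes detection score (NDS) alongside mAP. As defined in the nuScenes detection benchmark, NDS combines mAP with five true-positive error metrics (mean translation, scale, orientation, velocity, and attribute errors), each clipped at 1: NDS = (1/10)[5·mAP + Σ(1 − min(1, mTP))]. The function below is a minimal sketch of that final combination step only. It assumes mAP and the five mean TP errors have already been computed (for example by the official nuScenes devkit), and the function name `nuscenes_detection_score` is illustrative, not a devkit API.

```python
from typing import Dict

# The five true-positive error metrics used by the nuScenes detection benchmark.
TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nuscenes_detection_score(m_ap: float, tp_errors: Dict[str, float]) -> float:
    """Combine mAP and mean TP errors: NDS = (5 * mAP + sum(1 - min(1, err))) / 10."""
    # Each error is clipped at 1, so a very poor metric contributes 0, never a negative value.
    tp_score = sum(1.0 - min(1.0, tp_errors[name]) for name in TP_METRICS)
    return (5.0 * m_ap + tp_score) / 10.0


if __name__ == "__main__":
    errors = {"mATE": 0.6, "mASE": 0.25, "mAOE": 0.4, "mAVE": 0.8, "mAAE": 0.2}
    print(f"NDS = {nuscenes_detection_score(0.4, errors):.3f}")
```

The weighting explains why a method can gain more NDS than mAP or vice versa: mAP carries half of the score, while the other half comes from how accurately the true-positive detections are localized and described.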