Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors
- URL: http://arxiv.org/abs/2403.06093v1
- Date: Sun, 10 Mar 2024 04:38:27 GMT
- Title: Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors
- Authors: Haoxuanye Ji, Pengpeng Liang, Erkang Cheng
- Abstract summary: We present a novel query-generating approach termed QAF2D, which infers 3D query anchors from 2D detection results.
The largest improvement that QAF2D can bring about on the nuScenes validation subset is $2.3\%$ NDS and $2.7\%$ mAP.
- Score: 6.3557174349423455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-camera-based 3D object detection has made notable progress in the past
several years. However, we observe that there are cases (e.g. faraway regions)
in which popular 2D object detectors are more reliable than state-of-the-art 3D
detectors. In this paper, to improve the performance of query-based 3D object
detectors, we present a novel query-generating approach termed QAF2D, which
infers 3D query anchors from 2D detection results. A 2D bounding box of an
object in an image is lifted to a set of 3D anchors by associating each sampled
point within the box with depth, yaw angle, and size candidates. Then, the
validity of each 3D anchor is verified by comparing its projection in the image
with its corresponding 2D box, and only valid anchors are kept and used to
construct queries. The class information of the 2D bounding box associated with
each query is also utilized to match the predicted boxes with ground truth for
the set-based loss. The image feature extraction backbone is shared between the
3D detector and 2D detector by adding a small number of prompt parameters. We
integrate QAF2D into three popular query-based 3D object detectors and carry
out comprehensive evaluations on the nuScenes dataset. The largest improvement
that QAF2D can bring about on the nuScenes validation subset is $2.3\%$ NDS and
$2.7\%$ mAP. Code is available at https://github.com/nullmax-vision/QAF2D.
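The anchor-lifting and projection-validation steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation (which is at the GitHub link above); the function names, the candidate sets, the pinhole projection with intrinsics `K`, and the IoU threshold are all assumptions made for the sketch.

```python
import numpy as np

def lift_2d_box_to_3d_anchors(box2d, K, depth_candidates, yaw_candidates,
                              size_candidates, num_points=4):
    """Lift a 2D box (x1, y1, x2, y2) to candidate 3D anchors (x, y, z, l, w, h, yaw).

    Each point sampled inside the box is back-projected to every depth candidate
    and combined with every yaw and size candidate (hypothetical helper)."""
    x1, y1, x2, y2 = box2d
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    anchors = []
    for u in np.linspace(x1, x2, num_points):
        for v in np.linspace(y1, y2, num_points):
            for d in depth_candidates:
                # Back-project the pixel (u, v) to a 3D center at depth d.
                x = (u - cx) * d / fx
                y = (v - cy) * d / fy
                for yaw in yaw_candidates:
                    for (l, w, h) in size_candidates:
                        anchors.append((x, y, d, l, w, h, yaw))
    return np.array(anchors)

def box_corners(x, y, z, l, w, h, yaw):
    """8 corners of a 3D box in camera coordinates (x right, y down, z forward),
    with yaw taken as rotation about the vertical (y) axis."""
    dx = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * l / 2
    dz = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * w / 2
    dy = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * h / 2
    c, s = np.cos(yaw), np.sin(yaw)
    return np.stack([x + c * dx + s * dz, y + dy, z - s * dx + c * dz])  # (3, 8)

def project_rect(anchor, K):
    """Axis-aligned 2D bounding rectangle of the anchor's projected corners."""
    x, y, z, l, w, h, yaw = anchor
    pts = K @ box_corners(x, y, z, l, w, h, yaw)
    u, v = pts[0] / pts[2], pts[1] / pts[2]
    return (u.min(), v.min(), u.max(), v.max())

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def keep_valid_anchors(anchors, box2d, K, iou_thresh=0.5):
    """Keep only anchors whose projection agrees with the source 2D box."""
    return np.array([a for a in anchors
                     if iou(project_rect(a, K), box2d) >= iou_thresh])
```

The surviving anchors would then seed the queries of the 3D detector; the IoU threshold trades off recall of plausible anchors against query count.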
Related papers
- General Geometry-aware Weakly Supervised 3D Object Detection [62.26729317523975]
A unified framework is developed for learning 3D object detectors from RGB images and associated 2D boxes.
Experiments on KITTI and SUN-RGBD datasets demonstrate that our method yields surprisingly high-quality 3D bounding boxes with only 2D annotation.
arXiv Detail & Related papers (2024-07-18T17:52:08Z) - Roadside Monocular 3D Detection via 2D Detection Prompting [11.511202614683388]
We present a novel and simple method by prompting the 3D detector using 2D detections.
Our method builds on the key insight that, compared with 3D detectors, a 2D detector is much easier to train and performs significantly better at detection on the 2D image plane.
arXiv Detail & Related papers (2024-04-01T11:57:34Z) - SimPB: A Single Model for 2D and 3D Object Detection from Multiple Cameras [3.648972014796591]
We present a single model termed SimPB, which simultaneously detects 2D objects in the perspective view and 3D objects in the BEV space from multiple cameras.
A hybrid decoder consists of several multi-view 2D decoder layers and several 3D decoder layers, specifically designed for their respective detection tasks.
arXiv Detail & Related papers (2024-03-15T14:39:39Z) - Improving Distant 3D Object Detection Using 2D Box Supervision [97.80225758259147]
We propose LR3D, a framework that learns to recover the missing depth of distant objects.
Our framework is general and can broadly benefit existing 3D detection methods.
arXiv Detail & Related papers (2024-03-14T09:54:31Z) - Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
First, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, we develop an output-level constraint to enforce the overlap between 2D and projected 3D box estimates.
Third, we apply a training-level constraint by producing accurate and consistent 3D pseudo-labels that align with the visual data.
arXiv Detail & Related papers (2023-12-12T18:57:25Z) - Recursive Cross-View: Use Only 2D Detectors to Achieve 3D Object Detection without 3D Annotations [0.5439020425819]
We propose a method that does not demand any 3D annotations, while being able to predict fully oriented 3D bounding boxes.
Our method, called Recursive Cross-View (RCV), utilizes the three-view principle to convert 3D detection into multiple 2D detection tasks.
RCV is the first 3D detection method that yields fully oriented 3D boxes without consuming 3D labels.
arXiv Detail & Related papers (2022-11-14T04:51:05Z) - DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries [43.02373021724797]
We introduce a framework for multi-camera 3D object detection.
Our method manipulates predictions directly in 3D space.
We achieve state-of-the-art performance on the nuScenes autonomous driving benchmark.
arXiv Detail & Related papers (2021-10-13T17:59:35Z) - Anchor-free 3D Single Stage Detector with Mask-Guided Attention for Point Cloud [79.39041453836793]
We develop a novel single-stage 3D detector for point clouds in an anchor-free manner.
We address this by converting voxel-based sparse 3D feature volumes into sparse 2D feature maps.
We propose an IoU-based detection confidence re-calibration scheme to improve the correlation between the detection confidence score and the accuracy of the bounding box regression.
arXiv Detail & Related papers (2021-08-08T13:42:13Z) - FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to adapt a general 2D detector to this 3D task.
In this technical report, we study the problem with a practical solution built on a fully convolutional single-stage detector.
Our solution achieves 1st place out of all the vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020.
arXiv Detail & Related papers (2021-04-22T09:35:35Z) - DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
arXiv Detail & Related papers (2020-01-10T11:44:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.