S$^3$-MonoDETR: Supervised Shape&Scale-perceptive Deformable Transformer
for Monocular 3D Object Detection
- URL: http://arxiv.org/abs/2309.00928v1
- Date: Sat, 2 Sep 2023 12:36:38 GMT
- Title: S$^3$-MonoDETR: Supervised Shape&Scale-perceptive Deformable Transformer
for Monocular 3D Object Detection
- Authors: Xuan He, Kailun Yang, Junwei Zheng, Jin Yuan, Luis M. Bergasa, Hui
Zhang, Zhiyong Li
- Abstract summary: This paper proposes a novel "Supervised Shape&Scale-perceptive Deformable Attention" (S$^3$-DA) module for monocular 3D object detection.
- Score: 22.424834025925076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, transformer-based methods have shown exceptional performance in
monocular 3D object detection, which can predict 3D attributes from a single 2D
image. These methods typically use visual and depth representations to generate
query points on objects, whose quality plays a decisive role in the detection
accuracy. However, current unsupervised attention mechanisms in transformers
lack any geometric appearance awareness and are thus prone to producing noisy
features for query points, which severely limits network performance and
weakens the model's ability to detect multi-category objects in a single
training process. To tackle this problem, this paper proposes a
novel "Supervised Shape&Scale-perceptive Deformable Attention" (S$^3$-DA)
module for monocular 3D object detection. Concretely, S$^3$-DA utilizes visual
and depth features to generate diverse local features with various shapes and
scales and predict the corresponding matching distribution simultaneously to
impose valuable shape&scale perception for each query. Benefiting from this,
S$^3$-DA effectively estimates receptive fields for query points belonging to
any category, enabling them to generate robust query features. Besides, we
propose a Multi-classification-based Shape&Scale Matching (MSM) loss to
supervise the above process. Extensive experiments on KITTI and Waymo Open
datasets demonstrate that S$^3$-DA significantly improves the detection
accuracy, yielding state-of-the-art performance of single-category and
multi-category 3D object detection in a single training process compared to the
existing approaches. The source code will be made publicly available at
https://github.com/mikasa3lili/S3-MonoDETR.
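The paper's full method is not reproduced here, but the core idea the abstract describes, where each query predicts sampling offsets at several candidate shapes/scales together with a matching distribution that weights the sampled features, can be sketched roughly as follows. This is a toy, single-head NumPy sketch with nearest-neighbour sampling, hypothetical sizes, and random matrices standing in for learned projection layers; it is not the authors' implementation (which additionally uses depth features and supervises the matching distribution with the MSM loss).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not taken from the paper).
H, W, C = 32, 32, 64          # feature map height, width, channels
num_queries = 4
num_points = 8                # sampled points per query per scale
num_scales = 3                # candidate receptive-field scales

feat = rng.standard_normal((H, W, C)).astype(np.float32)
queries = rng.standard_normal((num_queries, C)).astype(np.float32)

# Random matrices standing in for learned linear layers (hypothetical).
W_offset = rng.standard_normal((C, num_scales * num_points * 2)) * 0.01
W_attn = rng.standard_normal((C, num_scales * num_points)) * 0.01
W_match = rng.standard_normal((C, num_scales)) * 0.01

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def s3da_sketch(queries, feat, ref_points):
    """Toy shape&scale-perceptive deformable attention: sample local
    features at several scales, then mix them with a per-query matching
    distribution over scales (nearest-neighbour sampling, single head)."""
    offsets = (queries @ W_offset).reshape(num_queries, num_scales, num_points, 2)
    attn = softmax((queries @ W_attn).reshape(num_queries, num_scales, num_points))
    # Matching distribution over candidate scales: the "scale perception".
    match = softmax(queries @ W_match)                     # (Q, S)
    base_scale = np.array([2.0, 6.0, 12.0])                # hypothetical radii
    out = np.zeros_like(queries)
    for q in range(num_queries):
        for s in range(num_scales):
            pts = ref_points[q] + offsets[q, s] * base_scale[s]
            ys = np.clip(pts[:, 0].round().astype(int), 0, H - 1)
            xs = np.clip(pts[:, 1].round().astype(int), 0, W - 1)
            sampled = feat[ys, xs]                         # (P, C)
            out[q] += match[q, s] * (attn[q, s] @ sampled)
    return out

ref = rng.uniform(0, H - 1, size=(num_queries, 2))
out = s3da_sketch(queries, feat, ref)
print(out.shape)  # (4, 64)
```

In a real deformable-attention layer the sampling would be bilinear and the projections learned end-to-end; the sketch only illustrates how a learned distribution over scales can softly select a receptive field per query.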
Related papers
- UniMODE: Unified Monocular 3D Object Detection [70.27631528933482]
We build a detector based on the bird's-eye-view (BEV) detection paradigm.
We propose an uneven BEV grid design to handle convergence instability during training.
A unified detector UniMODE is derived, which surpasses the previous state-of-the-art on the challenging Omni3D dataset.
arXiv Detail & Related papers (2024-02-28T18:59:31Z) - 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z) - SSD-MonoDETR: Supervised Scale-aware Deformable Transformer for
Monocular 3D Object Detection [28.575174815764566]
This paper proposes a novel "Supervised Scale-aware Deformable Attention" (SSDA) for monocular 3D object detection.
By imposing scale awareness, SSDA can accurately predict the receptive field of an object query.
SSDA significantly improves the detection accuracy, especially on moderate and hard objects.
arXiv Detail & Related papers (2023-05-12T06:17:57Z) - The Devil is in the Task: Exploiting Reciprocal Appearance-Localization
Features for Monocular 3D Object Detection [62.1185839286255]
Low-cost monocular 3D object detection plays a fundamental role in autonomous driving.
We introduce a Dynamic Feature Reflecting Network, named DFR-Net.
We rank 1st among all the monocular 3D object detectors on the KITTI test set.
arXiv Detail & Related papers (2021-12-28T07:31:18Z) - M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than existing monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z) - IAFA: Instance-aware Feature Aggregation for 3D Object Detection from a
Single Image [37.83574424518901]
3D object detection from a single image is an important task in autonomous driving.
We propose an instance-aware approach to aggregate useful information for improving the accuracy of 3D object detection.
arXiv Detail & Related papers (2021-03-05T05:47:52Z) - SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection [9.924083358178239]
We propose two variants of self-attention for contextual modeling in 3D object detection.
We first incorporate the pairwise self-attention mechanism into the current state-of-the-art BEV, voxel and point-based detectors.
Next, we propose a self-attention variant that samples a subset of the most representative features by learning deformations over randomly sampled locations.
arXiv Detail & Related papers (2021-01-07T18:30:32Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z) - SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint
Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
arXiv Detail & Related papers (2020-02-24T08:15:36Z) - SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance generalization of the network on unlabeled and new unseen data.
Our SESS achieves competitive performance compared to the state-of-the-art fully-supervised method by using only 50% labeled data.
arXiv Detail & Related papers (2019-12-26T08:48:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.