Attention-Based Depth Distillation with 3D-Aware Positional Encoding for
Monocular 3D Object Detection
- URL: http://arxiv.org/abs/2211.16779v2
- Date: Mon, 3 Jul 2023 08:07:45 GMT
- Title: Attention-Based Depth Distillation with 3D-Aware Positional Encoding for
Monocular 3D Object Detection
- Authors: Zizhang Wu, Yunzhe Wu, Jian Pu, Xianzhi Li and Xiaoquan Wang
- Abstract summary: ADD is an Attention-based Depth knowledge Distillation framework with 3D-aware positional encoding.
Owing to our teacher design, our framework is seamless, domain-gap-free, easy to implement, and compatible with object-wise ground-truth depth.
We implement our framework on three representative monocular detectors and achieve state-of-the-art performance with no additional computational cost at inference.
- Score: 10.84784828447741
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular 3D object detection is a low-cost but challenging task, as it
requires generating accurate 3D localization solely from a single image input.
Recently developed depth-assisted methods show promising results by using
explicit depth maps as intermediate features, which are either precomputed by
monocular depth estimation networks or jointly estimated alongside 3D object
detection. However, inevitable errors in these estimated depth priors can
misalign semantic information and 3D localization, resulting in feature
smearing and suboptimal predictions. To mitigate this issue, we propose ADD, an
Attention-based Depth knowledge Distillation framework with 3D-aware positional
encoding. Unlike previous knowledge distillation frameworks that adopt stereo-
or LiDAR-based teachers, we build our teacher with the same architecture as the
student but with extra ground-truth depth as input. Owing to this teacher
design, our framework is seamless, domain-gap-free, easy to implement, and
compatible with object-wise ground-truth depth. Specifically, we leverage both
intermediate features and responses for knowledge distillation. To capture
long-range 3D dependencies, we propose 3D-aware self-attention and
target-aware cross-attention modules for student adaptation. Extensive
experiments are performed to verify the effectiveness of our framework on the
challenging KITTI 3D object detection benchmark. We implement our framework on
three representative monocular detectors and achieve state-of-the-art
performance with no additional computational cost at inference relative to
baseline models. Our code is available at https://github.com/rockywind/ADD.
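As an illustration of the recipe, here is a minimal PyTorch-style sketch, with an invented toy `Detector` and plain MSE/L1 distillation terms standing in for the paper's attention-based adaptation modules: a teacher shares the student's architecture but takes ground-truth depth as an extra input channel, and supervises the student at the feature and response levels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Detector(nn.Module):
    """Toy monocular detector: a small conv backbone plus a response head.
    `Detector` is an invented stand-in, not the paper's architecture."""
    def __init__(self, in_channels):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(32, 8, 1)  # per-pixel detection responses

    def forward(self, x):
        feat = self.backbone(x)
        return feat, self.head(feat)

# Teacher and student share the architecture; only the input differs,
# which is why there is no cross-modal domain gap to bridge.
student = Detector(in_channels=3)   # RGB only
teacher = Detector(in_channels=4)   # RGB + ground-truth depth channel
teacher.eval()                      # assume the teacher was trained beforehand

img = torch.randn(2, 3, 96, 320)
gt_depth = torch.randn(2, 1, 96, 320)  # rasterized object-wise GT depth (toy values)

with torch.no_grad():
    t_feat, t_resp = teacher(torch.cat([img, gt_depth], dim=1))
s_feat, s_resp = student(img)

# Distill both intermediate features and responses; plain MSE/L1 terms
# stand in for the paper's 3D-aware self-attention and target-aware
# cross-attention adaptation modules.
loss_distill = F.mse_loss(s_feat, t_feat) + F.l1_loss(s_resp, t_resp)
# loss_distill is added to the usual detection loss when training the student.
```

Because teacher and student differ only in their input, the distilled features come from the same domain, which is what makes the scheme seamless and domain-gap free; at test time only the student runs, so inference cost is unchanged.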
Related papers
- MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection [42.4932760909941]
Monocular 3D object detection is an indispensable research topic in autonomous driving.
The challenges of Mono3D lie in understanding 3D scene geometry and reconstructing 3D object information from a single image.
Previous methods attempted to transfer 3D information directly from the LiDAR-based teacher to the camera-based student.
arXiv Detail & Related papers (2024-04-07T10:39:04Z)
- MonoCD: Monocular 3D Object Detection with Complementary Depths [9.186673054867866]
Depth estimation is an essential but challenging subtask of monocular 3D object detection.
We propose to increase the complementarity of depths with two novel designs.
Experiments on the KITTI benchmark demonstrate that our method achieves state-of-the-art performance without introducing extra data.
arXiv Detail & Related papers (2024-04-04T03:30:49Z)
- Learning Occupancy for Monocular 3D Object Detection [25.56336546513198]
We propose OccupancyM3D, a method of learning occupancy for monocular 3D detection.
It directly learns occupancy in frustum and 3D space, leading to more discriminative and informative 3D features and representations.
Experiments on KITTI and open datasets demonstrate that the proposed method achieves a new state of the art and surpasses other methods by a significant margin.
arXiv Detail & Related papers (2023-05-25T04:03:46Z)
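As a rough illustration of what "learning occupancy" can look like, the sketch below predicts per-voxel occupancy logits from an assumed 3D feature volume and supervises them with binary cross-entropy; the paper's frustum-space branch and its derivation of occupancy labels are simplified away, and all shapes are invented.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical occupancy head: 3D features -> per-voxel occupancy logits.
occupancy_head = nn.Conv3d(in_channels=16, out_channels=1, kernel_size=1)

voxel_feat = torch.randn(2, 16, 20, 40, 40)          # (B, C, Z, Y, X), toy sizes
occ_logits = occupancy_head(voxel_feat).squeeze(1)   # (B, Z, Y, X)

# Binary occupancy labels, e.g. voxelized from LiDAR points at training time.
occ_labels = (torch.rand(2, 20, 40, 40) > 0.95).float()
loss_occ = F.binary_cross_entropy_with_logits(occ_logits, occ_labels)

# Occupancy-weighted features emphasize voxels likely to contain geometry,
# giving more discriminative and informative 3D representations.
enhanced = voxel_feat * torch.sigmoid(occ_logits).unsqueeze(1)
```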
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
- MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection [61.89277940084792]
We introduce the first DETR framework for Monocular DEtection with a depth-guided TRansformer, named MonoDETR.
We formulate 3D object candidates as learnable queries and propose a depth-guided decoder to conduct object-scene depth interactions.
On KITTI benchmark with monocular images as input, MonoDETR achieves state-of-the-art performance and requires no extra dense depth annotations.
arXiv Detail & Related papers (2022-03-24T19:28:54Z)
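The depth-guided decoding idea can be sketched as learnable object queries that cross-attend first to depth features and then to visual features; shapes are assumed, and MonoDETR's self-attention layers and prediction heads are omitted.

```python
import torch
import torch.nn as nn

B, N, C = 2, 50, 256    # batch, object queries, channels (illustrative)
HW = 96 * 32            # flattened spatial positions

queries = nn.Parameter(torch.randn(N, C))   # learnable 3D object candidates
visual_feat = torch.randn(B, HW, C)         # image features
depth_feat = torch.randn(B, HW, C)          # features from a depth predictor

depth_attn = nn.MultiheadAttention(C, num_heads=8, batch_first=True)
visual_attn = nn.MultiheadAttention(C, num_heads=8, batch_first=True)

q = queries.unsqueeze(0).expand(B, N, C)
q, _ = depth_attn(q, depth_feat, depth_feat)     # object-scene depth interactions
q, _ = visual_attn(q, visual_feat, visual_feat)  # then appearance cues
# Each updated query would be decoded into a 3D box by prediction heads.
```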
- Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula that projectively relates the 2D and 3D depth predictions is devised within the monocular 3D detection network.
Without extra data, our method improves the detection performance of the state-of-the-art monocular method by 2.80% on the moderate test setting.
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
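A common instantiation of such projective modeling is the pinhole relation between an object's physical height, its projected 2D height, and its depth; the values below are illustrative KITTI-like numbers, not the paper's formula verbatim.

```python
# Pinhole projection: an object of 3D height H (m) at depth z (m) spans
# h = f * H / z pixels in the image, so depth follows as z = f * H / h.
f = 721.5   # focal length in pixels (typical KITTI calibration)
H = 1.53    # predicted 3D object height in meters
h = 55.0    # observed 2D box height in pixels

z = f * H / h
print(f"geometry-guided depth = {z:.2f} m")  # ~20.07 m
```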
- Ground-aware Monocular 3D Object Detection for Autonomous Driving [6.5702792909006735]
Estimating the 3D position and orientation of objects in the environment with a single RGB camera is a challenging task for low-cost urban autonomous driving and mobile robots.
Most of the existing algorithms are based on the geometric constraints in 2D-3D correspondence, which stems from generic 6D object pose estimation.
We introduce a novel neural network module to fully utilize such application-specific priors in the framework of deep learning.
arXiv Detail & Related papers (2021-02-01T08:18:24Z)
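The application-specific prior here is ground-plane geometry: under a flat-ground assumption with known camera height, the image row of an object's ground contact pins down its depth. The numbers below are illustrative, and the paper's module learns to exploit this relation rather than applying it in closed form.

```python
# Ground-plane prior: with the camera at height h_cam above a flat ground,
# a ground-contact pixel in image row v implies depth z = f * h_cam / (v - cy).
f, cy = 721.5, 172.8   # focal length and principal-point row (KITTI-like)
h_cam = 1.65           # camera height above ground in meters
v = 230.0              # image row of the object's ground contact point

z = f * h_cam / (v - cy)
print(f"ground-aware depth = {z:.2f} m")  # ~20.81 m
```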
- Weakly Supervised 3D Object Detection from Point Clouds [27.70180601788613]
3D object detection aims to detect and localize the 3D bounding boxes of objects belonging to specific classes.
Existing 3D object detectors rely on annotated 3D bounding boxes during training, yet these annotations can be expensive to obtain and are only accessible in limited scenarios.
We propose VS3D, a framework for weakly supervised 3D object detection from point clouds without using any ground truth 3D bounding box for training.
arXiv Detail & Related papers (2020-07-28T03:30:11Z)
- DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
arXiv Detail & Related papers (2020-01-10T11:44:37Z)
- SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance generalization of the network on unlabeled and new unseen data.
Our SESS achieves competitive performance compared to the state-of-the-art fully supervised method using only 50% of the labeled data.
arXiv Detail & Related papers (2019-12-26T08:48:04Z)
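SESS builds on the mean-teacher paradigm: a teacher tracked as an exponential moving average (EMA) of the student enforces consistency across perturbed inputs. The sketch below shows that generic scheme, with a toy model and Gaussian noise standing in for SESS's point-cloud perturbation scheme and its detection-specific consistency losses.

```python
import copy
import torch
import torch.nn.functional as F

def ema_update(teacher, student, decay=0.999):
    """Teacher weights track an exponential moving average of the student's."""
    with torch.no_grad():
        for t_p, s_p in zip(teacher.parameters(), student.parameters()):
            t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)

student = torch.nn.Linear(8, 4)   # toy stand-in for the 3D detector
teacher = copy.deepcopy(student)

# One training step on (possibly unlabeled) data:
x = torch.randn(16, 8)
x_perturbed = x + 0.1 * torch.randn_like(x)   # stand-in perturbation
with torch.no_grad():
    target = teacher(x)                       # teacher sees the clean input
loss_consistency = F.mse_loss(student(x_perturbed), target)
loss_consistency.backward()
# ... optimizer.step() on the student, then refresh the teacher:
ema_update(teacher, student)
```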