ADU-Depth: Attention-based Distillation with Uncertainty Modeling for
Depth Estimation
- URL: http://arxiv.org/abs/2309.14744v1
- Date: Tue, 26 Sep 2023 08:12:37 GMT
- Authors: Zizhang Wu, Zhuozheng Li, Zhi-Gang Fan, Yunzhe Wu, Xiaoquan Wang, Rui
Tang, Jian Pu
- Abstract summary: We introduce spatial cues by training a teacher network that leverages left-right image pairs as inputs.
We apply both attention-adapted feature distillation and focal-depth-adapted response distillation in the training stage.
Our experiments on the real depth estimation datasets KITTI and DrivingStereo demonstrate the effectiveness of the proposed method.
- Score: 11.92011909884167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular depth estimation is challenging due to its inherent ambiguity and
ill-posed nature, yet it is quite important to many applications. While recent
works achieve limited accuracy by designing increasingly complicated networks
to extract features with limited spatial geometric cues from a single RGB
image, we intend to introduce spatial cues by training a teacher network that
leverages left-right image pairs as inputs and transferring the learned 3D
geometry-aware knowledge to the monocular student network. Specifically, we
present a novel knowledge distillation framework, named ADU-Depth, with the
goal of leveraging the well-trained teacher network to guide the learning of
the student network, thereby improving depth estimation accuracy with the help
of extra spatial scene information. To enable domain adaptation and ensure
effective and smooth knowledge transfer from teacher to student, we apply both
attention-adapted feature distillation and focal-depth-adapted response
distillation in the training stage. In addition, we explicitly model the
uncertainty of depth estimation to guide distillation in both feature space and
result space to better produce 3D-aware knowledge from monocular observations
and thus enhance the learning for hard-to-predict image regions. Our extensive
experiments on the real depth estimation datasets KITTI and DrivingStereo
demonstrate the effectiveness of the proposed method, which ranked 1st on the
challenging KITTI online benchmark.
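The abstract combines two ideas: attention-weighted feature distillation and an uncertainty-weighted response (depth) loss. As a rough illustration only, the following numpy sketch shows one common way such terms can be formed; the function names, the channel-mean attention heuristic, and the heteroscedastic `exp(-s)·err² + s` weighting are generic assumptions, not the authors' actual ADU-Depth implementation.

```python
import numpy as np

def attention_weights(feat_t):
    """Spatial attention from teacher features: mean absolute
    activation over channels, normalized to sum to 1 (a common
    attention-transfer heuristic, assumed here for illustration)."""
    a = np.abs(feat_t).mean(axis=0)          # (H, W)
    return a / (a.sum() + 1e-8)

def distill_loss(feat_s, feat_t, depth_s, depth_t, log_var):
    """Toy teacher-student distillation objective:
    - a feature term weighted by teacher attention;
    - a response (depth) term weighted by predicted uncertainty,
      using the standard heteroscedastic form exp(-s) * err^2 + s,
      which down-weights hard, high-uncertainty regions."""
    w = attention_weights(feat_t)                           # (H, W)
    feat_term = (w * ((feat_s - feat_t) ** 2).mean(axis=0)).sum()
    err2 = (depth_s - depth_t) ** 2
    resp_term = (np.exp(-log_var) * err2 + log_var).mean()
    return feat_term + resp_term

# Tiny synthetic example: student features/depth are noisy copies
# of the teacher's outputs.
rng = np.random.default_rng(0)
C, H, W = 8, 4, 5
feat_t = rng.standard_normal((C, H, W))
feat_s = feat_t + 0.1 * rng.standard_normal((C, H, W))
depth_t = rng.uniform(1.0, 80.0, (H, W))
depth_s = depth_t + rng.standard_normal((H, W))
log_var = np.zeros((H, W))                  # zero predicted log-variance
loss = distill_loss(feat_s, feat_t, depth_s, depth_t, log_var)
print(float(loss))
```

In a real training loop these terms would be computed on network tensors with gradients; the sketch only illustrates how attention and uncertainty maps can reweight the two distillation losses.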
Related papers
- 4D Contrastive Superflows are Dense 3D Representation Learners [62.433137130087445]
We introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing pretraining objectives.
To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances alignment of the knowledge distilled from camera views.
arXiv Detail & Related papers (2024-07-08T17:59:54Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- 2D Feature Distillation for Weakly- and Semi-Supervised 3D Semantic Segmentation [92.17700318483745]
We propose an image-guidance network (IGNet) which builds upon the idea of distilling high level feature information from a domain adapted synthetically trained 2D semantic segmentation network.
IGNet achieves state-of-the-art results for weakly-supervised LiDAR semantic segmentation on ScribbleKITTI, boasting up to 98% relative performance to fully supervised training with only 8% labeled points.
arXiv Detail & Related papers (2023-11-27T07:57:29Z)
- ODM3D: Alleviating Foreground Sparsity for Semi-Supervised Monocular 3D Object Detection [15.204935788297226]
The ODM3D framework entails cross-modal knowledge distillation at various levels to inject LiDAR-domain knowledge into a monocular detector during training.
By identifying foreground sparsity as the main culprit behind existing methods' suboptimal training, we exploit the precise localisation information embedded in LiDAR points.
Our method ranks 1st in both KITTI validation and test benchmarks, significantly surpassing all existing monocular methods, supervised or semi-supervised.
arXiv Detail & Related papers (2023-10-28T07:12:09Z)
- Sparse Depth-Guided Attention for Accurate Depth Completion: A Stereo-Assisted Monitored Distillation Approach [7.902840502973506]
We introduce a stereo-based model as a teacher model to improve the accuracy of the student model for depth completion.
To provide self-supervised information, we also employ multi-view depth consistency and multi-scale minimum reprojection.
arXiv Detail & Related papers (2023-03-28T09:23:19Z)
- Attention-Based Depth Distillation with 3D-Aware Positional Encoding for Monocular 3D Object Detection [10.84784828447741]
ADD is an Attention-based Depth knowledge Distillation framework with 3D-aware positional encoding.
Owing to our teacher design, our framework is seamless, domain-gap free, easy to implement, and compatible with object-wise ground-truth depth.
We implement our framework on three representative monocular detectors, and we achieve state-of-the-art performance with no additional inference computational cost.
arXiv Detail & Related papers (2022-11-30T06:39:25Z)
- Impact of a DCT-driven Loss in Attention-based Knowledge-Distillation for Scene Recognition [64.29650787243443]
We propose and analyse the use of a 2D frequency transform of the activation maps before transferring them.
This strategy enhances knowledge transferability in tasks such as scene recognition.
We publicly release the training and evaluation framework used along this paper at http://www.vpu.eps.uam.es/publications/DCTBasedKDForSceneRecognition.
arXiv Detail & Related papers (2022-05-04T11:05:18Z)
- X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task Distillation [69.9604394044652]
We propose a novel method to improve the self-supervised training of monocular depth via cross-task knowledge distillation.
During training, we utilize a pretrained semantic segmentation teacher network and transfer its semantic knowledge to the depth network.
We extensively evaluate the efficacy of our proposed approach on the KITTI benchmark and compare it with the latest state of the art.
arXiv Detail & Related papers (2021-10-24T19:47:14Z)
- Geometry Uncertainty Projection Network for Monocular 3D Object Detection [138.24798140338095]
We propose a Geometry Uncertainty Projection Network (GUP Net) to tackle the error amplification problem at both inference and training stages.
Specifically, a GUP module is proposed to obtain the geometry-guided uncertainty of the inferred depth.
At the training stage, we propose a Hierarchical Task Learning strategy to reduce the instability caused by error amplification.
arXiv Detail & Related papers (2021-07-29T06:59:07Z)
- Self-Guided Instance-Aware Network for Depth Completion and Enhancement [6.319531161477912]
Existing methods directly interpolate the missing depth measurements based on pixel-wise image content and the corresponding neighboring depth values.
We propose a novel self-guided instance-aware network (SG-IANet) that utilizes a self-guided mechanism to extract the instance-level features needed for depth restoration.
arXiv Detail & Related papers (2021-05-25T19:41:38Z)
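Several of the related papers above transfer intermediate features from teacher to student; the DCT-driven variant, for instance, compares activation maps in the frequency domain before distillation. As a generic numpy sketch of that idea only (the basis construction and channel-mean attention map are illustrative assumptions, not the paper's exact loss):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (i + 0.5) * k / n)
    m[0] *= 1.0 / np.sqrt(2.0)               # scale the DC row
    return m * np.sqrt(2.0 / n)

def dct2(x):
    """2D DCT of an (H, W) map via separable 1D transforms."""
    h, w = x.shape
    return dct_matrix(h) @ x @ dct_matrix(w).T

def dct_feature_loss(act_s, act_t):
    """MSE between the 2D DCTs of student and teacher
    channel-averaged activation (attention) maps."""
    map_s = np.abs(act_s).mean(axis=0)
    map_t = np.abs(act_t).mean(axis=0)
    return float(((dct2(map_s) - dct2(map_t)) ** 2).mean())

# Tiny synthetic example: student activations as a perturbed
# copy of the teacher's.
rng = np.random.default_rng(1)
act_t = rng.standard_normal((16, 8, 8))
act_s = act_t + 0.05 * rng.standard_normal((16, 8, 8))
print(dct_feature_loss(act_s, act_t))
```

Because the DCT is orthonormal, matching transforms is equivalent in energy to matching the spatial maps, but it lets a loss emphasize or discard particular frequency bands of the activation maps.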
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.