Monitored Distillation for Positive Congruent Depth Completion
- URL: http://arxiv.org/abs/2203.16034v1
- Date: Wed, 30 Mar 2022 03:35:56 GMT
- Title: Monitored Distillation for Positive Congruent Depth Completion
- Authors: Tian Yu Liu and Parth Agrawal and Allison Chen and Byung-Woo Hong and
Alex Wong
- Abstract summary: We propose a method to infer a dense depth map from a single image, its calibration, and the associated sparse point cloud.
In order to leverage existing models that produce putative depth maps (teacher models), we propose an adaptive knowledge distillation approach.
We consider the scenario of a blind ensemble where we do not have access to ground truth for model selection or training.
- Score: 13.050141729551585
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a method to infer a dense depth map from a single image, its
calibration, and the associated sparse point cloud. In order to leverage
existing models that produce putative depth maps (teacher models), we propose
an adaptive knowledge distillation approach that yields a positive congruent
training process, where a student model avoids learning the error modes of the
teachers. We consider the scenario of a blind ensemble where we do not have
access to ground truth for model selection or training. The crux of our
method, termed Monitored Distillation, lies in a validation criterion that
allows us to learn from teachers by choosing predictions that best minimize the
photometric reprojection error for a given image. The result is a distilled
depth map and a confidence map, or "monitor", for how well a
prediction from a particular teacher fits the observed image. The monitor
adaptively weights the distilled depth where, if all of the teachers exhibit
high residuals, the standard unsupervised image reconstruction loss takes over
as the supervisory signal. On indoor scenes (VOID), we outperform blind
ensembling baselines by 13.3% and unsupervised methods by 20.3%; we boast a 79%
model size reduction while maintaining comparable performance to the best
supervised method. For outdoors (KITTI), we tie for 5th overall on the
benchmark despite not using ground truth.
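The validation criterion described above can be sketched in a few lines: per pixel, keep the teacher prediction with the lowest photometric reprojection residual, and derive a confidence "monitor" that hands supervision back to the unsupervised reconstruction loss wherever every teacher fits the image poorly. This is a minimal NumPy sketch assuming precomputed per-teacher residuals; the function names, the linear monitor form, and the `threshold` value are illustrative, not the paper's exact formulation.

```python
import numpy as np

def monitored_distillation_target(teacher_depths, residuals, threshold=0.1):
    """Per-pixel selection of the teacher with the lowest reprojection residual.

    teacher_depths: (T, H, W) putative depth maps from T teachers.
    residuals:      (T, H, W) photometric reprojection errors, one per teacher.
    threshold:      residual level above which a prediction is distrusted
                    (hypothetical value; the paper uses a learned soft monitor).

    Returns the distilled depth map and a confidence monitor in [0, 1].
    """
    best = np.argmin(residuals, axis=0)            # (H, W) best-teacher index
    rows, cols = np.indices(best.shape)
    distilled = teacher_depths[best, rows, cols]   # pick best teacher per pixel
    best_res = residuals[best, rows, cols]
    # High confidence where the best residual is small; if all teachers
    # exhibit high residuals, confidence -> 0 and the unsupervised
    # image-reconstruction loss takes over as the supervisory signal.
    monitor = np.clip(1.0 - best_res / threshold, 0.0, 1.0)
    return distilled, monitor

def total_loss(pred, distilled, monitor, unsup_residual):
    """Monitor-weighted blend of the distillation and unsupervised terms."""
    distill_term = monitor * np.abs(pred - distilled)
    unsup_term = (1.0 - monitor) * unsup_residual
    return float((distill_term + unsup_term).mean())
```

In this sketch the monitor interpolates linearly between full trust in the distilled depth and full reliance on the unsupervised signal; the key property, shared with the paper, is that no ground truth is consulted at any point.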
Related papers
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering
Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- Sparse Depth-Guided Attention for Accurate Depth Completion: A Stereo-Assisted Monitored Distillation Approach [7.902840502973506]
We introduce a stereo-based model as a teacher model to improve the accuracy of the student model for depth completion.
To provide self-supervised information, we also employ multi-view depth consistency and multi-scale minimum reprojection.
arXiv Detail & Related papers (2023-03-28T09:23:19Z)
- Distilling Calibrated Student from an Uncalibrated Teacher [8.101116303448586]
We study how to obtain a student from an uncalibrated teacher.
Our approach relies on the fusion of data-augmentation techniques, including but not limited to cutout, mixup, and CutMix.
We extend our approach beyond traditional knowledge distillation and find it effective in those settings as well.
arXiv Detail & Related papers (2023-02-22T16:18:38Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption for training networks, which is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- Geometry Uncertainty Projection Network for Monocular 3D Object Detection [138.24798140338095]
We propose a Geometry Uncertainty Projection Network (GUP Net) to tackle the error amplification problem at both inference and training stages.
Specifically, a GUP module is proposed to obtain the geometry-guided uncertainty of the inferred depth.
At the training stage, we propose a Hierarchical Task Learning strategy to reduce the instability caused by error amplification.
arXiv Detail & Related papers (2021-07-29T06:59:07Z)
- Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones [40.33419553042038]
We propose to improve existing baseline networks via knowledge distillation from off-the-shelf pre-trained big powerful models.
Our solution performs distillation by only driving prediction of the student model consistent with that of the teacher model.
We empirically find that such a simple distillation setting is extremely effective; for example, the top-1 accuracy of MobileNetV3-large and ResNet50-D on the ImageNet-1k validation set can be significantly improved.
arXiv Detail & Related papers (2021-03-10T09:32:44Z)
- Adaptive confidence thresholding for monocular depth estimation [83.06265443599521]
We propose a new approach to leverage pseudo ground truth depth maps of stereo images generated from self-supervised stereo matching methods.
The confidence map of the pseudo ground truth depth map is estimated to mitigate performance degeneration by inaccurate pseudo depth maps.
Experimental results demonstrate superior performance to state-of-the-art monocular depth estimation methods.
arXiv Detail & Related papers (2020-09-27T13:26:16Z)
- Generative Model-Based Loss to the Rescue: A Method to Overcome Annotation Errors for Depth-Based Hand Pose Estimation [76.12736932610163]
We propose to use a model-based generative loss for training hand pose estimators on depth images based on a volumetric hand model.
This additional loss allows training of a hand pose estimator that accurately infers the entire set of 21 hand keypoints while only using supervision for 6 easy-to-annotate keypoints (fingertips and wrist).
arXiv Detail & Related papers (2020-07-06T21:24:25Z)
- Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy on low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
arXiv Detail & Related papers (2020-06-23T15:58:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.