Auto-Rectify Network for Unsupervised Indoor Depth Estimation
- URL: http://arxiv.org/abs/2006.02708v2
- Date: Tue, 14 Dec 2021 06:17:08 GMT
- Title: Auto-Rectify Network for Unsupervised Indoor Depth Estimation
- Authors: Jia-Wang Bian, Huangying Zhan, Naiyan Wang, Tat-Jun Chin, Chunhua
Shen, Ian Reid
- Abstract summary: We establish that the complex ego-motions exhibited in handheld settings are a critical obstacle for learning depth.
We propose a data pre-processing method that rectifies training images by removing their relative rotations for effective learning.
Our results outperform the previous unsupervised SOTA method by a large margin on the challenging NYUv2 dataset.
- Score: 119.82412041164372
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Single-view depth estimation using CNNs trained on unlabelled videos
has shown significant promise. However, excellent results have mostly been
obtained in street-scene driving scenarios, and such methods often fail in
other settings, particularly indoor videos taken by handheld devices. In this
work, we establish that the complex ego-motions exhibited in handheld settings
are a critical obstacle for learning depth. Our fundamental analysis suggests
that the rotation behaves as noise during training, as opposed to the
translation (baseline) which provides supervision signals. To address the
challenge, we propose a data pre-processing method that rectifies training
images by removing their relative rotations for effective learning. The
significantly improved performance validates our motivation. Towards end-to-end
learning without requiring pre-processing, we propose an Auto-Rectify Network
with novel loss functions, which can automatically learn to rectify images
during training. Consequently, our results outperform the previous unsupervised
SOTA method by a large margin on the challenging NYUv2 dataset. We also
demonstrate the generalization of our trained model in ScanNet and Make3D, and
the universality of our proposed learning method on 7-Scenes and KITTI
datasets.
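The key geometric fact behind the rectification idea can be sketched in a few lines (an illustrative reconstruction, not the authors' pipeline; the intrinsics `K` and rotation `R` below are made-up toy values): a pure camera rotation induces a homography K R K^-1 on pixels that is independent of scene depth, which is why rotation alone gives no depth supervision, and why warping by the inverse rotation homography removes the relative rotation between views.

```python
import numpy as np

def rotation_homography(K, R):
    """Homography induced by a pure camera rotation R.

    Under pure rotation, pixel p maps to p' ~ K @ R @ inv(K) @ p,
    independent of scene depth -- so rotation alone carries no
    depth supervision signal.
    """
    return K @ R @ np.linalg.inv(K)

def rectify_points(points, K, R_rel):
    """Warp homogeneous pixel coords to remove the relative rotation R_rel.

    After this warp the two views differ only by a pure translation,
    so the remaining parallax depends on depth alone.
    """
    H = rotation_homography(K, R_rel.T)  # inverse rotation: R^-1 == R^T
    warped = (H @ points.T).T
    return warped / warped[:, 2:3]       # normalise homogeneous coords

# Toy example: a 5-degree rotation about the camera's y-axis.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
theta = np.deg2rad(5.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0,           1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])

p = np.array([[320.0, 240.0, 1.0]])           # a pixel in the first view
p_rot = (rotation_homography(K, R) @ p.T).T   # where the rotation moved it
p_rot = p_rot / p_rot[:, 2:3]
p_back = rectify_points(p_rot, K, R)          # rectification undoes it
```

Rectifying the whole image amounts to resampling it through this homography, after which training only has to explain translation-induced parallax.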
Related papers
- CovarNav: Machine Unlearning via Model Inversion and Covariance
Navigation [11.222501077070765]
Machine unlearning has emerged as an essential technique to selectively remove the influence of specific training data points on trained models.
We introduce a three-step process, named CovarNav, to facilitate this forgetting.
We rigorously evaluate CovarNav on the CIFAR-10 and Vggface2 datasets.
arXiv Detail & Related papers (2023-11-21T21:19:59Z)
- A Study of Forward-Forward Algorithm for Self-Supervised Learning [65.268245109828]
We study the performance of forward-forward vs. backpropagation for self-supervised representation learning.
Our main finding is that while the forward-forward algorithm performs comparably to backpropagation during (self-supervised) training, the transfer performance is significantly lagging behind in all the studied settings.
arXiv Detail & Related papers (2023-09-21T10:14:53Z)
- Towards Better Data Exploitation in Self-Supervised Monocular Depth Estimation [14.262669370264994]
In this paper, we employ two data augmentation techniques, namely Resizing-Cropping and Splitting-Permuting, to fully exploit the potential of training datasets.
Specifically, the original image and the generated two augmented images are fed into the training pipeline simultaneously and we leverage them to conduct self-distillation.
Experimental results demonstrate our method can achieve state-of-the-art performance on the KITTI benchmark, with both raw ground truth and improved ground truth.
arXiv Detail & Related papers (2023-09-11T06:18:05Z)
- PRSNet: A Masked Self-Supervised Learning Pedestrian Re-Identification Method [2.0411082897313984]
This paper designs a mask-reconstruction pretext task to obtain a pre-trained model with strong robustness.
Network training is optimized with an improved, centroid-based triplet loss.
This method achieves about 5% higher mAP on the Market1501 and CUHK03 datasets than existing self-supervised learning pedestrian re-identification methods.
arXiv Detail & Related papers (2023-03-11T07:20:32Z)
- PIVOT: Prompting for Video Continual Learning [50.80141083993668]
We introduce PIVOT, a novel method that leverages extensive knowledge in pre-trained models from the image domain.
Our experiments show that PIVOT improves state-of-the-art methods by a significant 27% on the 20-task ActivityNet setup.
arXiv Detail & Related papers (2022-12-09T13:22:27Z)
- An Empirical Study of Remote Sensing Pretraining [117.90699699469639]
We conduct an empirical study of remote sensing pretraining (RSP) on aerial images.
RSP can help deliver distinctive performances in scene recognition tasks.
RSP mitigates the data discrepancies of traditional ImageNet pretraining on RS images, but it may still suffer from task discrepancies.
arXiv Detail & Related papers (2022-04-06T13:38:11Z)
- Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking [58.14267480293575]
We propose a simple yet effective online learning approach for few-shot online adaptation without requiring offline training.
It allows an in-built memory retention mechanism for the model to remember the knowledge about the object seen before.
We evaluate our approach based on two networks in the online learning families for tracking, i.e., multi-layer perceptrons in RT-MDNet and convolutional neural networks in DiMP.
arXiv Detail & Related papers (2021-12-28T06:51:18Z)
- On the Impact of Interpretability Methods in Active Image Augmentation Method [2.740398518066079]
We propose an experimental analysis of the interpretability methods' impact on ADA.
We use five interpretability methods: Vanilla Backpropagation, Guided Backpropagation, GradCam, Guided GradCam, and InputXGradient.
The results show that all methods achieve similar performance at the end of training, but when ADA is combined with GradCam, the U-Net model converges impressively fast.
arXiv Detail & Related papers (2021-02-24T15:40:54Z)
- Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embeddings of a CNN using anti-aliasing or low-pass filters.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
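The mechanism can be sketched as follows (an illustrative reconstruction, not the paper's implementation; the kernel size, sigma values, and schedule below are made-up choices): low-pass filter the features with a Gaussian kernel whose width is annealed over training, so early epochs see heavily smoothed features and later epochs see progressively sharper ones.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """1D Gaussian low-pass kernel, normalised to sum to 1."""
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def smooth_features(feat, sigma):
    """Low-pass filter a 1D feature map; larger sigma removes more
    high-frequency content."""
    return np.convolve(feat, gaussian_kernel(9, sigma), mode="same")

def sigma_schedule(epoch, total_epochs, sigma_start=2.0, sigma_end=0.1):
    """Curriculum: linearly anneal the smoothing strength so the network
    is exposed to increasingly detailed features as training proceeds."""
    t = epoch / max(total_epochs - 1, 1)
    return sigma_start + t * (sigma_end - sigma_start)

feat = np.random.default_rng(0).standard_normal(64)
early = smooth_features(feat, sigma_schedule(0, 10))   # heavily smoothed
late = smooth_features(feat, sigma_schedule(9, 10))    # nearly unfiltered
```

In a real network the same filter would be applied to each channel of the intermediate feature maps rather than to a single 1D signal.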
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
- Self-supervised Fine-tuning for Correcting Super-Resolution Convolutional Neural Networks [17.922507191213494]
We show that one can avoid training and correct for SR results with a fully self-supervised fine-tuning approach.
We apply our fine-tuning algorithm on multiple image and video SR CNNs and show that it can successfully correct for a sub-optimal SR solution.
arXiv Detail & Related papers (2019-12-30T11:02:58Z)
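The core self-supervision signal that such correction methods typically rely on can be sketched in a few lines (an illustrative reconstruction, not the paper's algorithm; the average-pool degradation model below is an assumption): the super-resolved image, pushed back through the degradation model, should reproduce the low-resolution input, so that mismatch can drive fine-tuning without any ground-truth HR image.

```python
import numpy as np

def downsample(img, scale):
    """Average-pool downsampling by an integer factor -- a simple
    stand-in for the degradation that produced the LR input."""
    h, w = img.shape
    return img.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))

def consistency_loss(sr, lr, scale):
    """Self-supervised fine-tuning signal: the SR output, pushed back
    through the degradation model, should match the LR input."""
    return float(np.mean((downsample(sr, scale) - lr) ** 2))

# Toy check: a perfectly consistent SR output has zero loss,
# while a biased one is penalised.
lr = np.arange(16.0).reshape(4, 4)
sr_good = np.kron(lr, np.ones((2, 2)))   # upsample by pixel replication
sr_bad = sr_good + 1.0                   # a systematically wrong output
loss_good = consistency_loss(sr_good, lr, 2)
loss_bad = consistency_loss(sr_bad, lr, 2)
```

Fine-tuning would then take gradient steps on this loss through the SR network's parameters for the specific test image at hand.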
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.