Improving 360 Monocular Depth Estimation via Non-local Dense Prediction
Transformer and Joint Supervised and Self-supervised Learning
- URL: http://arxiv.org/abs/2109.10563v2
- Date: Thu, 23 Sep 2021 06:27:53 GMT
- Title: Improving 360 Monocular Depth Estimation via Non-local Dense Prediction
Transformer and Joint Supervised and Self-supervised Learning
- Authors: Ilwi Yun, Hyuk-Jae Lee, Chae Eun Rhee
- Abstract summary: We propose 360 monocular depth estimation methods which improve on the areas that limited previous studies.
First, we introduce a self-supervised 360 depth learning method that only utilizes gravity-aligned videos.
Second, we propose a joint learning scheme realized by combining supervised and self-supervised learning.
Third, we propose a non-local fusion block, which retains the global information encoded by the vision transformer when reconstructing depths.
- Score: 17.985386835096353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to difficulties in acquiring ground-truth depth for equirectangular (360)
images, the quality and quantity of equirectangular depth data available today are
insufficient to represent the variety of scenes in the world. Consequently, 360 depth
estimation studies that rely solely on supervised learning are bound to produce
unsatisfactory results. Although self-supervised learning methods focusing on
equirectangular images (EIs) have been introduced, they often converge to incorrect
or non-unique solutions, causing unstable performance. In this paper, we propose 360
monocular depth estimation methods that address the limitations of previous studies.
First, we introduce a self-supervised 360 depth learning method that uses only
gravity-aligned videos, which has the potential to eliminate the need for depth data
during training. Second, we propose a joint learning scheme that combines supervised
and self-supervised learning; the weaknesses of each are compensated by the other,
leading to more accurate depth estimation. Third, we propose a non-local fusion block
that retains the global information encoded by the vision transformer when
reconstructing depths. With the proposed methods, we successfully apply a transformer
to 360 depth estimation which, to the best of our knowledge, has not been attempted
before. On several benchmarks, our approach achieves significant improvements over
previous works and establishes a new state of the art.
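To make the joint learning scheme concrete, here is a minimal sketch that combines a supervised depth term on labeled pixels with a self-supervised photometric term (SSIM + L1) on view-synthesized frames. The weights, loss terms, and helper names are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def ssim_dist(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # SSIM-based dissimilarity in [0, 1], computed with 3x3 average pooling.
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sx = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sy = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sxy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sxy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sx + sy + c2)
    return torch.clamp((1 - num / den) / 2, 0, 1)

def joint_loss(pred_depth, gt_depth, valid, synth, target,
               alpha=0.85, w_self=0.5):
    # Supervised term: L1 on the pixels that have ground-truth depth.
    sup = torch.abs(pred_depth - gt_depth)[valid].mean()
    # Self-supervised term: photometric error between the frame
    # re-synthesized via the predicted depth and the real target frame.
    photo = (alpha * ssim_dist(synth, target).mean(1, keepdim=True)
             + (1 - alpha) * torch.abs(synth - target).mean(1, keepdim=True))
    return sup + w_self * photo.mean()
```

The per-pixel validity mask lets labeled and unlabeled pixels coexist in one batch, which is what allows the two supervision signals to compensate for each other.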
Related papers
- Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation [6.832852988957967]
We propose a new depth estimation framework that utilizes unlabeled 360-degree data effectively.
Our approach uses state-of-the-art perspective depth estimation models as teacher models to generate pseudo labels.
We tested our approach on benchmark datasets such as Matterport3D and Stanford2D3D, showing significant improvements in depth estimation accuracy.
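A minimal sketch of the teacher-student pseudo-labeling loop this entry describes, assuming hypothetical project_fn/stitch_fn helpers that map an equirectangular panorama to perspective tangent views and back:

```python
import torch

@torch.no_grad()
def make_pseudo_labels(teacher, pano_batch, project_fn, stitch_fn):
    views, meta = project_fn(pano_batch)   # ERP -> perspective crops
    depth_views = teacher(views)           # perspective teacher labels each view
    return stitch_fn(depth_views, meta)    # merge back into an ERP depth map

def distill_step(student, teacher, pano_batch, optimizer, project_fn, stitch_fn):
    pseudo = make_pseudo_labels(teacher, pano_batch, project_fn, stitch_fn)
    pred = student(pano_batch)
    loss = torch.abs(pred - pseudo).mean()  # L1 against the pseudo labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```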
arXiv Detail & Related papers (2024-06-18T17:59:31Z)
- Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our loss functions enable the model to recover domain-specific scale-and-shift coefficients autonomously.
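The paper recovers scale and shift through its loss functions; as a concrete point of reference, the standard closed-form least-squares alignment used in affine-invariant depth evaluation looks like this (shown only to make the "scale-and-shift coefficients" idea tangible, not the paper's own method):

```python
import torch

def recover_scale_shift(pred, gt, mask):
    # Least-squares fit of scale s and shift t minimizing
    # ||s * pred + t - gt||^2 over valid pixels (normal equations).
    p, g = pred[mask], gt[mask]
    a00, a01, n = (p * p).sum(), p.sum(), p.numel()
    b0, b1 = (p * g).sum(), g.sum()
    det = a00 * n - a01 * a01
    s = (b0 * n - b1 * a01) / det
    t = (a00 * b1 - a01 * b0) / det
    return s, t
```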
arXiv Detail & Related papers (2023-09-18T12:36:39Z)
- Sparse Depth-Guided Attention for Accurate Depth Completion: A Stereo-Assisted Monitored Distillation Approach [7.902840502973506]
We introduce a stereo-based model as a teacher model to improve the accuracy of the student model for depth completion.
To provide self-supervised information, we also employ multi-view depth consistency and multi-scale minimum reprojection.
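"Multi-scale minimum reprojection" plausibly refers to the Monodepth2-style per-pixel minimum over source views, averaged across scales; a sketch under that assumption:

```python
import torch

def min_reprojection_loss(photo_errors):
    # photo_errors: list of [B,1,H,W] photometric error maps, one per source view.
    stacked = torch.stack(photo_errors, dim=1)  # [B, V, 1, H, W]
    min_err, _ = stacked.min(dim=1)             # keep the best-matching view per pixel
    return min_err.mean()

def multi_scale_min_reprojection(errors_per_scale, weights=None):
    # Weighted average of the per-scale minimum-reprojection losses.
    weights = weights or [1.0] * len(errors_per_scale)
    losses = [w * min_reprojection_loss(e)
              for w, e in zip(weights, errors_per_scale)]
    return sum(losses) / sum(weights)
```

Taking the minimum rather than the average keeps occluded views from dominating the loss, which is the usual motivation for this choice.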
arXiv Detail & Related papers (2023-03-28T09:23:19Z)
- 360 Depth Estimation in the Wild -- The Depth360 Dataset and the SegFuse Network [35.03201732370496]
Single-view depth estimation from omnidirectional images has gained popularity with its wide range of applications such as autonomous driving and scene reconstruction.
In this work, we first establish a large-scale dataset with varied settings called Depth360 to tackle the training data problem.
We then propose an end-to-end two-branch multi-task learning network, SegFuse, that mimics the human eye to effectively learn from the dataset.
arXiv Detail & Related papers (2022-02-16T11:56:31Z)
- Depth Refinement for Improved Stereo Reconstruction [13.941756438712382]
Current techniques for depth estimation from stereoscopic images still suffer from a built-in drawback.
A simple analysis reveals that the depth error grows quadratically with the object's distance.
We propose a simple but effective method that uses a refinement network for depth estimation.
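The quadratic relation follows directly from stereo triangulation; with focal length f and baseline B, this is the standard derivation (not specific to this paper):

```latex
% Depth Z from disparity d, focal length f, baseline B.
Z = \frac{fB}{d}
\quad\Longrightarrow\quad
\left|\frac{\partial Z}{\partial d}\right| = \frac{fB}{d^{2}} = \frac{Z^{2}}{fB},
\qquad
\Delta Z \approx \frac{Z^{2}}{fB}\,\Delta d .
```

A fixed disparity error of one pixel therefore costs a depth error that grows as Z squared, which is the built-in drawback the refinement network targets.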
arXiv Detail & Related papers (2021-12-15T12:21:08Z)
- On the Sins of Image Synthesis Loss for Self-supervised Depth Estimation [60.780823530087446]
We show that improvements in image synthesis do not necessitate improvement in depth estimation.
We attribute this diverging phenomenon to aleatoric uncertainties, which originate from data.
This observed divergence has not been previously reported or studied in depth.
arXiv Detail & Related papers (2021-09-13T17:57:24Z)
- Unsupervised Monocular Depth Perception: Focusing on Moving Objects [5.489557739480878]
In this paper, we show that deliberately manipulating photometric errors handles occluded and dynamic regions more effectively.
We first propose an outlier masking technique that considers the occluded or dynamic pixels as statistical outliers in the photometric error map.
With the outlier masking, the network learns the depth of objects that move in the opposite direction to the camera more accurately.
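A minimal sketch of such an outlier mask, assuming a per-image quantile threshold on the photometric error map (the paper's exact statistic may differ):

```python
import torch

def outlier_mask(photo_error, q=0.95):
    # photo_error: [B,1,H,W] per-pixel photometric error.
    # Treat the highest-error pixels (likely occluded or moving) as outliers.
    flat = photo_error.flatten(1)                   # [B, H*W]
    thresh = torch.quantile(flat, q, dim=1)         # per-image cutoff
    return photo_error <= thresh.view(-1, 1, 1, 1)  # True = inlier

# Usage: average the photometric loss over inlier pixels only.
# mask = outlier_mask(photo_error)
# loss = (photo_error * mask).sum() / mask.sum().clamp(min=1)
```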
arXiv Detail & Related papers (2021-08-30T08:45:02Z)
- Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
arXiv Detail & Related papers (2021-07-29T16:30:33Z)
- Geometry Uncertainty Projection Network for Monocular 3D Object Detection [138.24798140338095]
We propose a Geometry Uncertainty Projection Network (GUP Net) to tackle the error amplification problem at both inference and training stages.
Specifically, a GUP module is proposed to obtain the geometry-guided uncertainty of the inferred depth.
At the training stage, we propose a Hierarchical Task Learning strategy to reduce the instability caused by error amplification.
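The error amplification the entry refers to can be made concrete with the standard projection relation between an object's 3-D height H, its projected 2-D height h, and its depth z (an illustration of the geometry, not GUP Net's exact formulation):

```latex
% Perspective projection: depth from 3-D height H and projected 2-D height h.
z = \frac{f\,H}{h}
\quad\Longrightarrow\quad
\sigma_z \approx \sqrt{\left(\frac{f}{h}\,\sigma_H\right)^{2}
                     + \left(\frac{fH}{h^{2}}\,\sigma_h\right)^{2}} .
```

Since fH/h^2 = z/h, a one-pixel error in the projected height costs z/h in depth, which grows with distance as h shrinks; this is the amplification the uncertainty model is meant to capture.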
arXiv Detail & Related papers (2021-07-29T06:59:07Z)
- Occlusion-aware Unsupervised Learning of Depth from 4-D Light Fields [50.435129905215284]
We present an unsupervised learning-based depth estimation method for 4-D light field processing and analysis.
Based on the basic knowledge of the unique geometry structure of light field data, we explore the angular coherence among subsets of the light field views to estimate depth maps.
Our method significantly shrinks the performance gap between previous unsupervised methods and supervised ones, producing depth maps with accuracy comparable to traditional methods at considerably reduced computational cost.
arXiv Detail & Related papers (2021-06-06T06:19:50Z)
- Self-Supervised Human Depth Estimation from Monocular Videos [99.39414134919117]
Previous methods for estimating detailed human depth often require supervised training with ground-truth depth data.
This paper presents a self-supervised method that can be trained on YouTube videos without known depth.
Experiments demonstrate that our method enjoys better generalization and performs much better on data in the wild.
arXiv Detail & Related papers (2020-05-07T09:45:11Z)