Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation
- URL: http://arxiv.org/abs/2406.12849v2
- Date: Wed, 30 Oct 2024 16:37:01 GMT
- Title: Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation
- Authors: Ning-Hsu Wang, Yu-Lun Liu,
- Abstract summary: We propose a new depth estimation framework that utilizes unlabeled 360-degree data effectively.
Our approach uses state-of-the-art perspective depth estimation models as teacher models to generate pseudo labels.
We tested our approach on benchmark datasets such as Matterport3D and Stanford2D3D, showing significant improvements in depth estimation accuracy.
- Score: 6.832852988957967
- License:
- Abstract: Accurately estimating depth in 360-degree imagery is crucial for virtual reality, autonomous navigation, and immersive media applications. Existing depth estimation methods designed for perspective-view imagery fail when applied to 360-degree images due to different camera projections and distortions, whereas 360-degree methods perform inferior due to the lack of labeled data pairs. We propose a new depth estimation framework that utilizes unlabeled 360-degree data effectively. Our approach uses state-of-the-art perspective depth estimation models as teacher models to generate pseudo labels through a six-face cube projection technique, enabling efficient labeling of depth in 360-degree images. This method leverages the increasing availability of large datasets. Our approach includes two main stages: offline mask generation for invalid regions and an online semi-supervised joint training regime. We tested our approach on benchmark datasets such as Matterport3D and Stanford2D3D, showing significant improvements in depth estimation accuracy, particularly in zero-shot scenarios. Our proposed training pipeline can enhance any 360 monocular depth estimator and demonstrates effective knowledge transfer across different camera projections and data types. See our project page for results: https://albert100121.github.io/Depth-Anywhere/
Related papers
- The ADUULM-360 Dataset -- A Multi-Modal Dataset for Depth Estimation in Adverse Weather [12.155627785852284]
This work presents the ADUULM-360 dataset, a novel multi-modal dataset for depth estimation.
The ADUULM-360 dataset covers all established autonomous driving sensor modalities, cameras, lidars, and radars.
It is the first depth estimation dataset that contains diverse scenes in good and adverse weather conditions.
arXiv Detail & Related papers (2024-11-18T10:42:53Z) - SDGE: Stereo Guided Depth Estimation for 360$^\circ$ Camera Sets [65.64958606221069]
Multi-camera systems are often used in autonomous driving to achieve a 360$circ$ perception.
These 360$circ$ camera sets often have limited or low-quality overlap regions, making multi-view stereo methods infeasible for the entire image.
We propose the Stereo Guided Depth Estimation (SGDE) method, which enhances depth estimation of the full image by explicitly utilizing multi-view stereo results on the overlap.
arXiv Detail & Related papers (2024-02-19T02:41:37Z) - BiFuse++: Self-supervised and Efficient Bi-projection Fusion for 360
Depth Estimation [59.11106101006008]
We propose BiFuse++ to explore the combination of bi-projection fusion and the self-training scenario.
We propose a new fusion module and Contrast-Aware Photometric Loss to improve the performance of BiFuse.
arXiv Detail & Related papers (2022-09-07T06:24:21Z) - Towards Accurate Reconstruction of 3D Scene Shape from A Single
Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploits 3D point cloud data to predict the depth shift and the camera's focal length that allow us to recover 3D scene shapes.
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z) - 360 Depth Estimation in the Wild -- The Depth360 Dataset and the SegFuse
Network [35.03201732370496]
Single-view depth estimation from omnidirectional images has gained popularity with its wide range of applications such as autonomous driving and scene reconstruction.
In this work, we first establish a large-scale dataset with varied settings called Depth360 to tackle the training data problem.
We then propose an end-to-end two-branch multi-task learning network, SegFuse, that mimics the human eye to effectively learn from the dataset.
arXiv Detail & Related papers (2022-02-16T11:56:31Z) - Dense Depth Estimation from Multiple 360-degree Images Using Virtual
Depth [4.984601297028257]
The proposed pipeline leverages a spherical camera model that compensates for radial distortion in 360degree: images.
We propose an effective dense depth estimation method by setting virtual depth and minimizing photonic reprojection error.
The experimental results verify that the proposed pipeline improves estimation accuracy compared to the current state-of-art dense depth estimation methods.
arXiv Detail & Related papers (2021-12-30T05:27:28Z) - 360MonoDepth: High-Resolution 360{\deg} Monocular Depth Estimation [15.65828728205071]
monocular depth estimation remains a challenge for 360deg data.
Current CNN-based methods do not support such high resolutions due to limited GPU memory.
We propose a flexible framework for monocular depth estimation from high-resolution 360deg images using tangent images.
arXiv Detail & Related papers (2021-11-30T18:57:29Z) - Improving 360 Monocular Depth Estimation via Non-local Dense Prediction
Transformer and Joint Supervised and Self-supervised Learning [17.985386835096353]
We propose 360 monocular depth estimation methods which improve on the areas that limited previous studies.
First, we introduce a self-supervised 360 depth learning method that only utilizes gravity-aligned videos.
Second, we propose a joint learning scheme realized by combining supervised and self-supervised learning.
Third, we propose a non-local fusion block, which retains global information encoded by vision transformer when reconstructing the depths.
arXiv Detail & Related papers (2021-09-22T07:45:48Z) - LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth
Rendering [59.63979143021241]
We formulate the task of 360 layout estimation as a problem of predicting depth on the horizon line of a panorama.
We propose the Differentiable Depth Rendering procedure to make the conversion from layout to depth prediction differentiable.
Our method achieves state-of-the-art performance on numerous 360 layout benchmark datasets.
arXiv Detail & Related papers (2021-04-01T15:48:41Z) - Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust
Depth Prediction [87.08227378010874]
We show the importance of the high-order 3D geometric constraints for depth prediction.
By designing a loss term that enforces a simple geometric constraint, we significantly improve the accuracy and robustness of monocular depth estimation.
We show state-of-the-art results of learning metric depth on NYU Depth-V2 and KITTI.
arXiv Detail & Related papers (2021-03-07T00:08:21Z) - Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, we rely, instead of different views, on depth from focus cues.
We present results that are on par with supervised methods on KITTI and Make3D datasets and outperform unsupervised learning approaches.
arXiv Detail & Related papers (2020-01-14T20:22:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.