Distortion-Aware Self-Supervised 360° Depth Estimation from A Single Equirectangular Projection Image
- URL: http://arxiv.org/abs/2204.01027v1
- Date: Sun, 3 Apr 2022 08:28:44 GMT
- Title: Distortion-Aware Self-Supervised 360° Depth Estimation from A Single Equirectangular Projection Image
- Authors: Yuya Hasegawa, Satoshi Ikehata, Kiyoharu Aizawa
- Abstract summary: This paper proposes a new technique for depth prediction from a single 360° image in open environments.
The task is difficult for two reasons. One is the limited supervision data: the currently available datasets cover only indoor scenes.
The other is the problems caused by the equirectangular projection (ERP) format commonly used for 360° images, namely its spherical coordinate system and its distortion.
- Score: 35.943763515381214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 360° images have become widely available over the last few years.
This paper proposes a new technique for depth prediction from a single 360°
image in open environments. The task is difficult for two reasons. One is the
limited supervision data: the currently available datasets cover only indoor
scenes. The other is the problems caused by the equirectangular projection
(ERP) format commonly used for 360° images, namely its spherical coordinate
system and its distortion. Only one existing method deals with these problems;
it uses cube-map projection to produce six perspective images and applies
self-supervised learning on video for perspective depth prediction. In
contrast, we use the ERP format directly. We propose a framework that works
directly on ERP, with a coordinate conversion of correspondences and a
distortion-aware upsampling module to handle the ERP-related problems, and we
extend a self-supervised learning method to open environments. For the
experiments, we first built a dataset for evaluation and quantitatively
evaluated depth prediction in outdoor scenes. We show that our method
outperforms the state-of-the-art technique.
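The ERP-related problems named above, coordinates and distortion, both stem from how ERP pixels map onto the sphere. Below is a minimal sketch of that standard mapping and of the cos(latitude) area falloff a distortion-aware module must account for; the function names are illustrative, not the paper's code.

```python
import numpy as np

def erp_pixel_to_direction(u, v, width, height):
    """Map ERP pixel coordinates to a unit direction on the sphere.

    Standard equirectangular convention: u spans longitude [-pi, pi],
    v spans latitude [pi/2, -pi/2] from top to bottom of the image.
    """
    lon = (u / width - 0.5) * 2.0 * np.pi
    lat = (0.5 - v / height) * np.pi
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)

def erp_row_area_weight(v, height):
    """Relative solid angle covered by an ERP pixel row.

    Rows shrink as cos(latitude): near the poles a row covers far less
    of the sphere, which is the distortion a naive upsampling ignores.
    """
    lat = (0.5 - v / height) * np.pi
    return np.cos(lat)
```

This falloff is why convolutions and upsampling applied uniformly to the ERP grid over-weight polar regions; a distortion-aware upsampling module is one way to correct for it.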
Related papers
- Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation [6.832852988957967]
We propose a new depth estimation framework that utilizes unlabeled 360-degree data effectively.
Our approach uses state-of-the-art perspective depth estimation models as teacher models to generate pseudo labels (a minimal sketch of this distillation step follows this entry).
We tested our approach on benchmark datasets such as Matterport3D and Stanford2D3D, showing significant improvements in depth estimation accuracy.
arXiv Detail & Related papers (2024-06-18T17:59:31Z)
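The teacher-student scheme in the Depth Anywhere entry above can be summarized in a few lines. This is a hypothetical sketch under assumed helpers; `project_fn`, `backproject`, and the model objects are illustrative, not the paper's API.

```python
import torch

def distillation_step(student, teacher, erp_batch, project_fn, optimizer):
    """One pseudo-label distillation step (illustrative, not the paper's code).

    project_fn: assumed helper that renders perspective views from an ERP
    image and returns the views plus a function mapping per-view depth
    back onto the ERP grid.
    """
    with torch.no_grad():
        views, backproject = project_fn(erp_batch)
        pseudo_depth = backproject(teacher(views))   # pseudo labels on ERP grid

    pred = student(erp_batch)
    valid = torch.isfinite(pseudo_depth)             # mask unprojected pixels
    loss = torch.nn.functional.l1_loss(pred[valid], pseudo_depth[valid])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```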
- Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors [51.36238367193988]
We tackle sparse-view reconstruction of a 360 3D scene using priors from latent diffusion models (LDMs).
We present SparseSplat360, a method that employs a cascade of in-painting and artifact removal models to fill in missing details and clean novel views.
Our method generates entire 360 scenes from as few as 9 input views, with a high degree of foreground and background detail.
arXiv Detail & Related papers (2024-05-26T11:01:39Z) - FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion [12.058261716065381]
We propose a 360 monocular depth estimation pipeline, OmniFusion, to tackle the spherical distortion issue.
Our pipeline transforms a 360 image into less-distorted perspective patches (i.e., tangent images; see the sketch after this entry) to obtain patch-wise predictions via a CNN, and then merges the patch-wise results into the final output.
Experiments show that our method greatly mitigates the distortion issue and achieves state-of-the-art performance on several 360 monocular depth estimation benchmark datasets.
arXiv Detail & Related papers (2022-03-02T03:19:49Z)
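The tangent images in the OmniFusion entry above come from the classical gnomonic projection: the sphere is projected onto a plane touching it at a patch center, which keeps local distortion small. A minimal sketch of that standard mapping (not OmniFusion's implementation):

```python
import numpy as np

def gnomonic_project(lat, lon, lat0, lon0):
    """Classical gnomonic projection onto the plane tangent at (lat0, lon0).

    Points on the sphere map to (x, y) coordinates on the tangent plane;
    this is the mapping behind "tangent images". Angles are in radians.
    """
    cos_c = (np.sin(lat0) * np.sin(lat)
             + np.cos(lat0) * np.cos(lat) * np.cos(lon - lon0))
    x = np.cos(lat) * np.sin(lon - lon0) / cos_c
    y = (np.cos(lat0) * np.sin(lat)
         - np.sin(lat0) * np.cos(lat) * np.cos(lon - lon0)) / cos_c
    return x, y
```

Rendering a patch inverts this mapping: for each patch pixel, recover (lat, lon) and bilinearly sample the ERP image; patch-wise depth predictions are then merged back on the sphere.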
- Field-of-View IoU for Object Detection in 360° Images [36.72543749626039]
We propose two fundamental techniques -- Field-of-View IoU (FoV-IoU) and 360Augmentation -- for object detection in 360° images.
FoV-IoU computes the intersection-over-union of two field-of-view bounding boxes in a spherical image, and can be used for training, inference, and evaluation (a simplified spherical IoU is sketched after this entry).
360Augmentation is a data augmentation technique specific to the 360° object detection task; it randomly rotates a spherical image to counter the bias introduced by the sphere-to-plane projection.
arXiv Detail & Related papers (2022-02-07T14:01:59Z)
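The paper defines FoV-IoU on field-of-view bounding boxes; as a simpler stand-in that shows why spherical IoU differs from planar IoU on the ERP image, below is the exact IoU of two latitude-longitude aligned boxes measured by solid angle. This is illustrative only, not the paper's formula, and date-line wrap-around is ignored.

```python
import numpy as np

def latlon_box_area(lon1, lat1, lon2, lat2):
    """Solid angle (steradians) of a lat-long aligned box on the unit sphere."""
    return (lon2 - lon1) * (np.sin(lat2) - np.sin(lat1))

def spherical_box_iou(a, b):
    """IoU of two lat-long boxes measured on the sphere, not the ERP plane.

    Boxes are (lon_min, lat_min, lon_max, lat_max) in radians. Near the
    poles this diverges sharply from pixel-area IoU on the ERP image.
    """
    inter_lon = min(a[2], b[2]) - max(a[0], b[0])
    inter_lat_hi = min(a[3], b[3])
    inter_lat_lo = max(a[1], b[1])
    if inter_lon <= 0.0 or inter_lat_hi <= inter_lat_lo:
        return 0.0
    inter = inter_lon * (np.sin(inter_lat_hi) - np.sin(inter_lat_lo))
    union = latlon_box_area(*a) + latlon_box_area(*b) - inter
    return inter / union
```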
- 360° Optical Flow using Tangent Images [18.146747748702513]
Equirectangular projection (ERP) is the most common format for storing, processing and visualising 360° images.
We propose a 360° optical flow method based on tangent images.
arXiv Detail & Related papers (2021-12-28T23:50:46Z)
- 360-DFPE: Leveraging Monocular 360-Layouts for Direct Floor Plan Estimation [43.56963653723287]
We present 360-DFPE, a sequential floor plan estimation method that directly takes 360 images as input without relying on active sensors or 3D information.
Our results show that our monocular solution achieves favorable performance against the current state-of-the-art algorithms.
arXiv Detail & Related papers (2021-12-12T08:36:41Z)
- Coherent Reconstruction of Multiple Humans from a Single Image [68.3319089392548]
In this work, we address the problem of multi-person 3D pose estimation from a single image.
A typical regression approach in the top-down setting of this problem would first detect all humans and then reconstruct each one of them independently; this often yields incoherent results, such as interpenetrating people or inconsistent depth ordering.
Our goal is to train a single network that learns to avoid these problems and generate a coherent 3D reconstruction of all the humans in the scene.
arXiv Detail & Related papers (2020-06-15T17:51:45Z)
- Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, we rely on depth-from-defocus cues instead of different views (the underlying thin-lens relation is sketched after this entry).
We present results that are on par with supervised methods on KITTI and Make3D datasets and outperform unsupervised learning approaches.
arXiv Detail & Related papers (2020-01-14T20:22:54Z)
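Depth-from-defocus methods like the entry above exploit the thin-lens relation between distance and blur: a point away from the focal plane images to a circle of confusion whose diameter grows with the focus error. A sketch of that standard optics relation (the paper's training objective itself is not shown here):

```python
def circle_of_confusion(depth, focus_dist, focal_len, f_number):
    """Thin-lens circle-of-confusion diameter for a point at `depth`.

    Standard relation used by depth-from-defocus methods: blur grows
    with distance from the focal plane. All lengths are in metres.
    """
    aperture = focal_len / f_number                 # aperture diameter
    return (aperture
            * abs(depth - focus_dist) / depth       # relative focus error
            * focal_len / (focus_dist - focal_len)) # magnification factor
```

For example, a 50 mm f/1.8 lens focused at 2 m blurs a point at 4 m to roughly (0.05/1.8) * 0.5 * (0.05/1.95) ≈ 0.36 mm on the sensor, so blur diameter carries depth information.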
- Visual Question Answering on 360° Images [96.00046925811515]
VQA 360° is a novel task of visual question answering on 360° images.
We collect the first VQA 360° dataset, containing around 17,000 real-world image-question-answer triplets for a variety of question types.
arXiv Detail & Related papers (2020-01-10T08:18:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.