Unsupervised OmniMVS: Efficient Omnidirectional Depth Inference via
Establishing Pseudo-Stereo Supervision
- URL: http://arxiv.org/abs/2302.09922v2
- Date: Wed, 22 Feb 2023 08:51:08 GMT
- Title: Unsupervised OmniMVS: Efficient Omnidirectional Depth Inference via
Establishing Pseudo-Stereo Supervision
- Authors: Zisong Chen, Chunyu Lin, Lang Nie, Kang Liao, Yao Zhao
- Abstract summary: We propose the first unsupervised omnidirectional MVS framework based on multiple fisheye images.
The two 360° images form a stereo pair with a special pose, and photometric consistency is leveraged to establish the unsupervised constraint.
Experiments show that the performance of our unsupervised solution is competitive with that of the state-of-the-art supervised methods.
- Score: 40.58193195996798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Omnidirectional multi-view stereo (MVS) vision is attractive for its
ultra-wide field-of-view (FoV), enabling machines to perceive their 360° 3D
surroundings. However, existing solutions require expensive dense depth labels
for supervision, making them impractical in real-world applications. In this
paper, we propose the first unsupervised omnidirectional MVS framework based on
multiple fisheye images. To this end, we project all images to a virtual view
center and composite two panoramic images with spherical geometry from two
pairs of back-to-back fisheye images. The two 360° images form a stereo pair
with a special pose, and photometric consistency is leveraged to establish the
unsupervised constraint, which we term "Pseudo-Stereo Supervision". In
addition, we propose Un-OmniMVS, an efficient unsupervised omnidirectional MVS
network, to speed up inference with two efficient components. First, a novel
feature extractor with frequency attention is proposed to simultaneously
capture non-local Fourier features and local spatial features, explicitly
strengthening the feature representation. Then, a variance-based light cost
volume is put forward to reduce the computational complexity. Experiments show
that the performance of our unsupervised solution is competitive with that of
state-of-the-art (SoTA) supervised methods, with better generalization on
real-world data.
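To make the "Pseudo-Stereo Supervision" idea concrete, the following is a minimal sketch, not the authors' code, of a photometric consistency term between the two composited panoramas and of a variance-based cost volume over warped per-view features. The equirectangular layout, the purely translational baseline between the two virtual panorama centers, and all function names are illustrative assumptions.

```python
# Minimal sketch of pseudo-stereo photometric supervision, assuming
# equirectangular panoramas and a purely translational baseline between the
# two virtual panorama centers (hypothetical simplifications, not the paper's code).
import math
import torch
import torch.nn.functional as F


def spherical_rays(h, w, device="cpu"):
    """Unit ray direction for every equirectangular pixel."""
    v, u = torch.meshgrid(torch.arange(h, device=device),
                          torch.arange(w, device=device), indexing="ij")
    lon = (u + 0.5) / w * 2 * math.pi - math.pi        # longitude in [-pi, pi)
    lat = math.pi / 2 - (v + 0.5) / h * math.pi        # latitude in [+pi/2, -pi/2]
    x = torch.cos(lat) * torch.sin(lon)
    y = torch.sin(lat)
    z = torch.cos(lat) * torch.cos(lon)
    return torch.stack([x, y, z], dim=0)               # (3, H, W)


def pseudo_stereo_loss(pano_ref, pano_src, depth_ref, baseline):
    """Warp pano_src into the reference panorama via predicted depth and
    compare photometrically (simple L1 term; the paper may use a richer loss).

    pano_ref, pano_src: (1, 3, H, W); depth_ref: (1, 1, H, W);
    baseline: (3,) translation from the reference to the source center.
    """
    _, _, h, w = pano_ref.shape
    rays = spherical_rays(h, w, pano_ref.device)        # (3, H, W)
    pts = depth_ref[0] * rays                           # 3D points in the ref frame
    pts_src = pts - baseline.view(3, 1, 1)              # same points in the src frame
    r = pts_src.norm(dim=0).clamp(min=1e-6)
    lon = torch.atan2(pts_src[0], pts_src[2])
    lat = torch.asin(pts_src[1] / r)
    grid = torch.stack([lon / math.pi, -lat / (math.pi / 2)], dim=-1)
    warped = F.grid_sample(pano_src, grid.unsqueeze(0),
                           align_corners=False, padding_mode="border")
    return (warped - pano_ref).abs().mean()


def variance_cost_volume(warped_feats):
    """Variance across per-view warped feature volumes: a light alternative to
    concatenating all views. warped_feats: list of (B, C, D, H, W) tensors."""
    stacked = torch.stack(warped_feats, dim=0)          # (V, B, C, D, H, W)
    return stacked.var(dim=0, unbiased=False)           # (B, C, D, H, W)
```

Taking the variance over views keeps the cost volume at the size of a single view's feature volume regardless of how many fisheye inputs are used, which is one plausible reading of the "light" in "variance-based light cost volume".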
Related papers
- FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes [50.534213038479926]
FreeSplat is capable of reconstructing geometrically consistent 3D scenes from long-sequence input for free-view synthesis.
We propose a simple but effective free-view training strategy that ensures robust view synthesis across a broader view range regardless of the number of views.
arXiv Detail & Related papers (2024-05-28T08:40:14Z)
- Self-Supervised Learning for Real-World Super-Resolution from Dual and Multiple Zoomed Observations [61.448005005426666]
We consider two challenging issues in reference-based super-resolution (RefSR) for smartphones.
We propose a novel self-supervised learning approach for real-world RefSR from observations at dual and multiple camera zooms.
arXiv Detail & Related papers (2024-05-03T15:20:30Z)
- MSI-NeRF: Linking Omni-Depth with View Synthesis through Multi-Sphere Image aided Generalizable Neural Radiance Field [1.3162012586770577]
We introduce MSI-NeRF, which combines deep learning omnidirectional depth estimation and novel view synthesis.
We construct a multi-sphere image as a cost volume through feature extraction and warping of the input images.
Our network has the generalization ability to reconstruct unknown scenes efficiently using only four images.
arXiv Detail & Related papers (2024-03-16T07:26:50Z)
- Multi-Plane Neural Radiance Fields for Novel View Synthesis [5.478764356647437]
Novel view synthesis is a long-standing problem that revolves around rendering frames of scenes from novel camera viewpoints.
In this work, we examine the performance, generalization, and efficiency of single-view multi-plane neural radiance fields.
We propose a new multiplane NeRF architecture that accepts multiple views to improve the synthesis results and expand the viewing range.
arXiv Detail & Related papers (2023-03-03T06:32:55Z)
- Multi-Projection Fusion and Refinement Network for Salient Object Detection in 360° Omnidirectional Image [141.10227079090419]
We propose a Multi-Projection Fusion and Refinement Network (MPFR-Net) to detect salient objects in 360° omnidirectional images.
MPFR-Net uses the equirectangular projection image and four corresponding cube-unfolding images as inputs (a cube-unfolding sketch follows this list).
Experimental results on two omnidirectional datasets demonstrate that the proposed approach outperforms the state-of-the-art methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-12-23T14:50:40Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- PanoDepth: A Two-Stage Approach for Monocular Omnidirectional Depth Estimation [11.66493799838823]
We propose a novel, model-agnostic, two-stage pipeline for omnidirectional monocular depth estimation.
Our framework PanoDepth takes one 360° image as input, produces one or more synthesized views in the first stage, and feeds the original image and the synthesized images into the subsequent stereo matching stage.
Our results show that PanoDepth outperforms the state-of-the-art approaches by a large margin for 360° monocular depth estimation.
arXiv Detail & Related papers (2022-02-02T23:08:06Z)
- Moving in a 360 World: Synthesizing Panoramic Parallaxes from a Single Panorama [13.60790015417166]
We present Omnidirectional Neural Radiance Fields (OmniNeRF), the first method for parallax-enabled novel panoramic view synthesis.
We propose to augment the single RGB-D panorama by projecting back and forth between a 3D world and different 2D panoramic coordinates at different virtual camera positions.
As a result, the proposed OmniNeRF achieves convincing renderings of novel panoramic views that exhibit the parallax effect.
arXiv Detail & Related papers (2021-06-21T05:08:34Z)
- OmniSLAM: Omnidirectional Localization and Dense Mapping for Wide-baseline Multi-camera Systems [88.41004332322788]
We present an omnidirectional localization and dense mapping system for a wide-baseline multiview stereo setup with ultra-wide field-of-view (FOV) fisheye cameras.
For more practical and accurate reconstruction, we first introduce improved and lightweight deep neural networks for omnidirectional depth estimation.
We integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency.
arXiv Detail & Related papers (2020-03-18T05:52:10Z)
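As referenced in the MPFR-Net entry above, the sketch below illustrates one common way to extract the four lateral cube faces ("cube-unfolding") from an equirectangular panorama. The axis conventions, face naming, and function names are assumptions for illustration, not code from any of the listed papers.

```python
# Hypothetical cube-unfolding sketch: sample four lateral cube faces from an
# equirectangular panorama (y-up axis convention assumed).
import math
import torch
import torch.nn.functional as F


def cube_faces(pano, face_size=256):
    """pano: (1, 3, H, W) equirectangular image -> dict of four side faces."""
    device = pano.device
    s = torch.linspace(-1.0, 1.0, face_size, device=device)
    vv, uu = torch.meshgrid(s, s, indexing="ij")
    ones = torch.ones_like(uu)
    # Outward direction grids for the front/right/back/left faces.
    dirs = {
        "front": torch.stack([uu, -vv, ones], dim=0),
        "right": torch.stack([ones, -vv, -uu], dim=0),
        "back": torch.stack([-uu, -vv, -ones], dim=0),
        "left": torch.stack([-ones, -vv, uu], dim=0),
    }
    faces = {}
    for name, d in dirs.items():
        d = d / d.norm(dim=0, keepdim=True)
        lon = torch.atan2(d[0], d[2])            # longitude in [-pi, pi]
        lat = torch.asin(d[1])                   # latitude in [-pi/2, pi/2]
        grid = torch.stack([lon / math.pi, -lat / (math.pi / 2)], dim=-1)
        faces[name] = F.grid_sample(pano, grid.unsqueeze(0),
                                    align_corners=False, padding_mode="border")
    return faces
```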