Rethinking Disparity: A Depth Range Free Multi-View Stereo Based on Disparity
- URL: http://arxiv.org/abs/2211.16905v1
- Date: Wed, 30 Nov 2022 11:05:02 GMT
- Title: Rethinking Disparity: A Depth Range Free Multi-View Stereo Based on Disparity
- Authors: Qingsong Yan, Qiang Wang, Kaiyong Zhao, Bo Li, Xiaowen Chu, Fei Deng
- Abstract summary: Existing learning-based multi-view stereo (MVS) methods rely on the depth range to build the 3D cost volume.
We propose a disparity-based MVS method based on the epipolar disparity flow (E-flow), called DispMVS.
We show that DispMVS is not sensitive to the depth range and achieves state-of-the-art results with lower GPU memory.
- Score: 17.98608948955211
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Existing learning-based multi-view stereo (MVS) methods rely on the depth
range to build the 3D cost volume and may fail when the range is too large or
unreliable. To address this problem, we propose a disparity-based MVS method
based on the epipolar disparity flow (E-flow), called DispMVS, which infers the
depth information from the pixel movement between two views. The core of
DispMVS is to construct a 2D cost volume on the image plane along the epipolar
line between each pair (between the reference image and several source images)
for pixel matching and fuse uncountable depths triangulated from each pair by
multi-view geometry to ensure multi-view consistency. To be robust, DispMVS
starts from a randomly initialized depth map and iteratively refines the depth
map with the help of the coarse-to-fine strategy. Experiments on DTUMVS and
Tanks and Temples datasets show that DispMVS is not sensitive to the depth range
and achieves state-of-the-art results with lower GPU memory.
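The geometric core here is ordinary two-view triangulation applied per reference-source pair and then fused for multi-view consistency. Below is a minimal numpy sketch of that step, assuming 3x4 projection matrices P = K[R|t]; the function names are illustrative, and the simple averaging stands in for DispMVS's learned, iterative fusion.

```python
import numpy as np

def triangulate(P_ref, P_src, x_ref, x_src):
    """Linear (DLT) triangulation of one pixel match from two 3x4 projections."""
    A = np.stack([
        x_ref[0] * P_ref[2] - P_ref[0],
        x_ref[1] * P_ref[2] - P_ref[1],
        x_src[0] * P_src[2] - P_src[0],
        x_src[1] * P_src[2] - P_src[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]                     # homogeneous -> Euclidean

def fused_depth(P_ref, src_Ps, x_ref, x_srcs):
    """Triangulate against each source view and average the reference-frame depths."""
    depths = []
    for P_src, x_src in zip(src_Ps, x_srcs):
        X = triangulate(P_ref, P_src, x_ref, x_src)
        z = (P_ref @ np.append(X, 1.0))[2]  # depth along the reference optical axis
        depths.append(z)
    return float(np.mean(depths))
```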
Related papers
- A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior.
We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information.
We explicitly estimate the quality of the current pixel corresponding to sampled points on the epipolar line of the source image.
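As a rough illustration of scoring sampled points on an epipolar line: the helpers below are hypothetical, and plain cosine similarity stands in for the paper's learned quality estimate.

```python
import numpy as np

def epipolar_samples(F, x_ref, x_span, num=32):
    """Sample `num` candidate points along the epipolar line l = F @ x_ref
    in the source image; x_span = (x_min, x_max) bounds the search."""
    a, b, c = F @ np.array([x_ref[0], x_ref[1], 1.0])
    xs = np.linspace(x_span[0], x_span[1], num)
    ys = -(a * xs + c) / b                 # assumes the line is not vertical
    return np.stack([xs, ys], axis=1)

def matching_scores(feat_ref, feat_samples):
    """Cosine similarity between one reference feature (C,) and sampled
    source features (num, C); a crude stand-in for a learned quality score."""
    f = feat_ref / np.linalg.norm(feat_ref)
    g = feat_samples / np.linalg.norm(feat_samples, axis=1, keepdims=True)
    return g @ f
```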
arXiv Detail & Related papers (2024-11-04T08:50:16Z)
- Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation.
Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model.
Our model enables better pixel alignment across multi-view images.
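A minimal PyTorch sketch of attention across the view axis, the kind of layer the summary describes; `CrossViewAttention` is an illustrative module, not the paper's architecture, and `channels` must be divisible by `heads`.

```python
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """Self-attention over the view axis of a (B, V, C, H, W) feature tensor."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, V, C, H, W = x.shape
        # one token per view, batched over every spatial location
        tokens = x.permute(0, 3, 4, 1, 2).reshape(B * H * W, V, C)
        tokens, _ = self.attn(tokens, tokens, tokens)
        return tokens.reshape(B, H, W, V, C).permute(0, 3, 4, 1, 2)
```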
arXiv Detail & Related papers (2024-08-26T04:56:41Z)
- SDL-MVS: View Space and Depth Deformable Learning Paradigm for Multi-View Stereo Reconstruction in Remote Sensing [12.506628755166814]
We re-examine the deformable learning method in the Multi-View Stereo task and propose a novel paradigm based on view Space and Depth deformable Learning (SDL-MVS).
Our SDL-MVS aims to learn deformable interactions of features in different view spaces and to deformably model depth ranges and intervals, enabling highly accurate depth estimation.
Experiments on LuoJia-MVS and WHU datasets show that our SDL-MVS reaches state-of-the-art performance.
arXiv Detail & Related papers (2024-05-27T12:59:46Z)
- DELS-MVS: Deep Epipolar Line Search for Multi-View Stereo [0.0]
We propose a novel approach for deep learning-based Multi-View Stereo (MVS).
For each pixel in the reference image, our method leverages a deep architecture to search for the corresponding point in the source image directly along the corresponding epipolar line.
We test DELS-MVS on the ETH3D, Tanks and Temples and DTU benchmarks and achieve competitive results with respect to state-of-the-art approaches.
arXiv Detail & Related papers (2022-12-13T15:00:12Z)
- IB-MVS: An Iterative Algorithm for Deep Multi-View Stereo based on Binary Decisions [0.0]
We present a novel deep-learning-based method for Multi-View Stereo.
Our method estimates high resolution and highly precise depth maps iteratively, by traversing the continuous space of feasible depth values at each pixel in a binary decision fashion.
We compare our method with state-of-the-art Multi-View Stereo methods on the DTU, Tanks and Temples and the challenging ETH3D benchmarks and show competitive results.
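The binary-decision traversal can be pictured as a per-pixel bisection of the depth interval; this numpy sketch replaces IB-MVS's learned decision network with a hypothetical `decide` callback.

```python
import numpy as np

def bisect_depth(d_min, d_max, decide, steps=16):
    """Per-pixel bisection of a depth interval driven by binary decisions.

    decide(mid) returns a boolean array: True where the depth is judged to
    lie above mid (in IB-MVS this decision comes from a network)."""
    lo = np.asarray(d_min, float).copy()
    hi = np.asarray(d_max, float).copy()
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        go_up = decide(mid)
        lo = np.where(go_up, mid, lo)
        hi = np.where(go_up, hi, mid)
    return 0.5 * (lo + hi)
```

With a perfect oracle, e.g. `decide = lambda mid: gt > mid`, sixteen steps resolve the interval to (d_max - d_min) / 2^16.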
arXiv Detail & Related papers (2021-11-29T10:04:24Z)
- Neural Radiance Fields Approach to Deep Multi-View Photometric Stereo [103.08512487830669]
We present a modern solution to the multi-view photometric stereo problem (MVPS).
We procure the surface orientation using a photometric stereo (PS) image formation model and blend it with a multi-view neural radiance field representation to recover the object's surface geometry.
Our method performs neural rendering of multi-view images while utilizing surface normals estimated by a deep photometric stereo network.
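For context, the classic Lambertian photometric stereo model the summary alludes to recovers albedo-scaled normals by least squares, I = L(albedo * n); a minimal numpy version follows (illustrative, not the paper's deep PS network).

```python
import numpy as np

def ps_normals(L, I):
    """Lambertian photometric stereo: I = L @ (albedo * n), solved per pixel.

    L: (K, 3) light directions; I: (K, P) intensities for P pixels.
    Returns unit normals (P, 3) and albedo (P,)."""
    G, *_ = np.linalg.lstsq(L, I, rcond=None)      # (3, P) albedo-scaled normals
    albedo = np.linalg.norm(G, axis=0)
    normals = (G / np.clip(albedo, 1e-8, None)).T
    return normals, albedo
```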
arXiv Detail & Related papers (2021-10-11T20:20:03Z)
- VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two-stage framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) local computation of depth maps with a deep MVS technique, and 2) fusion of the depth maps and image features to build a single TSDF volume.
In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
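Step 2 is standard TSDF integration; below is a compact numpy sketch of vanilla depth-map-to-TSDF fusion (no image features, no PosedConv), with all names illustrative.

```python
import numpy as np

def fuse_tsdf(depth_maps, Ks, poses, vol_origin, voxel_size, dims, trunc):
    """Integrate depth maps into a truncated signed distance volume.

    poses are world-to-camera 4x4 matrices; dims is the (X, Y, Z) grid size."""
    tsdf = np.ones(dims, np.float32)
    weight = np.zeros(dims, np.float32)
    ii, jj, kk = np.meshgrid(*[np.arange(d) for d in dims], indexing="ij")
    pts = vol_origin + voxel_size * np.stack([ii, jj, kk], -1).reshape(-1, 3)
    for depth, K, T in zip(depth_maps, Ks, poses):
        cam = (T[:3, :3] @ pts.T + T[:3, 3:]).T        # voxel centers, camera frame
        z = cam[:, 2]
        valid = z > 1e-6
        u = np.zeros(len(z), int)
        v = np.zeros(len(z), int)
        u[valid] = np.round(cam[valid, 0] / z[valid] * K[0, 0] + K[0, 2]).astype(int)
        v[valid] = np.round(cam[valid, 1] / z[valid] * K[1, 1] + K[1, 2]).astype(int)
        h, w = depth.shape
        ok = valid & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        d = np.zeros_like(z)
        d[ok] = depth[v[ok], u[ok]]
        sdf = np.clip((d - z) / trunc, -1.0, 1.0)       # truncated signed distance
        upd = (ok & (d > 0) & (sdf > -1)).reshape(dims) # skip far-behind-surface voxels
        sdf = sdf.reshape(dims)
        tsdf[upd] = (tsdf[upd] * weight[upd] + sdf[upd]) / (weight[upd] + 1.0)
        weight[upd] += 1.0
    return tsdf
```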
arXiv Detail & Related papers (2021-08-19T11:33:58Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
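Plane sweep stereo warps source views into the reference view via the homography induced by a fronto-parallel plane at each candidate depth; here is a small numpy helper for that homography (a generic sketch, not the paper's code).

```python
import numpy as np

def plane_sweep_homography(K_ref, K_src, R, t, depth):
    """Homography mapping reference pixels to the source view via the
    fronto-parallel plane n^T X = depth (n = z-axis) in the reference frame.
    R, t: pose of the source camera relative to the reference."""
    n = np.array([0.0, 0.0, 1.0])
    return K_src @ (R + np.outer(t, n) / depth) @ np.linalg.inv(K_ref)
```

Sweeping a stack of depths yields one warped source image per hypothesis; per-pixel matching costs across the stack form the plane sweep cost volume.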
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- Deep Two-View Structure-from-Motion Revisited [83.93809929963969]
Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM.
We propose to revisit the problem of deep two-view SfM by leveraging the well-posedness of the classic pipeline.
Our method consists of 1) an optical flow estimation network that predicts dense correspondences between two frames; 2) a normalized pose estimation module that computes relative camera poses from the 2D optical flow correspondences; and 3) a scale-invariant depth estimation network that leverages epipolar geometry to reduce the search space, refine the dense correspondences, and estimate relative depth maps.
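The classic well-posed pipeline being revisited maps directly onto standard OpenCV calls; this sketch substitutes precomputed correspondences (e.g. from a flow network) for the learned modules and recovers pose plus up-to-scale depth.

```python
import cv2
import numpy as np

def two_view_sfm(pts_ref, pts_src, K):
    """Relative pose and up-to-scale depths from Nx2 float pixel correspondences."""
    E, inliers = cv2.findEssentialMat(pts_ref, pts_src, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts_ref, pts_src, K, mask=inliers)
    P_ref = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P_src = K @ np.hstack([R, t])          # |t| = 1, so depth is relative
    X = cv2.triangulatePoints(P_ref, P_src, pts_ref.T, pts_src.T)
    X = (X[:3] / X[3]).T
    return R, t, X[:, 2]
```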
arXiv Detail & Related papers (2021-04-01T15:31:20Z)
- Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction [12.728154351588053]
We present an efficient multi-view stereo (MVS) network for 3D reconstruction from multi-view images.
We introduce a coarse-to-fine depth inference strategy to achieve high resolution depth.
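A coarse-to-fine depth inference loop in schematic form: upsample, refine within a per-pixel hypothesis range, then narrow the range. The `refine` callback is a stand-in for one cost-volume regression step; everything here is illustrative.

```python
import numpy as np

def coarse_to_fine(depth_coarse, hyp_range, refine, levels=3):
    """Upsample a coarse depth map and re-estimate it within a shrinking
    hypothesis range at each pyramid level."""
    depth = depth_coarse
    for _ in range(levels):
        depth = depth.repeat(2, axis=0).repeat(2, axis=1)  # nearest 2x upsample
        depth = refine(depth, hyp_range)   # one cost-volume + regression step
        hyp_range *= 0.5                   # finer level -> narrower hypotheses
    return depth
```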
arXiv Detail & Related papers (2020-11-25T13:34:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.