RayMVSNet: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo
- URL: http://arxiv.org/abs/2204.01320v1
- Date: Mon, 4 Apr 2022 08:43:38 GMT
- Title: RayMVSNet: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo
- Authors: Junhua Xi, Yifei Shi, Yijie Wang, Yulan Guo, Kai Xu
- Abstract summary: RayMVSNet learns sequential prediction of a 1D implicit field along each camera ray with the zero-crossing point indicating scene depth.
Our method ranks top on both the DTU and the Tanks & Temples datasets over all previous learning-based methods.
- Score: 35.22032072756035
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning-based multi-view stereo (MVS) has thus far centered on 3D
convolution over cost volumes. Due to the high computation and memory consumption
of 3D CNNs, the resolution of the output depth is often considerably limited.
Unlike most existing works, which are dedicated to adaptive refinement of cost
volumes, we opt to directly optimize the depth value along each camera ray,
mimicking the range (depth) finding of a laser scanner. This reduces the MVS
problem to ray-based depth optimization, which is much more lightweight than
full cost-volume optimization. In particular, we propose RayMVSNet, which learns
sequential prediction of a 1D implicit field along each camera ray, with the
zero-crossing point indicating scene depth. This sequential modeling, conducted
on transformer features, essentially learns the epipolar line search of
traditional multi-view stereo. We also devise a multi-task learning scheme for
better optimization convergence and depth accuracy. Our method ranks top on both
the DTU and the Tanks & Temples datasets over all previous learning-based
methods, achieving an overall reconstruction score of 0.33mm on DTU and an
f-score of 59.48% on Tanks & Temples.
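The core idea of reading depth off a ray can be illustrated with a minimal sketch: sample the predicted 1D implicit field at a few depths along a camera ray and linearly interpolate where it changes sign. The sampling grid and field values below are hypothetical stand-ins, not the paper's actual network outputs.

```python
import numpy as np

def zero_crossing_depth(depths, field):
    """Locate the first sign change of a 1D implicit field sampled along
    a camera ray, and linearly interpolate the crossing depth.

    depths: monotonically increasing sample depths along the ray
    field:  field values at those depths (positive in front of the
            surface, negative behind it) -- hypothetical convention
    """
    for i in range(len(field) - 1):
        a, b = field[i], field[i + 1]
        if a == 0:
            return depths[i]          # sample lies exactly on the surface
        if a * b < 0:                 # sign change: surface is between samples
            t = a / (a - b)           # linear interpolation weight in [0, 1]
            return depths[i] + t * (depths[i + 1] - depths[i])
    return None                       # no crossing: ray misses the surface

# toy example: a surface at depth 2.5 along the ray
depths = np.linspace(1.0, 4.0, 7)     # samples at 1.0, 1.5, ..., 4.0
field = 2.5 - depths                  # positive before the surface, negative after
print(zero_crossing_depth(depths, field))  # → 2.5
```

This is only the read-out step; in RayMVSNet the field values themselves come from sequential (transformer-based) prediction along the ray, which is what replaces dense 3D cost-volume filtering.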
Related papers
- NPLMV-PS: Neural Point-Light Multi-View Photometric Stereo [32.39157133181186]
We present a novel multi-view photometric stereo (MVPS) method.
Our work differs from the state-of-the-art multi-view PS-NeRF or Supernormal in that we explicitly leverage per-pixel intensity renderings.
Our method is among the first (along with Supernormal) to outperform the classical MVPS approach proposed by the DiLiGenT-MV benchmark.
arXiv Detail & Related papers (2024-05-20T14:26:07Z)
- Stereo-Knowledge Distillation from dpMV to Dual Pixels for Light Field Video Reconstruction [12.519930982515802]
This work hypothesizes that distilling high-precision dark stereo knowledge, implicitly or explicitly, to efficient dual-pixel student networks enables faithful reconstructions.
We collect the first and largest 3-view dual-pixel video dataset, dpMV, to validate our explicit dark knowledge distillation hypothesis.
We show that these methods outperform purely monocular solutions, especially in challenging foreground-background separation regions using faithful guidance from dual pixels.
arXiv Detail & Related papers (2024-05-20T06:34:47Z)
- SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model [72.0795843450604]
Current approaches face challenges in maintaining consistent accuracy across diverse scenes.
These methods rely on extensive datasets comprising millions, if not tens of millions, of training samples.
This paper presents SM$4$Depth, a model that seamlessly works for both indoor and outdoor scenes.
arXiv Detail & Related papers (2024-03-13T14:08:25Z)
- ARAI-MVSNet: A multi-view stereo depth estimation network with adaptive depth range and depth interval [19.28042366225802]
Multi-View Stereo (MVS) is a fundamental problem in geometric computer vision.
We present a novel multi-stage coarse-to-fine framework to achieve adaptive all-pixel depth range and depth interval.
Our model achieves state-of-the-art performance and yields competitive generalization ability.
arXiv Detail & Related papers (2023-08-17T14:52:11Z)
- RayMVSNet++: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo [21.209964556493368]
RayMVSNet learns sequential prediction of a 1D implicit field along each camera ray with the zero-crossing point indicating scene depth.
RayMVSNet++ achieves state-of-the-art performance on the ScanNet dataset.
arXiv Detail & Related papers (2023-07-16T02:10:47Z)
- Multiview Stereo with Cascaded Epipolar RAFT [73.7619703879639]
We address multiview stereo (MVS), an important 3D vision task that reconstructs a 3D model such as a dense point cloud from multiple calibrated images.
We propose CER-MVS, a new approach based on the RAFT (Recurrent All-Pairs Field Transforms) architecture developed for optical flow. CER-MVS introduces five new changes to RAFT: epipolar cost volumes, cost volume cascading, multiview fusion of cost volumes, dynamic supervision, and multiresolution fusion of depth maps.
arXiv Detail & Related papers (2022-05-09T18:17:05Z)
- DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors [60.88824519770208]
Camera-based 3D object detectors are attractive due to their wider deployment and lower price compared with LiDAR sensors.
We revisit the prior stereo modeling DSGN about the stereo volume constructions for representing both 3D geometry and semantics.
We propose our approach, DSGN++, aiming for improving information flow throughout the 2D-to-3D pipeline.
arXiv Detail & Related papers (2022-04-06T18:43:54Z)
- 3DVNet: Multi-View Depth Prediction and Volumetric Refinement [68.68537312256144]
3DVNet is a novel multi-view stereo (MVS) depth-prediction method.
Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions.
We show that our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics.
arXiv Detail & Related papers (2021-12-01T00:52:42Z)
- Soft Expectation and Deep Maximization for Image Feature Detection [68.8204255655161]
We propose SEDM, an iterative semi-supervised learning process that flips the question and first looks for repeatable 3D points, then trains a detector to localize them in image space.
Our results show that this new model trained using SEDM is able to better localize the underlying 3D points in a scene.
arXiv Detail & Related papers (2021-04-21T00:35:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.