PatchMVSNet: Patch-wise Unsupervised Multi-View Stereo for
Weakly-Textured Surface Reconstruction
- URL: http://arxiv.org/abs/2203.02156v1
- Date: Fri, 4 Mar 2022 07:05:23 GMT
- Title: PatchMVSNet: Patch-wise Unsupervised Multi-View Stereo for
Weakly-Textured Surface Reconstruction
- Authors: Haonan Dong, Jian Yao
- Abstract summary: This paper proposes robust loss functions that leverage constraints underlying multi-view images to alleviate matching ambiguity.
Our strategy can be implemented with arbitrary depth estimation frameworks and can be trained with arbitrary large-scale MVS datasets.
Our method matches the performance of state-of-the-art methods on popular benchmarks such as DTU, Tanks and Temples, and ETH3D.
- Score: 2.9896482273918434
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning-based multi-view stereo (MVS) has achieved fine reconstructions on
popular datasets. However, supervised learning methods require ground truth for
training, which is hard to collect, especially for large-scale datasets.
Although unsupervised learning methods have been proposed and have achieved
gratifying results, they still fail to reconstruct intact results in
challenging scenes, such as weakly-textured surfaces, because they primarily
depend on pixel-wise photometric consistency, which is susceptible to varying
illumination. To alleviate matching ambiguity in these challenging scenes, this
paper proposes robust loss functions that leverage constraints underlying
multi-view images: 1) a patch-wise photometric consistency loss, which expands
the receptive field of the features in multi-view similarity measurement, and
2) a robust two-view geometric consistency loss, which includes a cross-view
depth consistency check with minimum occlusion. Our unsupervised strategy can
be implemented with arbitrary depth estimation frameworks and can be trained
with arbitrary large-scale MVS datasets. Experiments show that our method
decreases matching ambiguity and particularly improves the completeness of
weakly-textured reconstruction. Moreover, our method matches the performance of
state-of-the-art methods on popular benchmarks such as DTU, Tanks and Temples,
and ETH3D. The code will be released soon.
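To make the first loss concrete, below is a minimal PyTorch sketch of a patch-wise photometric consistency term. It is an illustrative reading of the abstract, not the authors' released code: the function name, the tensor shapes, and the use of zero-normalized cross-correlation (ZNCC) over unfolded local windows are all assumptions.

```python
# Hypothetical sketch of a patch-wise photometric consistency loss.
# Assumes ZNCC over local windows; not the authors' implementation.
import torch
import torch.nn.functional as F

def patch_photometric_loss(ref, warped_src, patch_size=7, eps=1e-6):
    """Patch-wise similarity between a reference image and a source
    image warped into the reference view via the predicted depth.

    Measuring correlation over patch_size x patch_size windows (rather
    than single pixels) widens the receptive field of the photometric
    term, which is the idea the abstract describes for weak textures.

    ref, warped_src: (B, C, H, W) tensors.
    """
    pad = patch_size // 2
    # Unfold into overlapping patches: (B, C*k*k, H*W).
    ref_p = F.unfold(ref, patch_size, padding=pad)
    src_p = F.unfold(warped_src, patch_size, padding=pad)

    # Zero-mean each patch, then compute normalized cross-correlation.
    ref_p = ref_p - ref_p.mean(dim=1, keepdim=True)
    src_p = src_p - src_p.mean(dim=1, keepdim=True)
    zncc = (ref_p * src_p).sum(dim=1) / (
        ref_p.norm(dim=1) * src_p.norm(dim=1) + eps)  # in [-1, 1]

    # Perfect correlation (1) gives zero loss.
    return (1.0 - zncc).mean()
```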
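The second constraint, cross-view depth consistency, can be approximated by a forward projection check between two views, as sketched below. This too is a hedged sketch under stated assumptions (shared intrinsics K, a relative pose T_ref2src, and a simple relative-depth threshold); the paper's exact "minimum occlusion" formulation may differ.

```python
# Hypothetical sketch of a two-view depth consistency check; names and
# the thresholding scheme are assumptions, not the paper's exact method.
import torch
import torch.nn.functional as F

def depth_consistency_mask(depth_ref, depth_src, K, T_ref2src,
                           rel_thresh=0.01):
    """Projects reference-view depth into the source view and compares
    it with the source depth sampled there; pixels whose depths agree
    form an occlusion-aware validity mask.

    depth_ref, depth_src: (H, W) depth maps.
    K: (3, 3) shared intrinsics. T_ref2src: (4, 4) relative pose.
    """
    H, W = depth_ref.shape
    device = depth_ref.device
    v, u = torch.meshgrid(torch.arange(H, device=device),
                          torch.arange(W, device=device), indexing="ij")
    ones = torch.ones_like(depth_ref)
    pix = torch.stack([u.float(), v.float(), ones], dim=0).reshape(3, -1)

    # Lift reference pixels to 3D and move them into the source frame.
    cam_ref = torch.linalg.inv(K) @ pix * depth_ref.reshape(1, -1)
    cam_src = (T_ref2src @ torch.cat(
        [cam_ref, ones.reshape(1, -1)], dim=0))[:3]

    # Project into the source image and bilinearly sample its depth.
    proj = K @ cam_src
    uv = proj[:2] / proj[2:].clamp(min=1e-6)
    grid = torch.stack([2 * uv[0] / (W - 1) - 1,
                        2 * uv[1] / (H - 1) - 1],
                       dim=-1).reshape(1, H, W, 2)
    sampled = F.grid_sample(depth_src[None, None], grid,
                            align_corners=True)[0, 0]

    # Agreeing depths pass; large relative error marks occlusion/error.
    rel_err = (proj[2].reshape(H, W) - sampled).abs() / sampled.clamp(min=1e-6)
    return rel_err < rel_thresh
```

In a training loop of this kind, the returned mask would typically gate the photometric term above so that occluded or inconsistent pixels do not contribute to the loss.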
Related papers
- SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical
Refinement and EM optimization [6.886220026399106]
We introduce Segmentation-Driven Deformation Multi-View Stereo (SD-MVS) to tackle challenges in the 3D reconstruction of textureless areas.
We are the first to adopt the Segment Anything Model (SAM) to distinguish semantic instances in scenes.
We propose a unique refinement strategy that combines spherical coordinates with gradient descent on normals and a pixel-wise search interval on depths.
arXiv Detail & Related papers (2024-01-12T05:25:57Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
- TSAR-MVS: Textureless-aware Segmentation and Correlative Refinement Guided Multi-View Stereo [3.6728185343140685]
We propose a Textureless-aware And Correlative Refinement guided Multi-View Stereo (TSAR-MVS) method.
It effectively tackles challenges posed by textureless areas in 3D reconstruction through filtering, refinement and segmentation.
Experiments on ETH3D, Tanks & Temples and Strecha datasets demonstrate the superior performance and strong capability of our proposed method.
arXiv Detail & Related papers (2023-08-19T11:40:57Z)
- Learning from Multi-Perception Features for Real-Word Image Super-resolution [87.71135803794519]
We propose a novel SR method called MPF-Net that leverages multiple perceptual features of input images.
Our method incorporates a Multi-Perception Feature Extraction (MPFE) module to extract diverse perceptual information.
We also introduce a contrastive regularization term (CR) that improves the model's learning capability.
arXiv Detail & Related papers (2023-05-26T07:35:49Z)
- Rank-Enhanced Low-Dimensional Convolution Set for Hyperspectral Image Denoising [50.039949798156826]
This paper tackles the challenging problem of hyperspectral (HS) image denoising.
We propose a rank-enhanced low-dimensional convolution set (Re-ConvSet).
We then incorporate Re-ConvSet into the widely-used U-Net architecture to construct an HS image denoising method.
arXiv Detail & Related papers (2022-07-09T13:35:12Z)
- Towards Unpaired Depth Enhancement and Super-Resolution in the Wild [121.96527719530305]
State-of-the-art data-driven methods of depth map super-resolution rely on registered pairs of low- and high-resolution depth maps of the same scenes.
We consider an approach to depth map enhancement based on learning from unpaired data.
arXiv Detail & Related papers (2021-05-25T16:19:16Z)
- 3D Human Pose, Shape and Texture from Low-Resolution Images and Videos [107.36352212367179]
We propose RSC-Net, which consists of a Resolution-aware network, a Self-supervision loss, and a Contrastive learning scheme.
The proposed method is able to learn 3D body pose and shape across different resolutions with a single model.
We extend the RSC-Net to handle low-resolution videos and apply it to reconstruct textured 3D pedestrians from low-resolution input.
arXiv Detail & Related papers (2021-03-11T06:52:12Z)
- Recurrent Multi-view Alignment Network for Unsupervised Surface Registration [79.72086524370819]
Learning non-rigid registration in an end-to-end manner is challenging due to the inherent high degrees of freedom and the lack of labeled training data.
We propose to represent the non-rigid transformation with a point-wise combination of several rigid transformations.
We also introduce a differentiable loss function that measures the 3D shape similarity on the projected multi-view 2D depth images.
arXiv Detail & Related papers (2020-11-24T14:22:42Z)
- Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency [40.56510679634943]
We propose a self-supervised training architecture by leveraging the multi-view geometry consistency.
We design three novel loss functions for multi-view consistency, including the pixel consistency loss, the depth consistency loss, and the facial landmark-based epipolar loss.
Our method is accurate and robust, especially under large variations of expressions, poses, and illumination conditions.
arXiv Detail & Related papers (2020-07-24T12:36:09Z)
- Towards Better Generalization: Joint Depth-Pose Learning without PoseNet [36.414471128890284]
We tackle the essential problem of scale inconsistency for self-supervised joint depth-pose learning.
Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples.
We propose a novel system that explicitly disentangles scale from the network estimation.
arXiv Detail & Related papers (2020-04-03T00:28:09Z)