Curvature-guided dynamic scale networks for Multi-view Stereo
- URL: http://arxiv.org/abs/2112.05999v1
- Date: Sat, 11 Dec 2021 14:41:05 GMT
- Title: Curvature-guided dynamic scale networks for Multi-view Stereo
- Authors: Khang Truong Giang, Soohwan Song, and Sungho Jo
- Abstract summary: This paper focuses on learning a robust feature extraction network to enhance the performance of matching costs without heavy computation.
We present a dynamic scale feature extraction network, namely, CDSFNet.
It is composed of multiple novel convolution layers, each of which can select a proper patch scale for each pixel guided by the normal curvature of the image surface.
- Score: 10.667165962654996
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-view stereo (MVS) is a crucial task for precise 3D reconstruction. Most
recent studies tried to improve the performance of matching cost volume in MVS
by designing aggregated 3D cost volumes and their regularization. This paper
focuses on learning a robust feature extraction network to enhance the
performance of matching costs without heavy computation in the other steps. In
particular, we present a dynamic scale feature extraction network, namely,
CDSFNet. It is composed of multiple novel convolution layers, each of which can
select a proper patch scale for each pixel guided by the normal curvature of
the image surface. As a result, CDSFNet can estimate the optimal patch scales
to learn discriminative features for accurate matching computation between
reference and source images. By combining the robust extracted features with an
appropriate cost formulation strategy, our resulting MVS architecture can
estimate depth maps more precisely. Extensive experiments showed that the
proposed method outperforms other state-of-the-art methods on complex outdoor
scenes. It significantly improves the completeness of reconstructed models.
Moreover, the method processes higher-resolution inputs with faster run-time
and lower memory than other MVS methods. Our source code is available at
https://github.com/TruongKhang/cds-mvsnet.
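The core idea of the abstract, selecting a per-pixel patch scale from the normal curvature of the image surface, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the curvature formula is the standard mean curvature of the intensity surface z = I(x, y), and the rule mapping curvature to scale (high curvature gets a small patch, flat regions a large one) is a hypothetical stand-in for the learned selection in CDSFNet.

```python
import numpy as np

def image_mean_curvature(img):
    """Mean curvature of the intensity surface z = I(x, y),
    from first- and second-order finite differences."""
    Iy, Ix = np.gradient(img)          # d/dy, d/dx
    Ixy, Ixx = np.gradient(Ix)         # d(Ix)/dy, d(Ix)/dx
    Iyy, _ = np.gradient(Iy)           # d(Iy)/dy
    num = (1 + Iy**2) * Ixx - 2 * Ix * Iy * Ixy + (1 + Ix**2) * Iyy
    den = 2 * (1 + Ix**2 + Iy**2) ** 1.5
    return num / den

def select_patch_scales(img, scales=(1, 3, 5)):
    """Assign one patch scale per pixel (hypothetical rule):
    high-curvature pixels (fine detail) get the smallest patch,
    flat regions the largest."""
    c = np.abs(image_mean_curvature(img))
    c = c / (c.max() + 1e-8)                        # normalize to [0, 1]
    idx = np.round((1.0 - c) * (len(scales) - 1)).astype(int)
    return np.asarray(scales)[idx]                  # (H, W) scale map
```

In the actual network this per-pixel decision would drive which receptive field the convolution layer uses when building matching features.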
Related papers
- SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM optimization [6.886220026399106]
We introduce Segmentation-Driven Deformation Multi-View Stereo (SD-MVS) to tackle challenges in the 3D reconstruction of textureless areas.
We are the first to adopt the Segment Anything Model (SAM) to distinguish semantic instances in scenes.
We propose a unique refinement strategy that combines spherical coordinates and gradient descent on normals and pixelwise search interval on depths.
arXiv Detail & Related papers (2024-01-12T05:25:57Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
- Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
- Multi-View Photometric Stereo Revisited [100.97116470055273]
Multi-view photometric stereo (MVPS) is a preferred method for detailed and precise 3D acquisition of an object from images.
We present a simple, practical approach to MVPS, which works well for isotropic as well as other object material types such as anisotropic and glossy.
The proposed approach shows state-of-the-art results when tested extensively on several benchmark datasets.
arXiv Detail & Related papers (2022-10-14T09:46:15Z)
- Detailed Facial Geometry Recovery from Multi-view Images by Learning an Implicit Function [12.522283941978722]
We propose a novel architecture to recover extremely detailed 3D faces in roughly 10 seconds.
By fitting a 3D morphable model from multi-view images, the features of multiple images are extracted and aggregated in the mesh-attached UV space.
Our method outperforms SOTA learning-based MVS in accuracy by a large margin on the FaceScape dataset.
arXiv Detail & Related papers (2022-01-04T07:24:58Z)
- Multi-View Stereo Network with attention thin volume [0.0]
We propose an efficient multi-view stereo (MVS) network for inferring depth values from multiple RGB images.
We introduce the self-attention mechanism to fully aggregate the dominant information from input images.
We also introduce the group-wise correlation to feature aggregation, which greatly reduces the memory and calculation burden.
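Group-wise correlation, mentioned above as the memory-saving aggregation step, replaces a full feature concatenation with per-group inner products. A minimal NumPy sketch (assuming reference features and source features already warped into the reference view; shapes and the mean-reduction are conventional choices, not this paper's exact design):

```python
import numpy as np

def groupwise_correlation(ref, src, num_groups):
    """Group-wise correlation between reference features and warped
    source features.  ref, src: (C, H, W).  Returns a (G, H, W) volume,
    shrinking the channel dimension C -> G and thus the memory and
    compute of the subsequent cost regularization."""
    C, H, W = ref.shape
    assert C % num_groups == 0, "channels must split evenly into groups"
    g = C // num_groups
    r = ref.reshape(num_groups, g, H, W)
    s = src.reshape(num_groups, g, H, W)
    return (r * s).mean(axis=1)  # inner product within each channel group
```

Stacking this over the sampled depth hypotheses yields the "thin" cost volume the summary refers to.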
arXiv Detail & Related papers (2021-10-16T11:51:23Z)
- Correlate-and-Excite: Real-Time Stereo Matching via Guided Cost Volume Excitation [65.83008812026635]
We construct Guided Cost volume Excitation (GCE) and show that simple channel excitation of cost volume guided by image can improve performance considerably.
We present an end-to-end network that we call Correlate-and-Excite (CoEx).
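The "channel excitation of cost volume guided by image" idea can be sketched as a per-pixel sigmoid gate on the cost volume's channels, derived from image features. This is a hand-rolled NumPy illustration, not CoEx's actual layer; the linear projection `W` stands in for whatever learned mapping produces the excitation weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def guided_cost_excitation(cost, guide_feat, W):
    """Guided cost volume excitation (sketch).
    cost:       (C, D, H, W_) matching cost volume
    guide_feat: (F, H, W_)    image features at the same resolution
    W:          (C, F)        hypothetical learned projection
    Per-pixel channel weights in [0, 1] gate every depth slice."""
    w = sigmoid(np.einsum('cf,fhw->chw', W, guide_feat))  # (C, H, W_)
    return cost * w[:, None, :, :]  # broadcast over the depth axis D
```

The appeal noted in the summary is that this gating is cheap (one projection plus a multiply) yet injects image context into the cost volume before regularization.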
arXiv Detail & Related papers (2021-08-12T14:32:26Z)
- AA-RMVSNet: Adaptive Aggregation Recurrent Multi-view Stereo Network [8.127449025802436]
We present a novel recurrent multi-view stereo network based on long short-term memory (LSTM) with adaptive aggregation, namely AA-RMVSNet.
We first introduce an intra-view aggregation module to adaptively extract image features by using context-aware convolution and multi-scale aggregation.
We propose an inter-view cost volume aggregation module for adaptive pixel-wise view aggregation, which is able to preserve better-matched pairs among all views.
arXiv Detail & Related papers (2021-08-09T06:10:48Z)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
- Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking [54.58791377183574]
Our novel hybrid recurrent multi-view stereo net consists of two core modules: 1) a light DRENet (Dense Reception Expanded) module to extract dense feature maps of original size with multi-scale context information, and 2) a HU-LSTM (Hybrid U-LSTM) to regularize the 3D matching volume into a predicted depth map.
Our method exhibits competitive performance to the state-of-the-art method while dramatically reducing memory consumption, costing only $19.4\%$ of R-MVSNet's memory consumption.
arXiv Detail & Related papers (2020-07-21T14:59:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.