SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical
Refinement and EM optimization
- URL: http://arxiv.org/abs/2401.06385v1
- Date: Fri, 12 Jan 2024 05:25:57 GMT
- Title: SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical
Refinement and EM optimization
- Authors: Zhenlong Yuan, Jiakai Cao, Zhaoxin Li, Hao Jiang, Zhaoqi Wang
- Abstract summary: We introduce Segmentation-Driven Deformation Multi-View Stereo (SD-MVS) to tackle challenges in 3D reconstruction of textureless areas.
We are the first to adopt the Segment Anything Model (SAM) to distinguish semantic instances in scenes.
We propose a unique refinement strategy that combines spherical coordinates and gradient descent on normals and pixelwise search interval on depths.
- Score: 6.886220026399106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce Segmentation-Driven Deformation Multi-View Stereo
(SD-MVS), a method that can effectively tackle challenges in 3D reconstruction
of textureless areas. We are the first to adopt the Segment Anything Model
(SAM) to distinguish semantic instances in scenes and further leverage these
constraints for pixelwise patch deformation on both matching cost and
propagation. Concurrently, we propose a unique refinement strategy that
combines spherical coordinates and gradient descent on normals and pixelwise
search interval on depths, significantly improving the completeness of the
reconstructed 3D model. Furthermore, we adopt the Expectation-Maximization (EM)
algorithm to alternately optimize the aggregate matching cost and
hyperparameters, effectively mitigating the problem of parameters being
excessively dependent on empirical tuning. Evaluations on the ETH3D
high-resolution multi-view stereo benchmark and the Tanks and Temples dataset
demonstrate that our method can achieve state-of-the-art results with less time
consumption.
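The abstract mentions refining normals with spherical coordinates and gradient descent. The sketch below is only an illustration of that general idea, not the authors' implementation: a unit normal is parameterized by two angles, so it stays unit length without renormalization, and the angles are updated by gradient descent. The function names, the toy cost (one minus cosine similarity to a target normal, standing in for a real matching cost), the learning rate, and the step count are all assumptions introduced here.

```python
import math

def normal_from_angles(theta, phi):
    """Unit normal from polar angle theta and azimuth phi (spherical coordinates)."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def toy_cost(theta, phi, target):
    """Hypothetical stand-in for a matching cost: 1 - cosine similarity to target."""
    n = normal_from_angles(theta, phi)
    return 1.0 - sum(a * b for a, b in zip(n, target))

def refine_normal(theta, phi, target, lr=0.2, steps=500, eps=1e-6):
    """Gradient descent on (theta, phi) using central finite differences."""
    for _ in range(steps):
        g_theta = (toy_cost(theta + eps, phi, target)
                   - toy_cost(theta - eps, phi, target)) / (2 * eps)
        g_phi = (toy_cost(theta, phi + eps, target)
                 - toy_cost(theta, phi - eps, target)) / (2 * eps)
        theta -= lr * g_theta
        phi -= lr * g_phi
    return theta, phi

# Assumed example values: start from a rough normal and refine toward a target.
target = normal_from_angles(1.0, 0.5)
theta, phi = refine_normal(0.6, 0.2, target)
refined = normal_from_angles(theta, phi)
```

In a real pipeline the toy cost would be replaced by a photometric matching cost evaluated across views; the two-angle parameterization is the part this sketch is meant to show.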
Related papers
- VortSDF: 3D Modeling with Centroidal Voronoi Tesselation on Signed Distance Field [5.573454319150408]
We introduce a volumetric optimization framework that combines explicit SDF fields with a shallow color network, in order to estimate 3D shape properties over tetrahedral grids.
Experimental results with Chamfer statistics validate this approach with unprecedented reconstruction quality on various scenarios such as objects, open scenes or humans.
arXiv Detail & Related papers (2024-07-29T09:46:39Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
- RNb-NeuS: Reflectance and Normal-based Multi-View 3D Reconstruction [3.1820300989695833]
This paper introduces a versatile paradigm for integrating multi-view reflectance and normal maps acquired through photometric stereo.
Our approach employs a pixel-wise joint re-parameterization of reflectance and normal, considering them as a vector of radiances rendered under simulated, varying illumination.
It significantly improves the detailed 3D reconstruction of areas with high curvature or low visibility.
arXiv Detail & Related papers (2023-12-02T19:49:27Z)
- TSAR-MVS: Textureless-aware Segmentation and Correlative Refinement Guided Multi-View Stereo [3.6728185343140685]
We propose a Textureless-aware Segmentation And Correlative Refinement guided Multi-View Stereo (TSAR-MVS) method.
It effectively tackles challenges posed by textureless areas in 3D reconstruction through filtering, refinement and segmentation.
Experiments on ETH3D, Tanks & Temples and Strecha datasets demonstrate the superior performance and strong capability of our proposed method.
arXiv Detail & Related papers (2023-08-19T11:40:57Z)
- Unifying Flow, Stereo and Depth Estimation [121.54066319299261]
We present a unified formulation and model for three motion and 3D perception tasks.
We formulate all three tasks as a unified dense correspondence matching problem.
Our model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks.
arXiv Detail & Related papers (2022-11-10T18:59:54Z)
- Extracting Triangular 3D Models, Materials, and Lighting From Images [59.33666140713829]
We present an efficient method for joint optimization of materials and lighting from multi-view image observations.
We leverage meshes with spatially-varying materials and environment that can be deployed in any traditional graphics engine.
arXiv Detail & Related papers (2021-11-24T13:58:20Z)
- Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online Adaptation [87.85851771425325]
We consider a new problem of adapting a human mesh reconstruction model to out-of-domain streaming videos.
We tackle this problem through online adaptation, gradually correcting the model bias during testing.
We propose the Dynamic Bilevel Online Adaptation algorithm (DynaBOA).
arXiv Detail & Related papers (2021-11-07T07:23:24Z)
- Neural Radiance Fields Approach to Deep Multi-View Photometric Stereo [103.08512487830669]
We present a modern solution to the multi-view photometric stereo (MVPS) problem.
We procure the surface orientation using a photometric stereo (PS) image formation model and blend it with a multi-view neural radiance field representation to recover the object's surface geometry.
Our method performs neural rendering of multi-view images while utilizing surface normals estimated by a deep photometric stereo network.
arXiv Detail & Related papers (2021-10-11T20:20:03Z)
- AA-RMVSNet: Adaptive Aggregation Recurrent Multi-view Stereo Network [8.127449025802436]
We present a novel recurrent multi-view stereo network based on long short-term memory (LSTM) with adaptive aggregation, namely AA-RMVSNet.
We firstly introduce an intra-view aggregation module to adaptively extract image features by using context-aware convolution and multi-scale aggregation.
We propose an inter-view cost volume aggregation module for adaptive pixel-wise view aggregation, which is able to preserve better-matched pairs among all views.
arXiv Detail & Related papers (2021-08-09T06:10:48Z)
- SMD-Nets: Stereo Mixture Density Networks [68.56947049719936]
We propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework compatible with a wide class of 2D and 3D architectures.
Specifically, we exploit bimodal mixture densities as output representation and show that this allows for sharp and precise disparity estimates near discontinuities.
We carry out comprehensive experiments on a new high-resolution and highly realistic synthetic stereo dataset, consisting of stereo pairs at 8Mpx resolution, as well as on real-world stereo datasets.
arXiv Detail & Related papers (2021-04-08T16:15:46Z)
- Real-time Dense Reconstruction of Tissue Surface from Stereo Optical Video [10.181846237133167]
We propose an approach to reconstruct a dense three-dimensional (3D) model of tissue surface from stereo optical videos in real time.
The basic idea is to first extract 3D information from video frames by using stereo matching, and then to mosaic the reconstructed 3D models.
Experimental results on ex- and in vivo data showed that the reconstructed 3D models have high resolution texture with an accuracy error of less than 2 mm.
arXiv Detail & Related papers (2020-07-16T19:14:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.