C2F2NeUS: Cascade Cost Frustum Fusion for High Fidelity and
Generalizable Neural Surface Reconstruction
- URL: http://arxiv.org/abs/2306.10003v2
- Date: Mon, 14 Aug 2023 15:09:45 GMT
- Title: C2F2NeUS: Cascade Cost Frustum Fusion for High Fidelity and
Generalizable Neural Surface Reconstruction
- Authors: Luoyuan Xu, Tao Guan, Yuesong Wang, Wenkai Liu, Zhaojie Zeng, Junle
Wang, Wei Yang
- Abstract summary: We introduce a novel integration scheme that combines multi-view stereo with neural signed distance function representations.
Our method reconstructs robust surfaces and outperforms existing state-of-the-art methods.
- Score: 12.621233209149953
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is an emerging effort to combine the two popular 3D
frameworks, Multi-View Stereo (MVS) and Neural Implicit Surfaces (NIS), with a
specific focus on the few-shot / sparse-view setting. In this paper, we
introduce a novel integration scheme that combines multi-view stereo with
neural signed distance function representations, which potentially overcomes
the limitations of both methods. MVS uses per-view depth estimation and
cross-view fusion to generate accurate surfaces, while NIS relies on a common
coordinate volume. Based on this strategy, we propose to construct per-view
cost frustums for finer geometry estimation, and then fuse the cross-view
frustums and estimate the implicit signed distance functions to tackle
artifacts caused by noise and holes in the reconstructed surface. We further
apply a cascade frustum fusion strategy to effectively capture global-local
information and structural consistency.
structural consistency. Finally, we apply cascade sampling and a
pseudo-geometric loss to foster stronger integration between the two
architectures. Extensive experiments demonstrate that our method reconstructs
robust surfaces and outperforms existing state-of-the-art methods.
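To make the integration concrete, here is a minimal, hypothetical sketch (not the authors' code): per-view frustum features sampled at 3D query points are fused across views with a mean/variance aggregation, a common MVS-style stand-in for the paper's cascade frustum fusion, and an MLP maps the fused feature to a signed distance. Frustum construction, cascade sampling, and the pseudo-geometric loss are omitted.

```python
import torch
import torch.nn as nn

class FusedSDF(nn.Module):
    def __init__(self, feat_dim=16, hidden=64):
        super().__init__()
        # MLP from query point + fused cross-view statistics to a signed distance
        self.mlp = nn.Sequential(
            nn.Linear(3 + 2 * feat_dim, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, 1),
        )

    def forward(self, points, per_view_feats):
        # points: (N, 3) query locations
        # per_view_feats: (V, N, feat_dim) frustum features sampled per view
        mean = per_view_feats.mean(dim=0)
        var = per_view_feats.var(dim=0, unbiased=False)  # cross-view agreement
        return self.mlp(torch.cat([points, mean, var], dim=-1))

# Toy usage with random stand-in frustum features from 4 views
pts = torch.rand(1024, 3)
feats = torch.rand(4, 1024, 16)
sdf = FusedSDF()(pts, feats)  # (1024, 1) signed distances
```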
Related papers
- Mesh Denoising Transformer [104.5404564075393]
Mesh denoising is aimed at removing noise from input meshes while preserving their feature structures.
SurfaceFormer is a pioneering Transformer-based mesh denoising framework.
A new representation, the Local Surface Descriptor, captures local geometric intricacies.
The Denoising Transformer module receives this multimodal information and achieves efficient global feature aggregation.
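As a rough illustration of what a local surface descriptor can encode (SurfaceFormer's actual descriptor is more elaborate; the function below is hypothetical), one can record the angles between a vertex normal and its one-ring neighbors' normals:

```python
import numpy as np

def local_descriptor(normals, neighbors, k=8):
    # normals: (V, 3) unit vertex normals; neighbors: list of neighbor index lists
    desc = np.zeros((len(normals), k))
    for v, nbrs in enumerate(neighbors):
        cos = np.clip(normals[nbrs[:k]] @ normals[v], -1.0, 1.0)
        desc[v, :len(cos)] = np.arccos(cos)  # angles to neighboring normals
    return desc  # noisy regions show large, irregular angles
```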
arXiv Detail & Related papers (2024-05-10T15:27:43Z) - Sur2f: A Hybrid Representation for High-Quality and Efficient Surface
Reconstruction from Multi-view Images [41.81291587750352]
Multi-view surface reconstruction is an ill-posed inverse problem in 3D vision research.
Most existing methods rely either on explicit meshes or on implicit field functions, using volume rendering of the fields for reconstruction.
We propose a new hybrid representation, termed Sur2f, aiming to better benefit from both representations in a complementary manner.
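The volume rendering referred to here is, in essence, NeRF-style alpha compositing along a ray; a generic sketch (not Sur2f's implementation) is:

```python
import torch

def render_ray(densities, colors, deltas):
    # densities: (S,) >= 0 from the field; colors: (S, 3); deltas: (S,) step sizes
    alpha = 1.0 - torch.exp(-densities * deltas)        # per-sample opacity
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0
    )                                                   # accumulated transmittance
    weights = alpha * trans                             # compositing weights
    return (weights[:, None] * colors).sum(dim=0)       # rendered RGB
```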
arXiv Detail & Related papers (2024-01-08T07:22:59Z) - 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z) - Multi-View Stereo Representation Revisit: Region-Aware MVSNet [8.264851594332677]
Deep learning-based multi-view stereo has emerged as a powerful paradigm for reconstructing complete, geometrically detailed objects from multiple views.
We propose RA-MVSNet to take advantage of point-to-surface distance so that the model is able to perceive a wider range of surfaces.
Our proposed RA-MVSNet is patch-aware, since the perception range is enhanced by associating hypothetical planes with a patch of surface.
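The point-to-surface signal can be pictured as a signed point-to-plane distance for each hypothesized plane; a hedged toy version follows (RA-MVSNet's actual patch association is richer):

```python
import numpy as np

def point_to_plane_distance(points, normal, offset):
    # points: (N, 3); plane: {x | normal . x + offset = 0}, with |normal| = 1
    return points @ normal + offset

pts = np.array([[0.0, 0.0, 1.5], [0.0, 0.0, 0.5]])
print(point_to_plane_distance(pts, np.array([0.0, 0.0, 1.0]), -1.0))  # [ 0.5 -0.5]
```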
arXiv Detail & Related papers (2023-04-26T15:17:51Z) - DeepMLE: A Robust Deep Maximum Likelihood Estimator for Two-view
Structure from Motion [9.294501649791016]
Two-view structure from motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM (vSLAM).
We formulate the two-view SfM problem as a maximum likelihood estimation (MLE) and solve it with the proposed framework, denoted as DeepMLE.
Our method significantly outperforms the state-of-the-art end-to-end two-view SfM approaches in accuracy and generalization capability.
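One standard way to write such a likelihood (an assumption here; the paper's exact model may differ) is a Gaussian negative log-likelihood over correspondence residuals with predicted per-pixel uncertainty:

```python
import torch

def gaussian_nll(residuals, log_sigma):
    # residuals: (N, 2) reprojection errors; log_sigma: (N, 1) predicted scale
    # NLL of an isotropic 2D Gaussian, up to an additive constant:
    # ||r||^2 / (2 sigma^2) + 2 log sigma
    return (0.5 * (residuals ** 2).sum(-1, keepdim=True) * torch.exp(-2 * log_sigma)
            + 2 * log_sigma).mean()
```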
arXiv Detail & Related papers (2022-10-11T15:07:25Z) - BNV-Fusion: Dense 3D Reconstruction using Bi-level Neural Volume Fusion [85.24673400250671]
We present Bi-level Neural Volume Fusion (BNV-Fusion), which leverages recent advances in neural implicit representations and neural rendering for dense 3D reconstruction.
In order to incrementally integrate new depth maps into a global neural implicit representation, we propose a novel bi-level fusion strategy.
We evaluate the proposed method on multiple datasets quantitatively and qualitatively, demonstrating a significant improvement over existing methods.
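For intuition, incremental depth integration classically looks like a running weighted average per voxel (classic TSDF fusion); BNV-Fusion instead performs this kind of update on local latent codes and a global implicit field, so the snippet below is only a simplified stand-in:

```python
import numpy as np

def fuse_tsdf(tsdf, weight, new_tsdf, new_weight):
    # tsdf, weight: running voxel grids; new_*: current frame's observation
    w = weight + new_weight
    fused = np.where(
        w > 0,
        (tsdf * weight + new_tsdf * new_weight) / np.maximum(w, 1e-8),
        tsdf,  # unobserved voxels keep their previous value
    )
    return fused, w
```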
arXiv Detail & Related papers (2022-04-03T19:33:09Z) - VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two-stage framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) local computation of depth maps with a deep MVS technique, and 2) fusion of the depth maps and image features to build a single TSDF volume.
In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
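Step 2's TSDF construction can be sketched as projecting voxel centers into a view's depth map and truncating the signed distance along the ray; the snippet below assumes identity intrinsics and omits the image features (hypothetical, not the paper's code):

```python
import numpy as np

def tsdf_from_depth(voxels_cam, depth_lookup, trunc=0.05):
    # voxels_cam: (N, 3) voxel centers in camera coordinates (z > 0)
    # depth_lookup: callable mapping pixel (u, v) -> observed depth
    z = voxels_cam[:, 2]
    u = voxels_cam[:, 0] / z  # normalized image coords (identity intrinsics)
    v = voxels_cam[:, 1] / z
    d_obs = np.array([depth_lookup(ui, vi) for ui, vi in zip(u, v)])
    return np.clip((d_obs - z) / trunc, -1.0, 1.0)  # truncated signed distance
```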
arXiv Detail & Related papers (2021-08-19T11:33:58Z) - Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
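The epipolar structure such a transformer exploits can be sketched as sampling candidate matches along the epipolar line induced by a fundamental matrix F (a generic construction; the EST transformer's attention and temporal fusion are not shown):

```python
import numpy as np

def epipolar_samples(pixel, F, xs):
    # pixel: (u, v) in view 1; F: (3, 3) fundamental matrix; xs: u-coords to sample
    line = F @ np.array([pixel[0], pixel[1], 1.0])  # epipolar line a*u + b*v + c = 0
    a, b, c = line
    return [(u, -(a * u + c) / b) for u in xs]      # assumes b != 0 (non-vertical line)
```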
arXiv Detail & Related papers (2020-11-26T04:04:21Z) - Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency
Checking [54.58791377183574]
Our novel hybrid recurrent multi-view stereo net consists of two core modules: 1) a light DRENet (Dense Reception Expanded) module to extract dense feature maps of original size with multi-scale context information, and 2) a HU-LSTM (Hybrid U-LSTM) to regularize the 3D matching volume into a predicted depth map.
Our method exhibits performance competitive with the state of the art while dramatically reducing memory consumption, requiring only 19.4% of R-MVSNet's memory.
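The memory saving comes from recurrence: the cost volume is regularized one depth plane at a time, so only a single slice plus a hidden state is resident rather than the full 3D volume. A minimal sketch with a plain GRU cell standing in for the HU-LSTM:

```python
import torch
import torch.nn as nn

gru = nn.GRUCell(input_size=32, hidden_size=32)
hidden = torch.zeros(4096, 32)          # one state per pixel (e.g., a 64x64 map)
for d in range(48):                     # iterate over depth hypotheses
    cost_slice = torch.rand(4096, 32)   # matching costs for plane d (stand-in)
    hidden = gru(cost_slice, hidden)    # regularized features for plane d
```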
arXiv Detail & Related papers (2020-07-21T14:59:59Z)