High-Fidelity and Generalizable Neural Surface Reconstruction with Sparse Feature Volumes
- URL: http://arxiv.org/abs/2507.05952v1
- Date: Tue, 08 Jul 2025 12:50:39 GMT
- Title: High-Fidelity and Generalizable Neural Surface Reconstruction with Sparse Feature Volumes
- Authors: Aoxiang Fan, Corentin Dumery, Nicolas Talabot, Hieu Le, Pascal Fua
- Abstract summary: Generalizable neural surface reconstruction has become a compelling technique to reconstruct from few images without per-scene optimization. We present a sparse representation method that maximizes memory efficiency and enables significantly higher resolution reconstructions on standard hardware.
- Score: 50.83282258807327
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generalizable neural surface reconstruction has become a compelling technique to reconstruct from few images without per-scene optimization, where dense 3D feature volumes have proven effective as a global representation of scenes. However, the dense representation does not scale well to increasing voxel resolutions, severely limiting the reconstruction quality. We thus present a sparse representation method that maximizes memory efficiency and enables significantly higher resolution reconstructions on standard hardware. We implement this through a two-stage approach: we first train a network to predict voxel occupancies from posed images and associated depth maps, then compute features and perform volume rendering only in voxels with sufficiently high occupancy estimates. To support this sparse representation, we developed custom algorithms for efficient sampling, feature aggregation, and querying from sparse volumes, overcoming the dense-volume assumptions inherent in existing works. Experiments on public datasets demonstrate that our approach reduces storage requirements by more than 50 times without performance degradation, enabling reconstructions at $512^3$ resolution compared to the typical $128^3$ on similar hardware, and achieving superior reconstruction accuracy over current state-of-the-art methods.
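To make the two-stage idea concrete, here is a minimal PyTorch sketch of occupancy-gated sparsification: a small network scores each voxel of a coarse grid, and only voxels above a threshold are kept for the expensive feature-aggregation and rendering stage. All names (`OccupancyNet`, the threshold `tau`, the feature dimension) are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of the two-stage sparse pipeline described above, not the
# authors' implementation. All names (OccupancyNet, tau, feature dims) are
# illustrative assumptions.
import torch
import torch.nn as nn

class OccupancyNet(nn.Module):
    """Stage 1: predict per-voxel occupancy from fused image/depth features."""
    def __init__(self, in_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, voxel_feats):               # (N, in_dim) coarse features
        return self.mlp(voxel_feats).squeeze(-1)  # (N,) occupancy in [0, 1]

def build_sparse_volume(coords, coarse_feats, occ_net, tau=0.5):
    """Keep only voxels whose predicted occupancy exceeds tau.

    coords:       (N, 3) integer voxel indices of the dense grid
    coarse_feats: (N, C) cheap per-voxel features (e.g., from depth maps)
    Returns sparse coordinates and a mask selecting the active voxels.
    """
    with torch.no_grad():
        occ = occ_net(coarse_feats)
    keep = occ > tau              # stage-2 work is restricted to these voxels
    return coords[keep], keep

# Toy usage on a 64^3 grid: only voxels clearing the occupancy threshold are
# kept, so stage 2 (feature aggregation + volume rendering) touches far less
# memory than a dense volume would.
grid = torch.stack(torch.meshgrid(
    *[torch.arange(64)] * 3, indexing="ij"), dim=-1).reshape(-1, 3)
feats = torch.randn(grid.shape[0], 32)
sparse_coords, mask = build_sparse_volume(grid, feats, OccupancyNet())
print(f"kept {mask.sum().item()} / {mask.numel()} voxels")
```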
Related papers
- QuickSplat: Fast 3D Surface Reconstruction via Learned Gaussian Initialization [69.50126552763157]
Surface reconstruction is fundamental to computer vision and graphics, enabling applications in 3D modeling, mixed reality, robotics, and more. Existing approaches based on rendering obtain promising results, but optimize on a per-scene basis, resulting in a slow optimization that can struggle to model textureless regions. We introduce QuickSplat, which learns data-driven priors to generate dense initializations for 2D Gaussian splatting optimization of large-scale indoor scenes.
arXiv Detail & Related papers (2025-05-08T18:43:26Z)
- HUG: Hierarchical Urban Gaussian Splatting with Block-Based Reconstruction for Large-Scale Aerial Scenes [13.214165748862815]
3DGS methods suffer from issues such as excessive memory consumption, slow training times, prolonged partitioning processes, and significant degradation in rendering quality due to the increased data volume. We introduce HUG, a novel approach that enhances data partitioning and reconstruction quality by leveraging a hierarchical neural Gaussian representation. Our method achieves state-of-the-art results on one synthetic dataset and four real-world datasets.
arXiv Detail & Related papers (2025-04-23T10:40:40Z)
- SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling [79.56581753856452]
SparseFlex is a novel sparse-structured isosurface representation that enables differentiable mesh reconstruction at resolutions up to $1024^3$ directly from rendering losses. By enabling high-resolution, differentiable mesh reconstruction and generation with rendering losses, SparseFlex significantly advances the state-of-the-art in 3D shape representation and modeling.
arXiv Detail & Related papers (2025-03-27T17:46:42Z)
- SCube: Instant Large-Scale Scene Reconstruction using VoxSplats [55.383993296042526]
We present SCube, a novel method for reconstructing large-scale 3D scenes (geometry, appearance, and semantics) from a sparse set of posed images.
Our method encodes reconstructed scenes using VoxSplat, a novel representation consisting of a set of 3D Gaussians supported on a high-resolution sparse-voxel scaffold.
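As a rough illustration of what "Gaussians on a sparse-voxel scaffold" can look like as a data structure, the sketch below anchors K Gaussians to each occupied voxel; the class name, field layout, and K are assumptions rather than SCube's actual API.

```python
# A hedged sketch of a "Gaussians on a sparse-voxel scaffold" layout in the
# spirit of VoxSplat; field names and K are assumptions, not SCube's code.
import torch

class VoxelScaffoldGaussians:
    """Each occupied voxel anchors K Gaussians parameterized relative to it."""
    def __init__(self, voxel_coords, voxel_size, K=4):
        N = voxel_coords.shape[0]
        self.coords = voxel_coords            # (N, 3) integer voxel indices
        self.voxel_size = voxel_size
        self.offsets = torch.zeros(N, K, 3)   # Gaussian centers inside voxel
        self.log_scales = torch.zeros(N, K, 3)
        self.colors = torch.rand(N, K, 3)

    def centers(self):
        # World-space centers: voxel corner + learned offset within the cell.
        corner = self.coords.float() * self.voxel_size
        return corner.unsqueeze(1) + torch.sigmoid(self.offsets) * self.voxel_size

occupied = torch.tensor([[3, 5, 2], [3, 6, 2]])   # sparse scaffold cells
g = VoxelScaffoldGaussians(occupied, voxel_size=0.05)
print(g.centers().shape)   # (2, 4, 3): two voxels, four Gaussians each
```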
arXiv Detail & Related papers (2024-10-26T00:52:46Z)
- VQ-NeRF: Vector Quantization Enhances Implicit Neural Representations [25.88881764546414]
VQ-NeRF is an efficient pipeline for enhancing implicit neural representations via vector quantization.
We present an innovative multi-scale NeRF sampling scheme that concurrently optimizes the NeRF model at both compressed and original scales.
We incorporate a semantic loss function to improve the geometric fidelity and semantic coherence of our 3D reconstructions.
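Since this entry hinges on vector quantization of implicit features, here is a toy sketch of the core quantization step with a straight-through gradient; the codebook size and feature dimension are assumptions, not VQ-NeRF's actual configuration.

```python
# A toy sketch of the vector-quantization step, assuming a learned codebook;
# sizes (256 codes, 32-dim features) are illustrative, not VQ-NeRF's code.
import torch

codebook = torch.randn(256, 32)           # 256 learnable code vectors

def quantize(feats):
    """Replace each feature by its nearest codebook entry (straight-through)."""
    # feats: (N, 32) continuous features from the NeRF's feature grid/MLP.
    dists = torch.cdist(feats, codebook)  # (N, 256) pairwise distances
    idx = dists.argmin(dim=1)             # nearest code per feature
    quantized = codebook[idx]
    # Straight-through estimator: gradients flow to the continuous features.
    return feats + (quantized - feats).detach(), idx

feats = torch.randn(1024, 32, requires_grad=True)
q, idx = quantize(feats)
print(q.shape, idx.shape)  # a compressed model stores only `idx` + codebook
```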
arXiv Detail & Related papers (2023-10-23T01:41:38Z)
- AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation [51.143540967290114]
We propose a method that unlocks a wide range of previously-infeasible geometric augmentations for unsupervised depth completion and estimation.
This is achieved by reversing, or 'undo'-ing, geometric transformations to the coordinates of the output depth, warping the depth map back to the original reference frame.
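A minimal sketch of this "undo" idea, using a horizontal flip (its own inverse) and a stand-in depth network; AugUndo itself covers a broader family of geometric transforms (crops, resizes, rotations).

```python
# A minimal sketch of the "undo" idea with a horizontal flip; the depth
# network here is a hypothetical stand-in, not AugUndo's actual model.
import torch

def augment(image):
    # Forward geometric augmentation: horizontal flip (its own inverse).
    return torch.flip(image, dims=[-1])

def undo(depth):
    # Invert the augmentation on the *output* depth map, so the unsupervised
    # loss is evaluated in the original reference frame.
    return torch.flip(depth, dims=[-1])

def depth_net(image):
    # Stand-in for any monocular depth network (hypothetical).
    return image.mean(dim=1, keepdim=True)

image = torch.rand(1, 3, 48, 64)
depth_aug = depth_net(augment(image))   # predict on the augmented input
depth = undo(depth_aug)                 # warp the prediction back for the loss
assert depth.shape == (1, 1, 48, 64)
```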
arXiv Detail & Related papers (2023-10-15T05:15:45Z)
- High-Resolution Volumetric Reconstruction for Clothed Humans [27.900514732877827]
We present a novel method for reconstructing clothed humans from a sparse set of RGB images (e.g., 1 to 6).
Our method significantly reduces the mean point-to-surface (P2S) error of state-of-the-art methods by more than 50%, achieving approximately 2mm accuracy at a $512^3$ volume resolution.
arXiv Detail & Related papers (2023-07-25T06:37:50Z)
- Incremental Dense Reconstruction from Monocular Video with Guided Sparse Feature Volume Fusion [23.984073189849024]
This letter proposes a real-time feature volume-based dense reconstruction method that predicts TSDF values from a novel sparsified deep feature volume.
An uncertainty-aware multi-view stereo network is leveraged to infer initial voxel locations of the physical surface in a sparse feature volume.
Our method is shown to produce more complete reconstructions with finer detail in many cases.
arXiv Detail & Related papers (2023-05-24T09:06:01Z)
- VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction [64.09702079593372]
VolRecon is a novel generalizable implicit reconstruction method based on the Signed Ray Distance Function (SRDF).
On the DTU dataset, VolRecon outperforms SparseNeuS by about 30% in sparse-view reconstruction and achieves accuracy comparable to MVSNet in full-view reconstruction.
arXiv Detail & Related papers (2022-12-15T18:59:54Z)
- MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction [72.05649682685197]
State-of-the-art neural implicit methods allow for high-quality reconstructions of simple scenes from many input views, but their quality degrades in more challenging settings. This is caused primarily by the inherent ambiguity of the RGB reconstruction loss, which does not provide enough constraints.
Motivated by recent advances in the area of monocular geometry prediction, we explore the utility these cues provide for improving neural implicit surface reconstruction.
arXiv Detail & Related papers (2022-06-01T17:58:15Z)
- Monocular Real-Time Volumetric Performance Capture [28.481131687883256]
We present the first approach to volumetric performance capture and novel-view rendering at real-time speed from monocular video.
Our system reconstructs a fully textured 3D human from each frame by leveraging Pixel-Aligned Implicit Function (PIFu).
We also introduce an Online Hard Example Mining (OHEM) technique that effectively suppresses failure modes due to the rare occurrence of challenging examples.
arXiv Detail & Related papers (2020-07-28T04:45:13Z)