Related papers: SCILLA: SurfaCe Implicit Learning for Large Urban Area, a volumetric hybrid solution

URL: http://arxiv.org/abs/2403.10344v3
Date: Wed, 09 Oct 2024 10:52:15 GMT
Title: SCILLA: SurfaCe Implicit Learning for Large Urban Area, a volumetric hybrid solution
Authors: Hala Djeghim, Nathan Piasco, Moussab Bennehar, Luis Roldão, Dzmitry Tsishkou, Désiré Sidibé
Abstract summary: SCILLA is a new hybrid implicit surface learning method to reconstruct large driving scenes from 2D images. We show that SCILLA can learn an accurate and detailed 3D surface scene representation in various urban scenarios.
Score: 4.216707699421813
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Neural implicit surface representation methods have recently shown impressive 3D reconstruction results. However, existing solutions struggle to reconstruct urban outdoor scenes due to their large, unbounded, and highly detailed nature. Hence, to achieve accurate reconstructions, additional supervision data such as LiDAR, strong geometric priors, and long training times are required. To tackle such issues, we present SCILLA, a new hybrid implicit surface learning method to reconstruct large driving scenes from 2D images. SCILLA's hybrid architecture models two separate implicit fields: one for the volumetric density and another for the signed distance to the surface. To accurately represent urban outdoor scenarios, we introduce a novel volume-rendering strategy that relies on self-supervised probabilistic density estimation to sample points near the surface and transition progressively from volumetric to surface representation. Our solution permits a proper and fast initialization of the signed distance field without relying on any geometric prior on the scene, compared to concurrent methods. By conducting extensive experiments on four outdoor driving datasets, we show that SCILLA can learn an accurate and detailed 3D surface scene representation in various urban scenarios while being two times faster to train compared to previous state-of-the-art solutions.
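The abstract describes two implicit fields (volumetric density and signed distance) and a progressive transition from volumetric to surface rendering, but gives no formulas. The sketch below is not SCILLA's method. It combines three illustrative pieces: standard NeRF-style quadrature weights, a VolSDF-style Laplace-CDF conversion from signed distance to density (a technique from a different paper, swapped in for illustration), and a hypothetical blend parameter `t` standing in for the unspecified transition schedule.

```python
import math
from typing import List


def sdf_to_density(sdf: float, beta: float) -> float:
    """VolSDF-style density: (1 / beta) * Psi_beta(-sdf), where Psi_beta is the
    CDF of a zero-mean Laplace distribution with scale beta."""
    s = -sdf
    if s <= 0.0:
        psi = 0.5 * math.exp(s / beta)
    else:
        psi = 1.0 - 0.5 * math.exp(-s / beta)
    return psi / beta


def blended_density(sigma_vol: float, sdf: float, beta: float, t: float) -> float:
    """Hypothetical schedule: t = 0 uses only the volumetric density field,
    t = 1 uses only the SDF-derived density."""
    return (1.0 - t) * sigma_vol + t * sdf_to_density(sdf, beta)


def render_weights(sigmas: List[float], deltas: List[float]) -> List[float]:
    """NeRF quadrature: w_i = T_i * (1 - exp(-sigma_i * delta_i)),
    where T_i is the transmittance accumulated over earlier samples."""
    weights = []
    transmittance = 1.0
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


if __name__ == "__main__":
    # A ray hitting a surface at depth 2.0, sampled every 0.5 units.
    ts = [0.5 * i for i in range(9)]
    sdfs = [2.0 - t for t in ts]
    sigmas = [blended_density(0.0, d, beta=0.1, t=1.0) for d in sdfs]
    weights = render_weights(sigmas, [0.5] * len(ts))
    best = max(range(len(weights)), key=lambda i: weights[i])
    print(ts[best])
```

With `t = 1.0` the rendering weights concentrate at the sample nearest the SDF zero level set; smaller `beta` sharpens that peak.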

Related papers

Geometric Prior-Guided Neural Implicit Surface Reconstruction in the Wild [13.109693095684921]
We introduce a novel approach that applies multiple geometric constraints to the implicit surface optimization process. First, we utilize sparse 3D points from structure-from-motion (SfM) to refine the signed distance function estimation for the reconstructed surface. We also employ robust normal priors derived from a normal predictor, enhanced by edge prior filtering and multi-view consistency constraints.
arXiv Detail & Related papers (2025-05-12T09:17:30Z)
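Using sparse SfM points to refine an SDF is commonly done with a loss that pushes the SDF toward zero at those points. A minimal sketch, assuming a callable SDF and a list of 3D points; the analytic `sphere_sdf` is a hypothetical stand-in for a learned network, and the summary does not state the exact loss the authors use.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]


def sphere_sdf(p: Point, radius: float = 1.0) -> float:
    """Analytic SDF of a sphere centred at the origin (stand-in for a network)."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius


def sparse_point_sdf_loss(sdf: Callable[[Point], float], points: List[Point]) -> float:
    """Mean |SDF| over SfM points: zero when every point lies on the surface."""
    if not points:
        return 0.0
    return sum(abs(sdf(p)) for p in points) / len(points)


if __name__ == "__main__":
    sfm_points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (2.0, 0.0, 0.0)]
    print(sparse_point_sdf_loss(sphere_sdf, sfm_points))
```

An L1 penalty is used here for simplicity; SfM points are noisy, so a practical system may prefer a robust or confidence-weighted variant.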
LinPrim: Linear Primitives for Differentiable Volumetric Rendering [53.780682194322225]
We introduce two new scene representations based on linear primitives. We present a differentiable rasterizer that runs efficiently on GPU. We demonstrate performance comparable to state-of-the-art methods.
arXiv Detail & Related papers (2025-01-27T18:49:38Z)
StreetSurfGS: Scalable Urban Street Surface Reconstruction with Planar-based Gaussian Splatting [85.67616000086232]
StreetSurfGS is the first method to employ Gaussian Splatting specifically tailored for scalable urban street scene surface reconstruction. StreetSurfGS utilizes a planar-based octree representation and segmented training to reduce memory costs, accommodate unique camera characteristics, and ensure scalability. To address sparse views and multi-scale challenges, we use a dual-step matching strategy that leverages adjacent and long-term information.
arXiv Detail & Related papers (2024-10-06T04:21:59Z)
GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction [71.08607897266045]
3D Gaussian Splatting (3DGS) has shown promising performance in novel view synthesis. We make the first attempt to tackle the challenging task of large-scale scene surface reconstruction. We propose GigaGS, the first work for high-quality surface reconstruction for large-scale scenes using 3DGS.
arXiv Detail & Related papers (2024-09-10T17:51:39Z)
Spurfies: Sparse Surface Reconstruction using Local Geometry Priors [8.260048622127913]
We introduce Spurfies, a novel method for sparse-view surface reconstruction. It disentangles appearance and geometry information to utilize local geometry priors trained on synthetic data. We validate our method on the DTU dataset and demonstrate that it outperforms the previous state of the art by 35% in surface quality.
arXiv Detail & Related papers (2024-08-29T14:02:47Z)
Efficient Depth-Guided Urban View Synthesis [52.841803876653465]
We introduce Efficient Depth-Guided Urban View Synthesis (EDUS) for fast feed-forward inference and efficient per-scene fine-tuning. EDUS exploits noisy predicted geometric priors as guidance to enable generalizable urban view synthesis from sparse input images. Our results indicate that EDUS achieves state-of-the-art performance in sparse view settings when combined with fast test-time optimization.
arXiv Detail & Related papers (2024-07-17T08:16:25Z)
Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns. A series of experiments shows that our method consistently outperforms existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views [9.175560202201819]
3D Gaussian Splatting (3DGS) has emerged as an efficient approach for accurately representing scenes. We propose a novel approach for bridging the gap between the noisy 3DGS representation and the smooth 3D mesh representation. We render stereo-aligned pairs of images corresponding to the original training poses, feed the pairs into a stereo model to get a depth profile, and finally fuse all of the profiles together to get a single mesh.
arXiv Detail & Related papers (2024-04-02T10:13:18Z)
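The summary does not say how the stereo output becomes depth. For a rectified stereo pair the standard relation is Z = f * B / d, with focal length f and disparity d in pixels and baseline B in scene units. A minimal sketch, assuming the stereo model returns a per-pixel disparity map; the function names and the choice to mark non-positive disparities as invalid (`None`) are illustrative, not taken from GS2Mesh.

```python
from typing import List, Optional, Tuple


def disparity_to_depth(disparity: List[List[float]], focal_px: float,
                       baseline: float) -> List[List[Optional[float]]]:
    """Rectified stereo: depth = focal_px * baseline / disparity.
    Non-positive disparities have no valid depth and map to None."""
    return [[focal_px * baseline / d if d > 0 else None for d in row]
            for row in disparity]


def backproject(u: float, v: float, depth: float, focal_px: float,
                cx: float, cy: float) -> Tuple[float, float, float]:
    """Pinhole back-projection of pixel (u, v) at the given depth into camera
    coordinates, assuming square pixels (fx = fy = focal_px)."""
    return ((u - cx) * depth / focal_px, (v - cy) * depth / focal_px, depth)


if __name__ == "__main__":
    depth = disparity_to_depth([[25.0, 0.0]], focal_px=500.0, baseline=0.1)
    print(depth)
    print(backproject(420.0, 240.0, 2.0, focal_px=500.0, cx=320.0, cy=240.0))
```

Back-projected points from each training pose would still need to be transformed to world coordinates and fused (for example with TSDF fusion) to obtain the single mesh the summary mentions; that step is omitted here.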
Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion [77.34078223594686]
We propose a novel architecture for direct 3D scene generation by introducing diffusion models into 3D sparse representations and combining them with neural rendering techniques. Specifically, our approach generates texture colors at the point level for a given geometry using a 3D diffusion model first, which is then transformed into a scene representation in a feed-forward manner. Experiments on two city-scale datasets show that our model generates photo-realistic street-view image sequences and cross-view urban scenes from satellite imagery.
arXiv Detail & Related papers (2024-01-19T16:15:37Z)
NeuSD: Surface Completion with Multi-View Text-to-Image Diffusion [56.98287481620215]
We present a novel method for 3D surface reconstruction from multiple images where only a part of the object of interest is captured. Our approach builds on two recent developments: surface reconstruction using neural radiance fields for the reconstruction of the visible parts of the surface, and guidance of pre-trained 2D diffusion models in the form of Score Distillation Sampling (SDS) to complete the shape in unobserved regions in a plausible manner.
arXiv Detail & Related papers (2023-12-07T19:30:55Z)
Improving Neural Indoor Surface Reconstruction with Mask-Guided Adaptive Consistency Constraints [0.6749750044497732]
We propose a two-stage training process, decouple view-dependent and view-independent colors, and leverage two novel consistency constraints to enhance detail reconstruction performance without requiring extra priors. Experiments on synthetic and real-world datasets show that the method reduces interference from prior estimation errors.
arXiv Detail & Related papers (2023-09-18T13:05:23Z)
StreetSurf: Extending Multi-view Implicit Surface Reconstruction to Street Views [6.35910814268525]
We present a novel multi-view implicit surface reconstruction technique, termed StreetSurf. It is readily applicable to street view images in widely used autonomous driving datasets, without necessarily requiring LiDAR data. We achieve state-of-the-art reconstruction quality in both geometry and appearance within only one to two hours of training time.
arXiv Detail & Related papers (2023-06-08T07:19:27Z)
DiViNeT: 3D Reconstruction from Disparate Views via Neural Template Regularization [7.488962492863031]
We present a volume rendering-based neural surface reconstruction method that takes as few as three disparate RGB images as input. Our key idea is to regularize the reconstruction, which is severely ill-posed and leaves significant gaps between the sparse views. Our approach achieves the best reconstruction quality among existing methods in the presence of such sparse views.
arXiv Detail & Related papers (2023-06-07T18:05:14Z)
Recovering Fine Details for Neural Implicit Surface Reconstruction [3.9702081347126943]
We present D-NeuS, a volume rendering-based neural implicit surface reconstruction method capable of recovering fine geometric details. We impose multi-view feature consistency on the surface points, derived by interpolating SDF zero-crossings from sampled points along rays. Our method reconstructs high-accuracy surfaces with details, and outperforms the state of the art.
arXiv Detail & Related papers (2022-11-21T10:06:09Z)
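The surface points D-NeuS constrains are obtained by interpolating SDF zero-crossings between ray samples. A minimal sketch, assuming linear interpolation between the last sample outside the surface (positive SDF) and the first sample inside (non-positive SDF); the function names and the ray parametrization origin + t * direction are illustrative, not taken from the paper's code.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def first_zero_crossing(ts: Sequence[float], sdfs: Sequence[float]) -> Optional[float]:
    """Return the ray depth where the SDF first goes from positive to non-positive,
    linearly interpolated between the two bracketing samples; None if no crossing."""
    for i in range(len(ts) - 1):
        s0, s1 = sdfs[i], sdfs[i + 1]
        if s0 > 0.0 >= s1:
            # Root of the line through (ts[i], s0) and (ts[i + 1], s1); s0 - s1 > 0.
            return ts[i] + s0 * (ts[i + 1] - ts[i]) / (s0 - s1)
    return None


def surface_point(origin: Vec3, direction: Vec3, ts: Sequence[float],
                  sdfs: Sequence[float]) -> Optional[Vec3]:
    """3D surface point at the first interpolated zero-crossing along the ray."""
    t = first_zero_crossing(ts, sdfs)
    if t is None:
        return None
    return tuple(o + t * d for o, d in zip(origin, direction))


if __name__ == "__main__":
    ts = [0.0, 1.0, 2.0, 3.0]
    sdfs = [1.5, 0.5, -0.5, -1.5]
    print(surface_point((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), ts, sdfs))
```

Only the first positive-to-negative crossing is kept, since later crossings along the same ray correspond to occluded back surfaces.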
Unbiased 4D: Monocular 4D Reconstruction with a Neural Deformation Model [76.64071133839862]
Capturing general deforming scenes from monocular RGB video is crucial for many computer graphics and vision applications. Our method, Ub4D, handles large deformations, performs shape completion in occluded regions, and can operate on monocular RGB videos directly by using differentiable volume rendering. Results on our new dataset, which will be made publicly available, demonstrate a clear improvement over the state of the art in terms of surface reconstruction accuracy and robustness to large deformations.
arXiv Detail & Related papers (2022-06-16T17:59:54Z)
Self-supervised Depth Estimation Leveraging Global Perception and Geometric Smoothness Using On-board Videos [0.5276232626689566]
We present DLNet for pixel-wise depth estimation, which simultaneously extracts global and local features. A three-dimensional geometry smoothness loss is proposed to predict a geometrically natural depth map. In experiments on the KITTI and Make3D benchmarks, the proposed DLNet achieves performance competitive with state-of-the-art methods.
arXiv Detail & Related papers (2021-06-07T10:53:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.