SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field
- URL: http://arxiv.org/abs/2403.14366v1
- Date: Thu, 21 Mar 2024 12:49:32 GMT
- Title: SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field
- Authors: Lizhe Liu, Bohua Wang, Hongwei Xie, Daqi Liu, Li Liu, Zhiqiang Tian, Kuiyuan Yang, Bing Wang
- Abstract summary: We propose SurroundSDF to implicitly predict the signed distance field (SDF) and semantic field for continuous perception from surround images.
Specifically, we introduce a query-based approach and utilize SDF constrained by the Eikonal formulation to accurately describe the surfaces of obstacles.
Considering the absence of precise SDF ground truth, we propose a novel weakly supervised paradigm for SDF, referred to as the Sandwich Eikonal formulation.
- Score: 18.110716280650514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-centric 3D environment understanding is both vital and challenging for autonomous driving systems. Recently, object-free methods have attracted considerable attention. Such methods perceive the world by predicting the semantics of discrete voxel grids but fail to construct continuous and accurate obstacle surfaces. To this end, in this paper, we propose SurroundSDF to implicitly predict the signed distance field (SDF) and semantic field for continuous perception from surround images. Specifically, we introduce a query-based approach and utilize an SDF constrained by the Eikonal formulation to accurately describe the surfaces of obstacles. Furthermore, considering the absence of precise SDF ground truth, we propose a novel weakly supervised paradigm for the SDF, referred to as the Sandwich Eikonal formulation, which emphasizes applying correct and dense constraints on both sides of the surface, thereby enhancing the perceptual accuracy of the surface. Experiments suggest that our method achieves state-of-the-art results for both occupancy prediction and 3D scene reconstruction on the nuScenes dataset.
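A true SDF f satisfies the Eikonal property ||∇f(x)|| = 1 everywhere, and the Sandwich Eikonal idea, as the abstract describes it, constrains the field on both sides of the surface rather than only at surface samples. The PyTorch-style sketch below is a minimal illustration of those two ideas; the function names, the eps margin, and the exact form of the two-sided term are assumptions for illustration, not the paper's actual losses.

```python
# Minimal sketch (not the paper's code): a generic Eikonal regularizer and a
# hypothetical two-sided "sandwich" supervision for an SDF network f(x).
import torch

def eikonal_loss(f, x):
    # Penalize deviation of the SDF gradient norm from 1, i.e. ||grad f(x)|| = 1.
    x = x.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(f(x).sum(), x, create_graph=True)
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()

def sandwich_loss(f, surface_pts, normals, eps=0.05):
    # Assumed form: offset surface samples a small distance eps along the
    # normal and constrain the SDF on both sides of the surface as well as on it.
    outside = f(surface_pts + eps * normals)   # should be roughly +eps
    inside = f(surface_pts - eps * normals)    # should be roughly -eps
    on_surf = f(surface_pts)                   # should be roughly 0
    return ((outside - eps) ** 2 + (inside + eps) ** 2 + on_surf ** 2).mean()
```

The two-sided term is what densifies supervision: zero-level-set samples alone leave the sign of nearby points underconstrained, whereas offsetting by ±eps along the normal pins down both sides of the surface.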
Related papers
- RaNeuS: Ray-adaptive Neural Surface Reconstruction [87.20343320266215]
We leverage a differentiable radiance field, e.g., NeRF, to reconstruct detailed 3D surfaces in addition to producing novel view renderings.
Considering that existing methods formulate and optimize the projection from SDF to radiance field with a globally constant Eikonal regularization, we improve on this with a ray-wise weighting factor.
Our proposed RaNeuS is extensively evaluated on both synthetic and real datasets.
arXiv Detail & Related papers (2024-06-14T07:54:25Z)
- CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting [15.392692128626809]
We propose CARFF, a method for predicting future 3D scenes given past observations.
We employ a two-stage training of Pose-Conditional-VAE and NeRF to learn 3D representations.
We demonstrate the utility of our method in scenarios using the CARLA driving simulator.
arXiv Detail & Related papers (2024-01-31T18:56:09Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose OccNeRF, a method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- Camera-based 3D Semantic Scene Completion with Sparse Guidance Network [20.876048262597255]
Semantic scene completion (SSC) aims to predict the semantic occupancy of each voxel in the entire 3D scene from limited observations.
We propose an end-to-end camera-based SSC framework, termed SGN, to diffuse semantics from the semantic- and occupancy-aware seed voxels to the whole scene.
arXiv Detail & Related papers (2023-12-10T04:17:27Z)
- S4C: Self-Supervised Semantic Scene Completion with Neural Fields [54.35865716337547]
3D semantic scene understanding is a fundamental challenge in computer vision.
Current methods for SSC are generally trained on 3D ground truth based on aggregated LiDAR scans.
Our work presents S4C, the first self-supervised approach to SSC that does not rely on 3D ground-truth data.
arXiv Detail & Related papers (2023-10-11T14:19:05Z)
- View Consistent Purification for Accurate Cross-View Localization [59.48131378244399]
This paper proposes a fine-grained self-localization method for outdoor robotics.
The proposed method addresses limitations in existing cross-view localization methods.
It is the first sparse visual-only method that enhances perception in dynamic environments.
arXiv Detail & Related papers (2023-08-16T02:51:52Z)
- Semantic Scene Completion with Cleaner Self [93.99441599791275]
Semantic Scene Completion (SSC) transforms single-view depth and/or RGB 2D pixels into 3D voxels, predicting a semantic label for each voxel.
SSC is a well-known ill-posed problem, as the prediction model has to "imagine" what lies behind the visible surface, which is usually represented by a Truncated Signed Distance Function (TSDF); a minimal TSDF sketch follows this list.
We use the ground-truth 3D voxels to generate a perfect visible surface, called TSDF-CAD, and then train a "cleaner" SSC model.
As the model is noise-free, it is expected to…
arXiv Detail & Related papers (2023-03-17T13:50:18Z)
- On Robust Cross-View Consistency in Self-Supervised Monocular Depth Estimation [56.97699793236174]
We study two kinds of robust cross-view consistency in this paper.
We exploit the temporal coherence in both depth feature space and 3D voxel space for self-supervised monocular depth estimation.
Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques.
arXiv Detail & Related papers (2022-09-19T03:46:13Z)
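Several entries above, e.g., Semantic Scene Completion with Cleaner Self, lean on the Truncated Signed Distance Function. As a purely illustrative sketch (the helper name and the truncation value tau are assumptions, not taken from any of the papers), a TSDF along a camera ray can be computed as:

```python
import torch

def tsdf_along_ray(sample_depths, surface_depth, tau=0.3):
    # Hypothetical helper: signed distance of ray samples to the observed
    # surface, positive in front of it, negative behind, truncated to [-tau, tau].
    signed = surface_depth - sample_depths
    return signed.clamp(min=-tau, max=tau)

# Usage sketch: samples every 0.1 m along a ray whose surface lies at 2.0 m.
depths = torch.arange(0.0, 4.0, 0.1)
tsdf = tsdf_along_ray(depths, surface_depth=2.0)
```

Truncation is the design choice that matters here: clamping to ±tau keeps supervision local to a thin band around the surface, which is why noisy depth far from the surface does not dominate the representation.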