SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field
- URL: http://arxiv.org/abs/2403.14366v1
- Date: Thu, 21 Mar 2024 12:49:32 GMT
- Title: SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field
- Authors: Lizhe Liu, Bohua Wang, Hongwei Xie, Daqi Liu, Li Liu, Zhiqiang Tian, Kuiyuan Yang, Bing Wang
- Abstract summary: We propose SurroundSDF to implicitly predict the signed distance field (SDF) and semantic field for continuous perception from surround images.
Specifically, we introduce a query-based approach and utilize SDF constrained by the Eikonal formulation to accurately describe the surfaces of obstacles.
Considering the absence of precise SDF ground truth, we propose a novel weakly supervised paradigm for SDF, referred to as the Sandwich Eikonal formulation.
- Score: 18.110716280650514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-centric 3D environment understanding is both vital and challenging for autonomous driving systems. Recently, object-free methods have attracted considerable attention. Such methods perceive the world by predicting the semantics of discrete voxel grids but fail to construct continuous and accurate obstacle surfaces. To this end, in this paper, we propose SurroundSDF to implicitly predict the signed distance field (SDF) and semantic field for continuous perception from surround images. Specifically, we introduce a query-based approach and utilize an SDF constrained by the Eikonal formulation to accurately describe the surfaces of obstacles. Furthermore, considering the absence of precise SDF ground truth, we propose a novel weakly supervised paradigm for the SDF, referred to as the Sandwich Eikonal formulation, which emphasizes applying correct and dense constraints on both sides of the surface, thereby enhancing the perceptual accuracy of the surface. Experiments suggest that our method achieves state-of-the-art results for both occupancy prediction and 3D scene reconstruction on the nuScenes dataset.
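A true SDF f satisfies the Eikonal property ||∇f(x)|| = 1 everywhere, and the Sandwich Eikonal idea, as the abstract describes it, constrains the field on both sides of the surface rather than only at surface samples. The PyTorch-style sketch below is a minimal illustration of those two ideas; the function names, the eps margin, and the exact form of the two-sided term are assumptions for illustration, not the paper's actual losses.

```python
# Minimal sketch (not the paper's code): a generic Eikonal regularizer and a
# hypothetical two-sided "sandwich" supervision for an SDF network f(x).
import torch

def eikonal_loss(f, x):
    # Penalize deviation of the SDF gradient norm from 1, i.e. ||grad f(x)|| = 1.
    x = x.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(f(x).sum(), x, create_graph=True)
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()

def sandwich_loss(f, surface_pts, normals, eps=0.05):
    # Assumed form: offset surface samples a small distance eps along the
    # normal and constrain the SDF on both sides of the surface as well as on it.
    outside = f(surface_pts + eps * normals)   # should be roughly +eps
    inside = f(surface_pts - eps * normals)    # should be roughly -eps
    on_surf = f(surface_pts)                   # should be roughly 0
    return ((outside - eps) ** 2 + (inside + eps) ** 2 + on_surf ** 2).mean()
```

The two-sided term is what densifies supervision: zero-level-set samples alone leave the sign of nearby points underconstrained, whereas offsetting by ±eps along the normal pins down both sides of the surface.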
Related papers
- RaNeuS: Ray-adaptive Neural Surface Reconstruction [87.20343320266215]
We leverage a differentiable radiance field, e.g., NeRF, to reconstruct detailed 3D surfaces in addition to producing novel view renderings.
Considering that existing methods formulate and optimize the projection from SDF to radiance field with a globally constant Eikonal regularization, we improve on this with a ray-wise weighting factor.
Our proposed RaNeuS is extensively evaluated on both synthetic and real datasets.
arXiv Detail & Related papers (2024-06-14T07:54:25Z)
- CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting [15.392692128626809]
We propose CARFF, a method for predicting future 3D scenes given past observations.
We employ a two-stage training of Pose-Conditional-VAE and NeRF to learn 3D representations.
We demonstrate the utility of our method in scenarios using the CARLA driving simulator.
arXiv Detail & Related papers (2024-01-31T18:56:09Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose OccNeRF, a method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- Camera-based 3D Semantic Scene Completion with Sparse Guidance Network [20.876048262597255]
Semantic scene completion (SSC) aims to predict the semantic occupancy of each voxel in the entire 3D scene from limited observations.
We propose an end-to-end camera-based SSC framework, termed SGN, to diffuse semantics from the semantic- and occupancy-aware seed voxels to the whole scene.
arXiv Detail & Related papers (2023-12-10T04:17:27Z)
- S4C: Self-Supervised Semantic Scene Completion with Neural Fields [54.35865716337547]
3D semantic scene understanding is a fundamental challenge in computer vision.
Current methods for SSC are generally trained on 3D ground truth based on aggregated LiDAR scans.
Our work presents S4C, the first self-supervised approach to SSC that does not rely on 3D ground-truth data.
arXiv Detail & Related papers (2023-10-11T14:19:05Z)
- View Consistent Purification for Accurate Cross-View Localization [59.48131378244399]
This paper proposes a fine-grained self-localization method for outdoor robotics.
The proposed method addresses limitations in existing cross-view localization methods.
It is the first sparse visual-only method that enhances perception in dynamic environments.
arXiv Detail & Related papers (2023-08-16T02:51:52Z)
- Semantic Scene Completion with Cleaner Self [93.99441599791275]
Semantic Scene Completion (SSC) transforms single-view depth and/or RGB 2D pixels into 3D voxels, predicting a semantic label for each voxel.
SSC is a well-known ill-posed problem, as the prediction model has to "imagine" what lies behind the visible surface, which is usually represented by a Truncated Signed Distance Function (TSDF); a minimal TSDF sketch follows this list.
We use the ground-truth 3D voxels to generate a perfect visible surface, called TSDF-CAD, and then train a "cleaner" SSC model.
As the model is noise-free, it is expected to…
arXiv Detail & Related papers (2023-03-17T13:50:18Z)
- On Robust Cross-View Consistency in Self-Supervised Monocular Depth Estimation [56.97699793236174]
We study two kinds of robust cross-view consistency in this paper.
We exploit the temporal coherence in both depth feature space and 3D voxel space for self-supervised monocular depth estimation.
Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques.
arXiv Detail & Related papers (2022-09-19T03:46:13Z)
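Several entries above, e.g., Semantic Scene Completion with Cleaner Self, lean on the Truncated Signed Distance Function. As a purely illustrative sketch (the helper name and the truncation value tau are assumptions, not taken from any of the papers), a TSDF along a camera ray can be computed as:

```python
import torch

def tsdf_along_ray(sample_depths, surface_depth, tau=0.3):
    # Hypothetical helper: signed distance of ray samples to the observed
    # surface, positive in front of it, negative behind, truncated to [-tau, tau].
    signed = surface_depth - sample_depths
    return signed.clamp(min=-tau, max=tau)

# Usage sketch: samples every 0.1 m along a ray whose surface lies at 2.0 m.
depths = torch.arange(0.0, 4.0, 0.1)
tsdf = tsdf_along_ray(depths, surface_depth=2.0)
```

Truncation is the design choice that matters here: clamping to ±tau keeps supervision local to a thin band around the surface, which is why noisy depth far from the surface does not dominate the representation.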