S4C: Self-Supervised Semantic Scene Completion with Neural Fields
- URL: http://arxiv.org/abs/2310.07522v2
- Date: Thu, 12 Oct 2023 08:02:18 GMT
- Title: S4C: Self-Supervised Semantic Scene Completion with Neural Fields
- Authors: Adrian Hayler, Felix Wimbauer, Dominik Muhle, Christian Rupprecht,
Daniel Cremers
- Abstract summary: 3D semantic scene understanding is a fundamental challenge in computer vision.
Current methods for semantic scene completion (SSC) are generally trained on 3D ground truth based on aggregated LiDAR scans.
Our work presents S4C, the first self-supervised approach to SSC, which does not rely on 3D ground truth data.
- Score: 54.35865716337547
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D semantic scene understanding is a fundamental challenge in computer
vision. It enables mobile agents to autonomously plan and navigate arbitrary
environments. Semantic scene completion (SSC) formalizes this challenge as jointly estimating dense
geometry and semantic information from sparse observations of a scene. Current
methods for SSC are generally trained on 3D ground truth based on aggregated
LiDAR scans. This process relies on special sensors and hand annotation, which
are costly and do not scale well. To overcome this issue, our work
presents the first self-supervised approach to SSC called S4C that does not
rely on 3D ground truth data. Our proposed method can reconstruct a scene from
a single image and relies only on videos and pseudo segmentation ground truth
generated by an off-the-shelf image segmentation network during training. Unlike
existing methods, which use discrete voxel grids, we represent scenes as
implicit semantic fields. This formulation allows querying any point within the
camera frustum for occupancy and semantic class. Our architecture is trained
through rendering-based self-supervised losses. Despite using no 3D
supervision, our method achieves performance close to fully supervised
state-of-the-art methods.
Additionally, our method demonstrates strong generalization capabilities and
can synthesize accurate segmentation maps for far-away viewpoints.
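To make the abstract's formulation concrete, here is a minimal sketch of an implicit semantic field: an MLP maps a 3D point to density and semantic logits, and volume rendering along camera rays turns those into per-pixel class scores that 2D pseudo-labels can supervise. This is an illustration of the idea, not the authors' code; all network sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class SemanticField(nn.Module):
    """MLP mapping a 3D point to (density, semantic logits)."""
    def __init__(self, num_classes: int = 19, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)             # occupancy / density
        self.semantic_head = nn.Linear(hidden, num_classes)  # class logits

    def forward(self, xyz: torch.Tensor):
        h = self.mlp(xyz)
        sigma = torch.relu(self.density_head(h))             # non-negative density
        return sigma.squeeze(-1), self.semantic_head(h)

def render_semantics(field, origins, dirs, n_samples=32, near=0.5, far=20.0):
    """Alpha-composite per-point semantic logits along each camera ray."""
    t = torch.linspace(near, far, n_samples)                 # sample depths
    pts = origins[:, None, :] + dirs[:, None, :] * t[None, :, None]
    sigma, logits = field(pts)                               # (R, S), (R, S, C)
    alpha = 1.0 - torch.exp(-sigma * (far - near) / n_samples)
    trans = torch.cumprod(                                   # transmittance
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha[:, :-1]], dim=1),
        dim=1)
    weights = alpha * trans                                  # (R, S)
    return (weights[..., None] * logits).sum(dim=1)          # (R, C)

# Rendered logits for sampled rays can be trained with cross-entropy
# against 2D pseudo-labels from an off-the-shelf segmentation network.
rays_o = torch.zeros(4, 3)
rays_d = torch.nn.functional.normalize(torch.randn(4, 3), dim=-1)
print(render_semantics(SemanticField(), rays_o, rays_d).shape)  # (4, 19)
```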
Related papers
- LangOcc: Self-Supervised Open Vocabulary Occupancy Estimation via Volume Rendering [0.5852077003870417]
LangOcc is a novel approach for open vocabulary occupancy estimation.
It is trained using only camera images and can detect arbitrary semantics via vision-language alignment.
We achieve state-of-the-art results in self-supervised semantic occupancy estimation on the Occ3D-nuScenes dataset.
arXiv Detail & Related papers (2024-07-24T14:22:55Z)
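A hedged sketch of the open-vocabulary readout in the spirit of LangOcc (not the paper's implementation): the model produces a vision-language-aligned feature per occupied cell, and arbitrary class names are resolved at query time by cosine similarity against text embeddings assumed to come from a CLIP-style text encoder.

```python
import torch

def open_vocab_labels(cell_features: torch.Tensor,
                      text_embeddings: torch.Tensor) -> torch.Tensor:
    """Assign each cell the class whose text embedding it best matches.

    cell_features:   (N, D) rendered features, one per occupied cell
    text_embeddings: (C, D) embeddings of arbitrary class-name prompts
    returns:         (N,) index into the C query classes
    """
    f = torch.nn.functional.normalize(cell_features, dim=-1)
    t = torch.nn.functional.normalize(text_embeddings, dim=-1)
    similarity = f @ t.T              # (N, C) cosine similarities
    return similarity.argmax(dim=-1)

# Usage with placeholder tensors; in practice the text embeddings would
# come from encoding prompts like "a photo of a car" with CLIP.
cells = torch.randn(1000, 512)
texts = torch.randn(5, 512)           # e.g. car, road, building, tree, person
print(open_vocab_labels(cells, texts).shape)  # torch.Size([1000])
```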
- Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning [119.99066522299309]
KYN is a novel method for single-view scene reconstruction that reasons about semantic and spatial context to predict each point's density.
We show that KYN improves 3D shape recovery compared to predicting density for each 3D point in isolation.
We achieve state-of-the-art results in scene and object reconstruction on KITTI-360, and show improved zero-shot generalization compared to prior work.
arXiv Detail & Related papers (2024-04-04T17:59:59Z)
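A rough illustration of the "know your neighbors" idea, not KYN's actual architecture: instead of predicting a point's density from its own feature alone, attend over features of nearby points so spatial and semantic context informs the prediction.

```python
import torch
import torch.nn as nn

class ContextDensityHead(nn.Module):
    """Density prediction conditioned on neighboring point features."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.out = nn.Linear(dim, 1)

    def forward(self, query_feat, neighbor_feats):
        # query_feat: (B, 1, D) feature of the 3D point being queried
        # neighbor_feats: (B, K, D) features of K nearby points
        ctx, _ = self.attn(query_feat, neighbor_feats, neighbor_feats)
        return torch.relu(self.out(ctx)).squeeze(-1)   # (B, 1) density

head = ContextDensityHead()
q = torch.randn(8, 1, 64)
nb = torch.randn(8, 16, 64)
print(head(q, nb).shape)  # torch.Size([8, 1])
```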
- Camera-based 3D Semantic Scene Completion with Sparse Guidance Network [18.415854443539786]
We propose a camera-based semantic scene completion framework called SGN.
SGN propagates semantics from semantic-aware seed voxels to the whole scene based on spatial geometry cues.
Our experimental results demonstrate the superiority of our SGN over existing state-of-the-art methods.
arXiv Detail & Related papers (2023-12-10T04:17:27Z)
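A simplified illustration of SGN's seed-to-scene propagation (the paper learns this propagation; here spatial proximity stands in for the geometry cue): labels known at sparse, semantic-aware seed voxels are spread to the rest of the scene by nearest-seed assignment.

```python
import numpy as np
from scipy.spatial import cKDTree

def propagate_from_seeds(seed_xyz, seed_labels, voxel_xyz):
    """Label every voxel with the class of its nearest seed voxel."""
    tree = cKDTree(seed_xyz)               # index the seed positions
    _, nearest = tree.query(voxel_xyz)     # nearest seed per target voxel
    return seed_labels[nearest]

seeds = np.random.rand(50, 3) * 10.0       # sparse seed voxel centers
seed_cls = np.random.randint(0, 5, 50)     # their predicted classes
voxels = np.random.rand(5000, 3) * 10.0    # all scene voxel centers
print(propagate_from_seeds(seeds, seed_cls, voxels).shape)  # (5000,)
```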
- U3DS$^3$: Unsupervised 3D Semantic Scene Segmentation [19.706172244951116]
This paper presents U3DS$^3$, a step towards completely unsupervised point cloud segmentation for holistic 3D scenes.
The initial step of our proposed approach involves generating superpoints based on the geometric characteristics of each scene.
Learning then proceeds via a spatial clustering-based methodology, followed by iterative training on pseudo-labels derived from the cluster centroids.
arXiv Detail & Related papers (2023-11-10T12:05:35Z)
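A bare-bones sketch of the pseudo-labeling loop U3DS$^3$ describes (illustrative; the paper's superpoint construction and training losses are more involved): cluster superpoint features, treat cluster assignments as pseudo-labels, and iterate as the features improve.

```python
import numpy as np
from sklearn.cluster import KMeans

def pseudo_label_round(superpoint_features: np.ndarray, k: int):
    """One clustering round: fit k-means, return per-superpoint labels."""
    km = KMeans(n_clusters=k, n_init=10).fit(superpoint_features)
    return km.labels_, km.cluster_centers_

feats = np.random.randn(2000, 32)            # one feature per superpoint
for round_idx in range(3):                   # iterative refinement
    labels, centroids = pseudo_label_round(feats, k=10)
    # ...train the segmentation network on `labels`, then re-extract
    # `feats` with the updated network (omitted here)...
```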
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
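An illustrative construction of the occupancy pretext task (not ALSO's exact sampling scheme): each LiDAR return defines free space along the ray before the hit and occupied space at the hit, giving dense binary targets for surface reconstruction from sparse points.

```python
import numpy as np

def occupancy_targets(origin, hits, n_free: int = 4):
    """Sample query points and 0/1 occupancy targets from LiDAR returns.

    origin: (3,) sensor position; hits: (N, 3) measured surface points.
    """
    queries, targets = [], []
    for p in hits:
        for frac in np.linspace(0.2, 0.9, n_free):   # before the surface
            queries.append(origin + frac * (p - origin))
            targets.append(0.0)                      # known free space
        queries.append(p)                            # the surface itself
        targets.append(1.0)                          # known occupied
    return np.stack(queries), np.asarray(targets)

q, t = occupancy_targets(np.zeros(3), np.random.rand(100, 3) * 20.0)
print(q.shape, t.mean())  # (500, 3) and 0.2 occupied
```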
- OpenScene: 3D Scene Understanding with Open Vocabularies [73.1411930820683]
Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a model for a single task with supervision.
We propose OpenScene, an alternative approach where a model predicts dense features for 3D scene points that are co-embedded with text and image pixels in CLIP feature space.
This zero-shot approach enables task-agnostic training and open-vocabulary queries.
arXiv Detail & Related papers (2022-11-28T18:58:36Z)
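A hedged sketch of OpenScene-style zero-shot querying over per-point features that live in CLIP space (the distillation that produces those features is the hard part and is omitted): any free-form text query yields a per-point relevance score with no task-specific training.

```python
import torch

def query_scene(point_features: torch.Tensor,
                query_embedding: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between each 3D point and one text query."""
    p = torch.nn.functional.normalize(point_features, dim=-1)    # (N, D)
    q = torch.nn.functional.normalize(query_embedding, dim=-1)   # (D,)
    return p @ q                                                 # (N,)

pts = torch.randn(10000, 512)     # CLIP-co-embedded point features
q = torch.randn(512)              # e.g. CLIP text encoding of "sofa"
scores = query_scene(pts, q)
print(scores.topk(5).indices)     # the 5 points most like the query
```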
- NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes [25.26518805603798]
NeSF is a method for producing 3D semantic fields from posed RGB images alone.
Our method is the first to offer truly dense 3D scene segmentations requiring only 2D supervision for training.
arXiv Detail & Related papers (2021-11-25T21:44:54Z)
- Semantic Scene Completion using Local Deep Implicit Functions on LiDAR Data [4.355440821669468]
We propose a scene segmentation network based on local Deep Implicit Functions as a novel learning-based method for scene completion.
We show that this continuous representation is suitable to encode geometric and semantic properties of extensive outdoor scenes without the need for spatial discretization.
Our experiments verify that our method generates a powerful representation that can be decoded into a dense 3D description of a given scene.
arXiv Detail & Related papers (2020-11-18T07:39:13Z)
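A compact sketch of decoding a continuous scene from local latent codes, inspired by local deep implicit functions (sizes and the decoder are illustrative assumptions): a coarse grid stores latent vectors, a query point trilinearly interpolates its local code, and an MLP decodes occupancy and a semantic class at that exact position, with no fixed output resolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalImplicitScene(nn.Module):
    def __init__(self, latent_dim=16, grid=8, num_classes=10):
        super().__init__()
        # coarse grid of local latent codes: (1, C, D, H, W)
        self.codes = nn.Parameter(torch.randn(1, latent_dim, grid, grid, grid))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + 3, 64), nn.ReLU(),
            nn.Linear(64, 1 + num_classes),     # occupancy + class logits
        )

    def forward(self, xyz: torch.Tensor):
        # xyz in [-1, 1]^3, shape (N, 3); grid_sample interpolates codes
        grid = xyz.view(1, -1, 1, 1, 3)
        local = F.grid_sample(self.codes, grid, align_corners=True)
        local = local.view(self.codes.shape[1], -1).T        # (N, C)
        out = self.decoder(torch.cat([local, xyz], dim=-1))
        return torch.sigmoid(out[:, :1]), out[:, 1:]         # occ, logits

scene = LocalImplicitScene()
occ, logits = scene(torch.rand(256, 3) * 2 - 1)
print(occ.shape, logits.shape)  # (256, 1) and (256, 10)
```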
- 3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior [50.73148041205675]
The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation.
We propose to devise a new geometry-based strategy to embed depth information with low-resolution voxel representation.
Our proposed geometric embedding works better than the depth feature learning used in conventional SSC frameworks.
arXiv Detail & Related papers (2020-03-31T09:33:46Z)
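One simple reading of "embedding depth information with a low-resolution voxel representation" (a generic illustration, not the paper's exact scheme): back-project each depth pixel through the camera intrinsics and mark the voxel it lands in, giving a geometric prior for the SSC network.

```python
import numpy as np

def depth_to_voxels(depth, K, voxel_size=0.5, grid=(64, 64, 16)):
    """Lift a depth map into a coarse occupancy volume (camera frame)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]     # pinhole back-projection
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    idx = np.floor(np.stack([x, y, z], axis=1) / voxel_size).astype(int)
    idx[:, :2] += np.array(grid[:2]) // 2       # center the x, y origin
    occ = np.zeros(grid, dtype=bool)
    keep = ((idx >= 0) & (idx < np.array(grid))).all(axis=1)
    occ[tuple(idx[keep].T)] = True
    return occ

K = np.array([[500., 0, 320.], [0, 500., 240.], [0, 0, 1.]])
print(depth_to_voxels(np.random.uniform(1, 7, (480, 640)), K).sum())
```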
- Depth Based Semantic Scene Completion with Position Importance Aware Loss [52.06051681324545]
PALNet is a novel hybrid network for semantic scene completion.
It extracts both 2D and 3D features at multiple stages using fine-grained depth information.
This is beneficial for recovering key details like the boundaries of objects and the corners of the scene.
arXiv Detail & Related papers (2020-01-29T07:05:52Z)
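A hedged sketch of a position-importance-aware loss (our reading of the idea, not PALNet's exact formulation): voxels at geometrically important positions, e.g. near label boundaries, receive larger weights in the cross-entropy so boundary and corner details are not washed out.

```python
import torch
import torch.nn.functional as F

def position_aware_ce(logits, target, importance):
    """Cross-entropy with a per-voxel importance weight.

    logits:     (B, C, D, H, W) predicted class scores
    target:     (B, D, H, W) ground-truth labels
    importance: (B, D, H, W) weight, larger at boundaries/corners
    """
    per_voxel = F.cross_entropy(logits, target, reduction="none")
    return (importance * per_voxel).sum() / importance.sum()

logits = torch.randn(2, 12, 16, 16, 8)
target = torch.randint(0, 12, (2, 16, 16, 8))
weights = torch.ones(2, 16, 16, 8)
weights[:, 7:9] = 4.0              # pretend these slices are boundaries
print(position_aware_ce(logits, target, weights))
```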