S3CNet: A Sparse Semantic Scene Completion Network for LiDAR Point
Clouds
- URL: http://arxiv.org/abs/2012.09242v1
- Date: Wed, 16 Dec 2020 20:14:41 GMT
- Title: S3CNet: A Sparse Semantic Scene Completion Network for LiDAR Point
Clouds
- Authors: Ran Cheng, Christopher Agia, Yuan Ren, Xinhai Li, Liu Bingbing
- Abstract summary: We present S3CNet, a sparse convolution based neural network that predicts the semantically completed scene from a single, unified LiDAR point cloud.
We show that our proposed method outperforms all counterparts on the 3D task, achieving state-of-the-art results on the SemanticKITTI benchmark.
- Score: 0.16799377888527683
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the increasing reliance of self-driving and similar robotic systems on
robust 3D vision, the processing of LiDAR scans with deep convolutional neural
networks has become a trend in academia and industry alike. Prior attempts on
the challenging Semantic Scene Completion task - which entails the inference of
dense 3D structure and associated semantic labels from "sparse" representations
- have been, to a degree, successful in small indoor scenes when provided with
dense point clouds or dense depth maps often fused with semantic segmentation
maps from RGB images. However, the performance of these systems drops
drastically when applied to large outdoor scenes characterized by dynamic and
exponentially sparser conditions. Likewise, processing of the entire sparse
volume becomes infeasible due to memory limitations and workarounds introduce
computational inefficiency as practitioners are forced to divide the overall
volume into multiple equal segments and infer on each individually, rendering
real-time performance impossible. In this work, we formulate a method that
subsumes the sparsity of large-scale environments and present S3CNet, a sparse
convolution based neural network that predicts the semantically completed scene
from a single, unified LiDAR point cloud. We show that our proposed method
outperforms all counterparts on the 3D task, achieving state-of-the-art results
on the SemanticKITTI benchmark. Furthermore, we propose a 2D variant of S3CNet
with a multi-view fusion strategy to complement our 3D network, providing
robustness to occlusions and extreme sparsity in distant regions. We conduct
experiments for the 2D semantic scene completion task and compare the results
of our sparse 2D network against several leading LiDAR segmentation models
adapted for bird's eye view segmentation on two open-source datasets.
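The pipeline described above hinges on turning a single, unified LiDAR scan into a sparse voxel representation before any convolution is applied. As a rough illustration (not the paper's actual preprocessing, and with an assumed voxel size and mean-pooled features), one common way to build the sparse (coordinates, features) pair consumed by sparse-convolution libraries such as MinkowskiEngine or spconv looks like this:

```python
import numpy as np

def voxelize(points, voxel_size=0.2):
    """Quantize an (N, 4) LiDAR scan [x, y, z, intensity] into sparse voxels.

    Returns integer voxel coordinates and per-voxel mean features, the usual
    input pair for sparse-convolution libraries. Voxel size and feature
    averaging are illustrative choices, not S3CNet's exact configuration.
    """
    coords = np.floor(points[:, :3] / voxel_size).astype(np.int32)
    # Deduplicate occupied voxels; `inverse` maps every point to its voxel.
    uniq, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    counts = np.bincount(inverse, minlength=len(uniq)).astype(np.float32)
    feats = np.zeros((len(uniq), points.shape[1]), dtype=np.float32)
    for c in range(points.shape[1]):
        feats[:, c] = np.bincount(inverse, weights=points[:, c], minlength=len(uniq))
    feats /= counts[:, None]  # mean position / intensity per occupied voxel
    return uniq, feats

# Example: a random stand-in for a single unified LiDAR scan.
scan = np.random.randn(120_000, 4).astype(np.float32) * 10
voxel_coords, voxel_feats = voxelize(scan)
print(voxel_coords.shape, voxel_feats.shape)  # only occupied voxels are kept
```

Only occupied voxels are stored, which is what keeps memory proportional to the scan's actual sparsity rather than to the full scene volume.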
Related papers
- Dynamic 3D Point Cloud Sequences as 2D Videos [81.46246338686478]
3D point cloud sequences serve as one of the most common and practical representation modalities of real-world environments.
We propose a novel generic representation called Structured Point Cloud Videos (SPCVs).
An SPCV re-organizes a point cloud sequence as a 2D video with spatial smoothness and temporal consistency, where the pixel values correspond to the 3D coordinates of points.
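As a rough illustration of this representation idea (not the construction used in that paper), the sketch below packs a single point cloud frame into an H x W image whose pixel values are XYZ coordinates via a simple spherical projection; the resolution and field of view are assumed values.

```python
import numpy as np

def points_to_coordinate_image(points, height=64, width=1024):
    """Pack an (N, 3) point cloud into an (H, W, 3) image of XYZ values.

    A spherical (range-image-like) projection is used here purely for
    illustration; SPCVs learn their own structured re-organization.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-8
    yaw = np.arctan2(y, x)                          # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / r, -1.0, 1.0))
    # Map angles to pixel indices (assumed +/-25 degree vertical FOV).
    u = ((yaw / np.pi + 1.0) * 0.5 * (width - 1)).astype(np.int64)
    v = ((1.0 - (pitch + np.radians(25)) / np.radians(50)) * (height - 1))
    v = np.clip(v, 0, height - 1).astype(np.int64)
    image = np.zeros((height, width, 3), dtype=np.float32)
    image[v, u] = points[:, :3]                     # pixel value = 3D coordinate
    return image

frame = np.random.randn(50_000, 3).astype(np.float32) * 20
print(points_to_coordinate_image(frame).shape)      # (64, 1024, 3)
```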
arXiv Detail & Related papers (2024-03-02T08:18:57Z)
- ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
arXiv Detail & Related papers (2023-11-29T20:30:18Z)
- DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [68.94868475824575]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z)
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
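A minimal sketch of what such voting-based fusion could look like, assuming each vision model's 2D mask has already been lifted to per-point class predictions (that lifting step, and any confidence weighting, is omitted here):

```python
import numpy as np

def fuse_labels_by_voting(predictions, num_classes):
    """Majority-vote fusion of per-point labels from several models.

    predictions: list of (N,) integer label arrays, one per vision model.
    Returns an (N,) array of fused pseudo labels. Ties resolve to the
    lowest class id; a real pipeline might also weight votes by confidence.
    """
    stacked = np.stack(predictions, axis=0)                      # (M, N)
    votes = np.apply_along_axis(
        lambda col: np.bincount(col, minlength=num_classes), 0, stacked
    )                                                            # (num_classes, N)
    return votes.argmax(axis=0)

preds = [np.random.randint(0, 5, size=1000) for _ in range(3)]
pseudo_labels = fuse_labels_by_voting(preds, num_classes=5)
print(pseudo_labels.shape)  # (1000,)
```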
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
- Towards Label-free Scene Understanding by Vision Foundation Models [87.13117617056004]
We investigate the potential of vision foundation models in enabling networks to comprehend 2D and 3D worlds without labelled data.
We propose a novel Cross-modality Noisy Supervision (CNS) method that leverages the strengths of CLIP and SAM to supervise 2D and 3D networks simultaneously.
Our 2D and 3D networks achieve label-free semantic segmentation with 28.4% and 33.5% mIoU on ScanNet, improving by 4.7% and 7.9%, respectively.
arXiv Detail & Related papers (2023-06-06T17:57:49Z)
- Putting 3D Spatially Sparse Networks on a Diet [21.881294733075393]
We propose a compact weight-sparse and spatially sparse 3D convnet (WS3-ConvNet) for semantic segmentation and instance segmentation.
We employ various network pruning strategies to find compact networks and show that our WS3-ConvNet achieves minimal loss in performance (2-15% drop) with an orders-of-magnitude smaller number of parameters (1/100 compression rate).
Finally, we systematically analyze the compression patterns of WS3-ConvNet and show interesting emerging sparsity patterns common in our compressed networks to further speed up inference.
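The summary does not fix a particular pruning strategy; as one generic example from that family, the hedged sketch below applies global magnitude pruning to a small PyTorch model (illustrative only, not the authors' schedule):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def magnitude_prune(model: nn.Module, sparsity: float = 0.9):
    """Zero out the smallest-magnitude conv weights across the whole model.

    A generic illustration of weight sparsification, not the exact pruning
    procedure used for WS3-ConvNet.
    """
    weights = [m.weight for m in model.modules()
               if isinstance(m, (nn.Conv2d, nn.Conv3d))]
    all_mags = torch.cat([w.abs().flatten() for w in weights])
    threshold = torch.quantile(all_mags, sparsity)
    for w in weights:
        w.mul_((w.abs() > threshold).to(w.dtype))  # apply binary mask in place

model = nn.Sequential(nn.Conv3d(4, 16, 3), nn.ReLU(), nn.Conv3d(16, 16, 3))
magnitude_prune(model, sparsity=0.9)
kept = sum((m.weight != 0).sum().item()
           for m in model.modules() if isinstance(m, nn.Conv3d))
total = sum(m.weight.numel()
            for m in model.modules() if isinstance(m, nn.Conv3d))
print(f"kept {kept}/{total} weights")
```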
arXiv Detail & Related papers (2021-12-02T15:20:15Z)
- Data Augmented 3D Semantic Scene Completion with 2D Segmentation Priors [1.0973642726108543]
We present SPAwN, a novel lightweight multimodal 3D deep CNN.
A crucial difficulty in this field is the lack of fully labeled real-world 3D datasets.
We introduce the use of a 3D data augmentation strategy that can be applied to multimodal SSC networks.
arXiv Detail & Related papers (2021-11-26T04:08:34Z)
- Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to utilize the 3D voxelization and 3D convolution network.
We propose a new framework for outdoor LiDAR segmentation, where a cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
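The cylindrical partition itself is essentially a change of coordinates before voxelization; a minimal sketch with assumed grid resolution and ranges:

```python
import numpy as np

def cylindrical_voxel_indices(points, rho_max=50.0, grid=(480, 360, 32),
                              z_range=(-4.0, 2.0)):
    """Map (N, 3) Cartesian points to integer (rho, phi, z) voxel indices.

    Grid resolution and ranges are illustrative; the cited work additionally
    applies asymmetrical 3D convolutions on the resulting cylindrical grid.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.sqrt(x ** 2 + y ** 2)                  # radial distance
    phi = np.arctan2(y, x)                          # azimuth in [-pi, pi]
    rho_idx = np.clip(rho / rho_max * grid[0], 0, grid[0] - 1)
    phi_idx = (phi + np.pi) / (2 * np.pi) * (grid[1] - 1)
    z_idx = np.clip((z - z_range[0]) / (z_range[1] - z_range[0]) * grid[2],
                    0, grid[2] - 1)
    return np.stack([rho_idx, phi_idx, z_idx], axis=1).astype(np.int32)

pts = np.random.randn(10_000, 3).astype(np.float32) * 15
idx = cylindrical_voxel_indices(pts)
print(idx.min(axis=0), idx.max(axis=0))  # indices stay inside the cylindrical grid
```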
arXiv Detail & Related papers (2021-09-12T06:25:11Z)
- S3Net: 3D LiDAR Sparse Semantic Segmentation Network [1.330528227599978]
S3Net is a novel convolutional neural network for LiDAR point cloud semantic segmentation.
It adopts an encoder-decoder backbone that consists of a Sparse Intra-channel Attention Module (SIntraAM) and a Sparse Inter-channel Attention Module (SInterAM).
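The abstract does not spell out the attention design; as a generic point of reference only, a squeeze-and-excitation-style channel attention block on dense voxel features is sketched below (S3Net's SIntraAM/SInterAM operate on sparse tensors and differ in detail):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention on dense voxel features.

    Purely a generic reference point, not the S3Net modules themselves.
    """
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, C, D, H, W)
        weights = self.fc(x.mean(dim=(2, 3, 4)))           # global average pool -> (B, C)
        return x * weights[:, :, None, None, None]         # re-weight each channel

x = torch.randn(2, 32, 16, 16, 16)
print(ChannelAttention(32)(x).shape)  # torch.Size([2, 32, 16, 16, 16])
```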
arXiv Detail & Related papers (2021-03-15T22:15:24Z)
- Weakly Supervised Semantic Segmentation in 3D Graph-Structured Point Clouds of Wild Scenes [36.07733308424772]
The deficiency of 3D segmentation labels is one of the main obstacles to effective point cloud segmentation.
We propose a novel deep graph convolutional network-based framework for large-scale semantic scene segmentation in point clouds with sole 2D supervision.
arXiv Detail & Related papers (2020-04-26T23:02:23Z)