SSCBench: Monocular 3D Semantic Scene Completion Benchmark in Street
Views
- URL: http://arxiv.org/abs/2306.09001v2
- Date: Sat, 30 Sep 2023 01:50:38 GMT
- Title: SSCBench: Monocular 3D Semantic Scene Completion Benchmark in Street
Views
- Authors: Yiming Li, Sihang Li, Xinhao Liu, Moonjun Gong, Kenan Li, Nuo Chen,
Zijun Wang, Zhiheng Li, Tao Jiang, Fisher Yu, Yue Wang, Hang Zhao, Zhiding
Yu, Chen Feng
- Abstract summary: SSCBench is a benchmark that integrates scenes from widely used automotive datasets.
We benchmark models using monocular, trinocular, and point cloud input to assess the performance gap.
We have unified semantic labels across diverse datasets to simplify cross-domain generalization testing.
- Score: 89.8436375840446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular scene understanding is a foundational component of autonomous
systems. Within the spectrum of monocular perception topics, one crucial and
useful task for holistic 3D scene understanding is semantic scene completion
(SSC), which jointly completes semantic information and geometric details from
RGB input. However, progress in SSC, particularly in large-scale street views,
is hindered by the scarcity of high-quality datasets. To address this issue, we
introduce SSCBench, a comprehensive benchmark that integrates scenes from
widely used automotive datasets (e.g., KITTI-360, nuScenes, and Waymo).
SSCBench follows an established setup and format in the community, facilitating
the easy exploration of SSC methods in various street views. We benchmark
models using monocular, trinocular, and point cloud input to assess the
performance gap resulting from sensor coverage and modality. Moreover, we have
unified semantic labels across diverse datasets to simplify cross-domain
generalization testing. We commit to including more datasets and SSC models to
drive further advancements in this field.
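The voxel-wise evaluation protocol common to SSC benchmarks can be illustrated with a minimal sketch. The function and toy grids below are illustrative only, not SSCBench's actual evaluation code; the ignore index of 255 for unobserved voxels is an assumption borrowed from common practice.

```python
import numpy as np

def ssc_miou(pred, gt, num_classes, ignore_index=255):
    """Voxel-wise per-class IoU and mean IoU for semantic scene completion.

    pred, gt: integer arrays of shape (X, Y, Z) holding class indices;
    voxels labeled `ignore_index` in gt (unobserved space) are excluded.
    """
    valid = gt != ignore_index
    pred, gt = pred[valid], gt[valid]
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:  # skip classes absent from both pred and gt
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0

# Toy example: a 2x2x2 grid with 3 classes and one unobserved voxel.
gt = np.array([[[0, 1], [1, 2]], [[0, 255], [2, 2]]])
pred = np.array([[[0, 1], [2, 2]], [[0, 0], [2, 2]]])
print(ssc_miou(pred, gt, num_classes=3))  # → 0.75
```

Unifying semantic labels across KITTI-360, nuScenes, and Waymo, as the abstract describes, amounts to remapping each dataset's class indices into one shared label space before this kind of metric is computed.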
Related papers
- MSSPlace: Multi-Sensor Place Recognition with Visual and Text Semantics [41.94295877935867]
We study the impact of leveraging a multi-camera setup and integrating diverse data sources for multimodal place recognition.
Our proposed method named MSSPlace utilizes images from multiple cameras, LiDAR point clouds, semantic segmentation masks, and text annotations to generate comprehensive place descriptors.
arXiv Detail & Related papers (2024-07-22T14:24:56Z)
- Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion [57.232688209606515]
We present HTCL, a novel hierarchical temporal context learning paradigm for improving camera-based semantic scene completion.
Our method ranks 1st on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU.
arXiv Detail & Related papers (2024-07-02T09:11:17Z)
- MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations [55.022519020409405]
This paper presents MMScan, the largest multi-modal 3D scene dataset and benchmark to date with hierarchical grounded language annotations.
The resulting multi-modal 3D dataset encompasses 1.4M meta-annotated captions on 109k objects and 7.7k regions as well as over 3.04M diverse samples for 3D visual grounding and question-answering benchmarks.
arXiv Detail & Related papers (2024-06-13T17:59:30Z)
- SAI3D: Segment Any Instance in 3D Scenes [68.57002591841034]
We introduce SAI3D, a novel zero-shot 3D instance segmentation approach.
Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations.
Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach.
arXiv Detail & Related papers (2023-12-17T09:05:47Z)
- Camera-based 3D Semantic Scene Completion with Sparse Guidance Network [20.876048262597255]
Semantic scene completion (SSC) aims to predict the semantic occupancy of each voxel in the entire 3D scene from limited observations.
We propose an end-to-end camera-based SSC framework, termed SGN, to diffuse semantics from the semantic- and occupancy-aware seed voxels to the whole scene.
arXiv Detail & Related papers (2023-12-10T04:17:27Z)
- SSC-RS: Elevate LiDAR Semantic Scene Completion with Representation Separation and BEV Fusion [17.459062337718677]
We propose to solve outdoor SSC from the perspective of representation separation and BEV fusion.
We present the network, named SSC-RS, which uses separate branches with deep supervision to explicitly disentangle the learning procedure of the semantic and geometric representations.
A BEV fusion network equipped with the proposed Adaptive Representation Fusion (ARF) module is presented to aggregate the multi-scale features effectively and efficiently.
arXiv Detail & Related papers (2023-06-27T10:02:45Z)
- MASS: Multi-Attentional Semantic Segmentation of LiDAR Data for Dense Top-View Understanding [27.867824780748606]
We introduce MASS - a Multi-Attentional Semantic model for dense top-view understanding of driving scenes.
Our framework operates on pillar- and occupancy features and comprises three attention-based building blocks.
Our model is also shown to be effective for 3D object detection, as validated on the KITTI-3D dataset.
arXiv Detail & Related papers (2021-07-01T10:19:32Z)
- Semantic Segmentation on Swiss3DCities: A Benchmark Study on Aerial Photogrammetric 3D Pointcloud Dataset [67.44497676652173]
We introduce a new outdoor urban 3D pointcloud dataset, covering a total area of 2.7 $km^2$, sampled from three Swiss cities.
The dataset is manually annotated for semantic segmentation with per-point labels, and is built using photogrammetry from images acquired by multirotors equipped with high-resolution cameras.
arXiv Detail & Related papers (2020-12-23T21:48:47Z)
- Campus3D: A Photogrammetry Point Cloud Benchmark for Hierarchical Understanding of Outdoor Scene [76.4183572058063]
We present a richly-annotated 3D point cloud dataset for multiple outdoor scene understanding tasks.
The dataset has been point-wisely annotated with both hierarchical and instance-based labels.
We formulate a hierarchical learning problem for 3D point cloud segmentation and propose a measurement evaluating consistency across various hierarchies.
arXiv Detail & Related papers (2020-08-11T19:10:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.