HD$^2$-SSC: High-Dimension High-Density Semantic Scene Completion for Autonomous Driving
- URL: http://arxiv.org/abs/2511.07925v2
- Date: Fri, 14 Nov 2025 01:50:16 GMT
- Title: HD$^2$-SSC: High-Dimension High-Density Semantic Scene Completion for Autonomous Driving
- Authors: Zhiwen Yang, Yuxin Peng,
- Abstract summary: Camera-based 3D semantic scene completion (SSC) plays a crucial role in autonomous driving.<n>Existing SSC methods suffer from the inherent input-output dimension gap and annotation-reality density gap.<n>We propose a corresponding High- Dimension High-Density Semantic Scene Completion framework with expanded pixel semantics and refined voxel occupancies.
- Score: 52.959716866316604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camera-based 3D semantic scene completion (SSC) plays a crucial role in autonomous driving, enabling voxelized 3D scene understanding for effective scene perception and decision-making. Existing SSC methods have shown efficacy in improving 3D scene representations, but suffer from the inherent input-output dimension gap and annotation-reality density gap, where the 2D planner view from input images with sparse annotated labels leads to inferior prediction of real-world dense occupancy with a 3D stereoscopic view. In light of this, we propose the corresponding High-Dimension High-Density Semantic Scene Completion (HD$^2$-SSC) framework with expanded pixel semantics and refined voxel occupancies. To bridge the dimension gap, a High-dimension Semantic Decoupling module is designed to expand 2D image features along a pseudo third dimension, decoupling coarse pixel semantics from occlusions, and then identify focal regions with fine semantics to enrich image features. To mitigate the density gap, a High-density Occupancy Refinement module is devised with a "detect-and-refine" architecture to leverage contextual geometric and semantic structures for enhanced semantic density with the completion of missing voxels and correction of erroneous ones. Extensive experiments and analyses on the SemanticKITTI and SSCBench-KITTI-360 datasets validate the effectiveness of our HD$^2$-SSC framework.
Related papers
- Multi-Resolution Alignment for Voxel Sparsity in Camera-Based 3D Semantic Scene Completion [52.959716866316604]
Camera-based 3D semantic scene completion (SSC) offers a cost-effective solution for assessing the geometric occupancy and semantic labels of each voxel in the surrounding 3D scene with image inputs.<n>Existing methods face the challenge of voxel sparsity as a large portion of voxels in autonomous driving scenarios are empty.<n>We propose a textitMulti-Resolution Alignment (MRA) approach to mitigate voxel sparsity in camera-based 3D semantic scene completion.
arXiv Detail & Related papers (2026-02-03T10:46:51Z) - VOIC: Visible-Occluded Decoupling for Monocular 3D Semantic Scene Completion [6.144392125326462]
Camera-based 3D Semantic Scene Completion is a critical task for autonomous driving and robotic scene understanding.<n>Existing methods typically focus on end-to-end 2D-to-3D feature lifting and voxel completion.<n>We propose a novel dual-decoder framework that explicitly decouples SSC into visible-region semantic perception and occluded-region scene completion.
arXiv Detail & Related papers (2025-12-22T02:05:45Z) - Task-Aware 3D Affordance Segmentation via 2D Guidance and Geometric Refinement [12.260126771415019]
We introduce Task-Aware 3D Scene-level Affordance segmentation (TASA)<n>TASA is a novel geometry-optimized framework that jointly leverages 2D semantic cues and 3D geometric reasoning in a coarse-to-fine manner.<n>To fully exploit 3D geometric information, a 3D affordance refinement module is proposed to integrate 2D semantic priors with local 3D geometry.
arXiv Detail & Related papers (2025-11-12T13:36:37Z) - VPOcc: Exploiting Vanishing Point for 3D Semantic Occupancy Prediction [24.947072696837118]
Understanding 3D scenes semantically and spatially is crucial for the safe navigation of robots and autonomous vehicles.<n>Camera-based 3D semantic occupancy prediction infers complete voxel grids from 2D images.<n>This task inherently suffers from a 2D-3D discrepancy, where objects of the same size in 3D space appear at different scales in a 2D image depending on their distance from the camera.<n>We propose a novel framework called VPOcc that leverages a vanishing point (VP) to mitigate the 2D-3D discrepancy at both the pixel and feature levels.
arXiv Detail & Related papers (2024-08-07T05:23:52Z) - GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.<n>Our approach achieves State-Of-The-Art performance on the Occ3D-nuScenes dataset with the least image resolution needed and the most weightless image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z) - CLIP-GS: CLIP-Informed Gaussian Splatting for View-Consistent 3D Indoor Semantic Understanding [17.440124130814166]
Exploiting 3D Gaussian Splatting (3DGS) with Contrastive Language-Image Pre-Training (CLIP) models for open-vocabulary 3D semantic understanding of indoor scenes has emerged as an attractive research focus.<n>We present CLIP-GS, efficiently achieving a coherent semantic understanding of 3D indoor scenes via the proposed Semantic Attribute Compactness (SAC) and 3D Coherent Regularization (3DCR)<n>Our method remarkably suppresses existing state-of-the-art approaches, achieving mIoU improvements of 21.20% and 13.05% on ScanNet and Replica datasets, respectively
arXiv Detail & Related papers (2024-04-22T15:01:32Z) - Camera-based 3D Semantic Scene Completion with Sparse Guidance Network [18.415854443539786]
We propose a camera-based semantic scene completion framework called SGN.
SGN propagates semantics from semantic-aware seed voxels to the whole scene based on spatial geometry cues.
Our experimental results demonstrate the superiority of our SGN over existing state-of-the-art methods.
arXiv Detail & Related papers (2023-12-10T04:17:27Z) - DepthSSC: Monocular 3D Semantic Scene Completion via Depth-Spatial Alignment and Voxel Adaptation [2.949710700293865]
We propose DepthSSC, an advanced method for semantic scene completion using only monocular cameras.<n> DepthSSC integrates the Spatial Transformation Graph Fusion (ST-GF) module with Geometric-Aware Voxelization (GAV)<n>We show that DepthSSC captures intricate 3D structural details effectively and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-11-28T01:47:51Z) - Bridging Stereo Geometry and BEV Representation with Reliable Mutual Interaction for Semantic Scene Completion [45.171150395915056]
3D semantic scene completion (SSC) is an ill-posed perception task that requires inferring a dense 3D scene from limited observations.
Previous camera-based methods struggle to predict accurate semantic scenes due to inherent geometric ambiguity and incomplete observations.
We resort to stereo matching technique and bird's-eye-view (BEV) representation learning to address such issues in SSC.
arXiv Detail & Related papers (2023-03-24T12:33:44Z) - Semantic Scene Completion with Cleaner Self [93.99441599791275]
Semantic Scene Completion (SSC) transforms an image of single-view depth and/or RGB 2D pixels into 3D voxels, each of whose semantic labels are predicted.
SSC is a well-known ill-posed problem as the prediction model has to "imagine" what is behind the visible surface, which is usually represented by Truncated Signed Distance Function (TSDF)
We use the ground-truth 3D voxels to generate a perfect visible surface, called TSDF-CAD, and then train a "cleaner" SSC model.
As the model is noise-free, it is expected to
arXiv Detail & Related papers (2023-03-17T13:50:18Z) - Stereo Object Matching Network [78.35697025102334]
This paper presents a stereo object matching method that exploits both 2D contextual information from images and 3D object-level information.
We present two novel strategies to handle 3D objectness in the cost volume space: selective sampling (RoISelect) and 2D-3D fusion.
arXiv Detail & Related papers (2021-03-23T12:54:43Z) - 3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure
Prior [50.73148041205675]
The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation.
We propose to devise a new geometry-based strategy to embed depth information with low-resolution voxel representation.
Our proposed geometric embedding works better than the depth feature learning from habitual SSC frameworks.
arXiv Detail & Related papers (2020-03-31T09:33:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.