SUG-Occ: An Explicit Semantics and Uncertainty Guided Sparse Learning Framework for Real-Time 3D Occupancy Prediction
- URL: http://arxiv.org/abs/2601.11396v3
- Date: Thu, 22 Jan 2026 10:43:59 GMT
- Title: SUG-Occ: An Explicit Semantics and Uncertainty Guided Sparse Learning Framework for Real-Time 3D Occupancy Prediction
- Authors: Hanlin Wu, Pengfei Lin, Ehsan Javanmardi, Naren Bao, Bo Qian, Hao Si, Manabu Tsukada
- Abstract summary: We propose SUG-Occ, an explicit Semantics and Uncertainty Guided Sparse Learning Enabled 3D Occupancy Prediction Framework. We first utilize semantic and uncertainty priors to suppress projections from free space during view transformation. We then employ an explicit unsigned distance encoding to enhance geometric consistency, producing a structurally consistent sparse 3D representation.
- Score: 5.730573889498275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As autonomous driving moves toward full scene understanding, 3D semantic occupancy prediction has emerged as a crucial perception task, offering voxel-level semantics beyond traditional detection and segmentation paradigms. However, such a refined representation for scene understanding incurs prohibitive computation and memory overhead, posing a major barrier to practical real-time deployment. To address this, we propose SUG-Occ, an explicit Semantics and Uncertainty Guided Sparse Learning Enabled 3D Occupancy Prediction Framework, which exploits the inherent sparsity of 3D scenes to reduce redundant computation while maintaining geometric and semantic completeness. First, we utilize semantic and uncertainty priors to suppress projections from free space during view transformation while employing an explicit unsigned distance encoding to enhance geometric consistency, producing a structurally consistent sparse 3D representation. Second, we design a cascade sparse completion module via hyper cross sparse convolution and generative upsampling to enable efficient coarse-to-fine reasoning. Finally, we devise an object contextual representation (OCR) based mask decoder that aggregates global semantic context from sparse features and refines voxel-wise predictions via lightweight query-context interactions, avoiding expensive attention operations over volumetric features. Extensive experiments on the SemanticKITTI benchmark demonstrate that the proposed approach outperforms the baselines, achieving a 7.34% improvement in accuracy and a 57.8% gain in efficiency.
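The abstract does not say how the semantic and uncertainty priors are computed or combined, so the following is only a minimal sketch of one plausible reading: per-pixel class probabilities from a 2D semantic head serve as the semantic prior, normalized entropy serves as the uncertainty prior, and a pixel is lifted to 3D only if it is likely non-free or too uncertain to discard. The function names, the `free_class` index, and both thresholds are hypothetical and are not taken from the paper.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one pixel's class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def keep_mask(pixel_probs, free_class=0, occ_thresh=0.5, unc_thresh=0.5):
    """Return one boolean per pixel: True means 'lift this pixel to 3D'.

    Assumption (not from the paper): a pixel is kept if it is likely
    non-free, or if its prediction is too uncertain to discard safely.
    """
    mask = []
    for probs in pixel_probs:
        max_entropy = math.log(len(probs))
        uncertainty = entropy(probs) / max_entropy  # normalized to [0, 1]
        occupied_prob = 1.0 - probs[free_class]
        mask.append(occupied_prob >= occ_thresh or uncertainty >= unc_thresh)
    return mask

# Three hypothetical pixels, each with probabilities for [free, class A, class B].
pixels = [
    [0.97, 0.02, 0.01],  # confidently free: dropped
    [0.05, 0.90, 0.05],  # confidently occupied: kept
    [0.55, 0.25, 0.20],  # leans free but high entropy: kept
]
print(keep_mask(pixels))  # [False, True, True]
```

Only the pixels marked `True` would be projected into the voxel grid, which is where the sparsity, and hence the compute saving, would come from under this reading.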
Related papers
- SegSplat: Feed-forward Gaussian Splatting and Open-Set Semantic Segmentation [114.57192386025373]
SegSplat is a novel framework designed to bridge the gap between rapid, feed-forward 3D reconstruction and rich, open-vocabulary semantic understanding. This work represents a significant step towards practical, on-the-fly generation of semantically aware 3D environments.
arXiv Detail & Related papers (2025-11-23T10:26:38Z)
- OccLE: Label-Efficient 3D Semantic Occupancy Prediction [68.60633561134571]
OccLE is a label-efficient 3D semantic occupancy prediction method. It takes images and LiDAR as inputs and maintains high performance with limited voxel annotations. Experiments show that OccLE achieves competitive performance with only 10% of voxel annotations.
arXiv Detail & Related papers (2025-05-27T01:41:28Z)
- TGP: Two-modal occupancy prediction with 3D Gaussian and sparse points for 3D Environment Awareness [13.68631587423815]
3D semantic occupancy has rapidly become a research focus in the fields of robotics and autonomous driving environment perception. Existing occupancy prediction tasks are modeled using voxel or point cloud-based approaches. We propose a dual-modal prediction method based on 3D Gaussian sets and sparse points, which balances both spatial location and volumetric structural information.
arXiv Detail & Related papers (2025-03-13T01:35:04Z)
- ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Predictions [91.55655961014027]
3D semantic occupancy and flow prediction are fundamental to scene understanding. This paper proposes a vision-based framework with three targeted improvements. Our purely convolutional architecture establishes new SOTA performance on multiple benchmarks for both semantic occupancy and joint semantic-flow prediction.
arXiv Detail & Related papers (2024-11-12T11:32:56Z)
- OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries.
OPUS incorporates a suite of non-trivial strategies to enhance model performance.
Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at near 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z)
- CLIP-GS: CLIP-Informed Gaussian Splatting for View-Consistent 3D Indoor Semantic Understanding [17.440124130814166]
Exploiting 3D Gaussian Splatting (3DGS) with Contrastive Language-Image Pre-Training (CLIP) models for open-vocabulary 3D semantic understanding of indoor scenes has emerged as an attractive research focus. We present CLIP-GS, efficiently achieving a coherent semantic understanding of 3D indoor scenes via the proposed Semantic Attribute Compactness (SAC) and 3D Coherent Regularization (3DCR). Our method remarkably surpasses existing state-of-the-art approaches, achieving mIoU improvements of 21.20% and 13.05% on ScanNet and Replica datasets, respectively.
arXiv Detail & Related papers (2024-04-22T15:01:32Z)
- Volumetric Semantically Consistent 3D Panoptic Mapping [77.13446499924977]
We introduce an online 2D-to-3D semantic instance mapping algorithm aimed at generating semantic 3D maps suitable for autonomous agents in unstructured environments.
It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantic and instance-consistent 3D regions.
The proposed method achieves accuracy superior to the state of the art on public large-scale datasets, improving on a number of widely used metrics.
arXiv Detail & Related papers (2023-09-26T08:03:10Z)
- S3Net: 3D LiDAR Sparse Semantic Segmentation Network [1.330528227599978]
S3Net is a novel convolutional neural network for LiDAR point cloud semantic segmentation.
It adopts an encoder-decoder backbone that consists of a Sparse Intra-channel Attention Module (SIntraAM) and a Sparse Inter-channel Attention Module (SInterAM).
arXiv Detail & Related papers (2021-03-15T22:15:24Z)
- SCFusion: Real-time Incremental Scene Reconstruction with Semantic Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner.
Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model.
arXiv Detail & Related papers (2020-10-26T15:31:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.