OccLE: Label-Efficient 3D Semantic Occupancy Prediction
- URL: http://arxiv.org/abs/2505.20617v2
- Date: Wed, 06 Aug 2025 13:55:08 GMT
- Title: OccLE: Label-Efficient 3D Semantic Occupancy Prediction
- Authors: Naiyu Fang, Zheyuan Zhou, Fayao Liu, Xulei Yang, Jiacheng Wei, Lemiao Qiu, Guosheng Lin
- Abstract summary: 3D semantic occupancy prediction offers an intuitive and efficient scene understanding. Existing approaches either rely on full supervision, which demands costly voxel-level annotations, or on self-supervision, which provides limited guidance and yields suboptimal performance. We propose OccLE, a label-efficient 3D semantic occupancy prediction framework that takes images and LiDAR as inputs and maintains high performance with limited voxel annotations.
- Score: 48.50138308129873
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D semantic occupancy prediction offers an intuitive and efficient scene understanding and has attracted significant interest in autonomous driving perception. Existing approaches either rely on full supervision, which demands costly voxel-level annotations, or on self-supervision, which provides limited guidance and yields suboptimal performance. To address these challenges, we propose OccLE, a label-efficient 3D semantic occupancy prediction framework that takes images and LiDAR as inputs and maintains high performance with limited voxel annotations. Our intuition is to decouple the semantic and geometric learning tasks and then fuse the learned feature grids from both tasks for the final semantic occupancy prediction. The semantic branch therefore distills a 2D foundation model to provide aligned pseudo labels for 2D and 3D semantic learning. The geometric branch integrates image and LiDAR inputs in a cross-plane synergy based on their inherent characteristics, employing semi-supervision to enhance geometry learning. We fuse the semantic and geometric feature grids through a Dual Mamba module and incorporate a scatter-accumulated projection to supervise unannotated predictions with aligned pseudo labels. Experiments show that OccLE achieves competitive performance with only 10% of voxel annotations on the SemanticKITTI and Occ3D-nuScenes datasets.
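The listing carries no reference code, so the following is only a minimal PyTorch sketch of the scatter-accumulated projection idea described in the abstract: per-voxel semantic logits are splatted onto the pixels their centers project to and averaged, so 2D pseudo labels distilled from a foundation model can supervise voxels without 3D annotations. All function names, tensor shapes, and the ignore-index convention are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed interface, not the authors' code) of a
# scatter-accumulated projection: voxel logits are accumulated onto their
# projected pixels and averaged, then supervised by 2D pseudo labels.
import torch
import torch.nn.functional as F

def scatter_accumulated_projection(voxel_logits, voxel_uv, valid, image_hw):
    """voxel_logits: (N, C) semantic logits for N voxels.
    voxel_uv:     (N, 2) integer (u, v) pixel coordinates of voxel centers.
    valid:        (N,) bool mask for voxels projecting inside the image.
    image_hw:     (H, W). Returns (H*W, C) mean logits per pixel."""
    H, W = image_hw
    flat_idx = voxel_uv[:, 1] * W + voxel_uv[:, 0]           # linearized pixel index
    acc = voxel_logits.new_zeros(H * W, voxel_logits.shape[1])
    cnt = voxel_logits.new_zeros(H * W, 1)
    acc.index_add_(0, flat_idx[valid], voxel_logits[valid])  # scatter-accumulate logits
    cnt.index_add_(0, flat_idx[valid],
                   voxel_logits.new_ones(int(valid.sum()), 1))
    return acc / cnt.clamp(min=1.0)                          # average over voxels per pixel

def pseudo_label_loss(voxel_logits, voxel_uv, valid, pseudo_labels):
    """pseudo_labels: (H, W) class indices distilled from a 2D foundation model,
    with 255 marking ignored pixels (an assumed convention)."""
    H, W = pseudo_labels.shape
    pixel_logits = scatter_accumulated_projection(voxel_logits, voxel_uv, valid, (H, W))
    return F.cross_entropy(pixel_logits, pseudo_labels.view(-1), ignore_index=255)
```

Averaging rather than summing keeps the projected logits comparable across pixels hit by different numbers of voxels.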
Related papers
- From Binary to Semantic: Utilizing Large-Scale Binary Occupancy Data for 3D Semantic Occupancy Prediction [0.0]
We propose a novel binary occupancy-based framework that decomposes the prediction process into binary and semantic occupancy modules. Our experimental results demonstrate that the proposed framework outperforms existing methods in both pre-training and auto-labeling tasks.
arXiv Detail & Related papers (2025-07-16T01:57:16Z) - Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels [69.58063088519852]
We propose to improve semantic correspondence estimation via 3D-aware pseudo-labeling. Specifically, we train an adapter to refine off-the-shelf features using pseudo-labels obtained via 3D-aware chaining. While reducing the need for dataset-specific annotations, we set a new state of the art on SPair-71k with over a 4% absolute gain.
arXiv Detail & Related papers (2025-06-05T17:54:33Z) - VoxDet: Rethinking 3D Semantic Occupancy Prediction as Dense Object Detection [67.09867723723934]
3D semantic occupancy prediction aims to reconstruct the 3D geometry and semantics of the surrounding environment. With dense voxel labels, prior works typically formulate it as a dense segmentation task, independently classifying each voxel. We propose VoxDet, an instance-centric framework that reformulates voxel-level occupancy prediction as dense object detection.
arXiv Detail & Related papers (2025-06-05T04:31:55Z) - AGO: Adaptive Grounding for Open World 3D Occupancy Prediction [11.607246562535366]
Open-world 3D semantic occupancy prediction aims to generate a voxelized 3D representation from sensor inputs. We propose AGO, a novel 3D occupancy prediction framework with adaptive grounding to handle diverse open-world scenarios.
arXiv Detail & Related papers (2025-04-14T11:26:20Z) - MinkOcc: Towards real-time label-efficient semantic occupancy prediction [8.239334282982623]
MinkOcc is a multi-modal 3D semantic occupancy prediction framework for cameras and LiDARs. It reduces reliance on manual labeling by 90% while maintaining competitive accuracy. We aim to extend MinkOcc beyond curated datasets, enabling broader real-world deployment of 3D semantic occupancy prediction in autonomous driving.
arXiv Detail & Related papers (2025-04-03T04:31:56Z) - Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
3D segmentation is a core problem in computer vision, but densely labeling 3D point clouds for fully supervised training remains too labor-intensive and expensive. Semi-supervised training provides a more practical alternative, where only a small labeled set is given, accompanied by a larger unlabeled set (a generic sketch of this pseudo-labeling recipe appears after this list).
arXiv Detail & Related papers (2024-09-12T14:54:31Z) - Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance [8.07701188057789]
We introduce a novel semi-supervised framework to alleviate the dependency on densely annotated data. Our approach leverages 2D foundation models to generate essential 3D scene geometric and semantic cues. Our method achieves up to 85% of the fully supervised performance using only 10% labeled data.
arXiv Detail & Related papers (2024-08-21T12:13:18Z) - Label-efficient Semantic Scene Completion with Scribble Annotations [29.88371368606911]
We build a new label-efficient benchmark, named ScribbleSC, where sparse scribble-based semantic labels are combined with dense geometric labels for semantic scene completion.
Our method consists of geometry-aware auto-labeler construction and online model training with an offline-to-online distillation module to enhance performance.
arXiv Detail & Related papers (2024-05-24T03:09:50Z) - 2D Feature Distillation for Weakly- and Semi-Supervised 3D Semantic Segmentation [92.17700318483745]
We propose an image-guidance network (IGNet) that distills high-level feature information from a domain-adapted, synthetically trained 2D semantic segmentation network (a minimal sketch of this kind of 2D-to-3D feature distillation appears after this list).
IGNet achieves state-of-the-art results for weakly supervised LiDAR semantic segmentation on ScribbleKITTI, reaching up to 98% of fully supervised performance with only 8% labeled points.
arXiv Detail & Related papers (2023-11-27T07:57:29Z) - SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations [76.45009891152178]
The pretraining-finetuning approach can alleviate the labeling burden by fine-tuning a pre-trained backbone across various downstream datasets and tasks.
We show, for the first time, that general representation learning can be achieved through the task of occupancy prediction.
Our findings will facilitate the understanding of LiDAR points and pave the way for future advancements in LiDAR pre-training.
arXiv Detail & Related papers (2023-09-19T11:13:01Z) - UniOcc: Unifying Vision-Centric 3D Occupancy Prediction with Geometric and Semantic Rendering [27.712689811093362]
We present our solution, named UniOcc, for the vision-centric 3D occupancy prediction track.
Our solution achieves 51.27% mIoU on the official leaderboard with a single model, placing 3rd in this challenge.
arXiv Detail & Related papers (2023-06-15T13:23:57Z) - VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud [51.063494002003154]
3D semantic scene graph (3DSSG) prediction in point clouds is challenging because point clouds capture only geometric structure, with limited semantics compared to 2D images.
We propose a Visual-Linguistic Semantics Assisted Training scheme that significantly strengthens the discrimination of 3DSSG prediction models on long-tailed and ambiguous semantic relations.
arXiv Detail & Related papers (2023-03-25T09:14:18Z)
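Several entries above (Bayesian Self-Training, the semi-supervised scene-completion framework, MinkOcc, and OccLE itself) build on a confidence-thresholded pseudo-labeling loop. The following is a generic, minimal sketch of that recipe, not any single paper's method; the model interface, threshold, loss weight, and ignore index are illustrative assumptions.

```python
# Generic confidence-thresholded pseudo-label self-training step (a sketch of the
# shared semi-supervised recipe, not any one paper's implementation).
import torch
import torch.nn.functional as F

def self_training_step(model, optimizer, labeled_batch, unlabeled_batch,
                       conf_threshold=0.9, unlabeled_weight=1.0):
    """One optimization step mixing supervised and pseudo-label losses.
    labeled_batch: (inputs, targets) with per-voxel/point class indices.
    unlabeled_batch: inputs without annotations."""
    inputs, targets = labeled_batch
    sup_loss = F.cross_entropy(model(inputs), targets, ignore_index=255)

    with torch.no_grad():                           # teacher pass: no gradients
        probs = model(unlabeled_batch).softmax(dim=-1)
        conf, pseudo = probs.max(dim=-1)            # confidence and argmax label
        pseudo[conf < conf_threshold] = 255         # drop low-confidence predictions

    u_logits = model(unlabeled_batch)               # student pass with gradients
    unsup_loss = F.cross_entropy(u_logits, pseudo, ignore_index=255)

    loss = sup_loss + unlabeled_weight * unsup_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Practical variants replace the no-grad teacher pass with an EMA copy of the model or a Bayesian posterior over predictions, which is the direction the self-training paper above explores.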
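Likewise, IGNet and the OccLE semantic branch both distill 2D network features into a 3D model by projecting 3D points into the image. Below is a minimal sketch under assumed shapes and names; real systems add calibration handling, multi-camera aggregation, and feature-alignment layers.

```python
# Minimal sketch of 2D-to-3D feature distillation (assumed interface): 3D features
# are pulled toward frozen 2D features sampled at each point's projected pixel.
import torch
import torch.nn.functional as F

def feature_distillation_loss(feat3d, uv, valid, feat2d):
    """feat3d: (N, D) per-point/voxel features from the 3D network.
    uv:     (N, 2) integer pixel coordinates of projected points.
    valid:  (N,) bool mask for in-image projections.
    feat2d: (D, H, W) frozen features from a 2D segmentation/foundation model."""
    target = feat2d[:, uv[valid, 1], uv[valid, 0]].t()   # (M, D) sampled 2D features
    # Cosine distance: match feature direction, ignoring scale differences.
    return 1.0 - F.cosine_similarity(feat3d[valid], target, dim=-1).mean()
```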
This list is automatically generated from the titles and abstracts of the papers in this site.