$α$-OCC: Uncertainty-Aware Camera-based 3D Semantic Occupancy Prediction
- URL: http://arxiv.org/abs/2406.11021v4
- Date: Fri, 31 Jan 2025 16:18:56 GMT
- Title: $α$-OCC: Uncertainty-Aware Camera-based 3D Semantic Occupancy Prediction
- Authors: Sanbao Su, Nuo Chen, Chenchen Lin, Felix Juefei-Xu, Chen Feng, Fei Miao
- Abstract summary: Camera-based 3D Semantic Occupancy Prediction (OCC) aims to infer scene geometry and semantics from limited observations.
We first introduce Depth-UP, an uncertainty propagation framework that improves geometry completion by up to 11.58%.
For uncertainty quantification (UQ), we propose the hierarchical conformal prediction (HCP) method, which effectively handles the high-level class imbalance in OCC datasets.
- Score: 32.78977564877008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the realm of autonomous vehicle perception, comprehending 3D scenes is paramount for tasks such as planning and mapping. Camera-based 3D Semantic Occupancy Prediction (OCC) aims to infer scene geometry and semantics from limited observations. While it has gained popularity due to affordability and rich visual cues, existing methods often neglect the inherent uncertainty in models. To address this, we propose an uncertainty-aware OCC method ($\alpha$-OCC). We first introduce Depth-UP, an uncertainty propagation framework that improves geometry completion by up to 11.58% and semantic segmentation by up to 12.95% across various OCC models. For uncertainty quantification (UQ), we propose the hierarchical conformal prediction (HCP) method, which effectively handles the high-level class imbalance in OCC datasets. On the geometry level, the novel KL-based score function significantly improves the occupied recall of safety-critical classes (by 45%) with minimal performance overhead (a 3.4% reduction). On UQ, our HCP achieves smaller prediction set sizes while maintaining the defined coverage guarantee: compared with baselines, it reduces set size by up to 92%, with a further 18% reduction when integrated with Depth-UP. Our contributions advance OCC accuracy and robustness, marking a noteworthy step forward in autonomous perception systems.
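HCP builds on conformal prediction; for orientation, here is a minimal split-conformal sketch in Python. The simple 1 - softmax score and all variable names are ours, not the paper's KL-based score.

```python
import numpy as np

def calibrate_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split conformal calibration: returns a score threshold that gives
    (1 - alpha) marginal coverage on exchangeable test data."""
    n = len(cal_labels)
    # Nonconformity score: 1 minus the softmax probability of the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile.
    level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(scores, min(level, 1.0), method="higher")

def prediction_set(probs, threshold):
    """All classes whose score clears the calibrated threshold."""
    return np.where(1.0 - probs <= threshold)[0]

# Toy usage: 1000 calibration samples over 5 classes.
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(5), size=1000)
cal_labels = rng.integers(0, 5, size=1000)
q = calibrate_threshold(cal_probs, cal_labels, alpha=0.1)
print(prediction_set(rng.dirichlet(np.ones(5)), q))
```

Per the abstract, HCP departs from this vanilla recipe by calibrating hierarchically (geometry before semantics) and swapping in a KL-based score, which is how it copes with the class imbalance in OCC datasets.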
Related papers
- Learnable Conformal Prediction with Context-Aware Nonconformity Functions for Robotic Planning and Perception [4.694504497452662]
Learnable Conformal Prediction (LCP) replaces fixed scores with a lightweight neural function to produce context-aware uncertainty sets.
It maintains CP's theoretical guarantees while reducing prediction set sizes by 18% in classification, tightening detection intervals by 52%, and improving path-planning safety from 72% to 91% success with minimal overhead.
Hardware evaluation shows LCP adds less than 1% memory and 15.9% inference overhead, yet sustains 39 FPS on detection tasks while being 7.4 times more energy-efficient than ensembles.
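The "lightweight neural function" could look like the following PyTorch sketch; the architecture and feature choices are our assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn

class ContextScoreNet(nn.Module):
    """Hypothetical context-aware nonconformity function: maps logits plus a
    context vector (e.g., scene or robot-state features) to per-class scores."""
    def __init__(self, num_classes: int, ctx_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes + ctx_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, logits: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([logits, context], dim=-1))

# Toy usage: (4, 5) nonconformity scores from logits plus an 8-dim context.
net = ContextScoreNet(num_classes=5, ctx_dim=8)
scores = net(torch.randn(4, 5), torch.randn(4, 8))
```

Swapping a learned score like this into the split-conformal calibration above leaves the coverage guarantee intact, since the guarantee needs only exchangeability of the scores, not any particular score definition; the score network must be trained on data disjoint from the calibration split.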
arXiv Detail & Related papers (2025-09-26T06:44:58Z)
- One Step Closer: Creating the Future to Boost Monocular Semantic Scene Completion [3.664655957801223]
In real-world traffic scenarios, a significant portion of a visual 3D scene remains occluded or outside the camera's field of view.
We propose Creating the Future SSC, a novel temporal SSC framework that leverages pseudo-future frame prediction to expand the model's effective perceptual range.
Our approach combines poses and depths to establish accurate 3D correspondences, enabling geometrically consistent fusion of past, present, and predicted future frames in 3D space.
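The pose-plus-depth correspondence step rests on standard pinhole back-projection; a minimal illustrative sketch (our own code, not the paper's):

```python
import numpy as np

def depth_to_world(depth, K, T_cam_to_world):
    """Back-project a depth map to 3D world points.
    Assumes depth stores z-depth (distance along the optical axis)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T            # pixels -> normalized camera rays
    pts_cam = rays * depth.reshape(-1, 1)      # scale rays by metric depth
    pts_h = np.concatenate([pts_cam, np.ones((pts_cam.shape[0], 1))], axis=1)
    return (pts_h @ T_cam_to_world.T)[:, :3]   # rigid transform to world frame
```

Points from a past frame mapped through that frame's pose land in the same world frame as the current frame's points, which is what makes multi-frame fusion geometrically consistent.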
arXiv Detail & Related papers (2025-07-18T10:24:58Z)
- Zero-shot Inexact CAD Model Alignment from a Single Image [53.37898107159792]
A practical approach to infer 3D scene structure from a single image is to retrieve a closely matching 3D model from a database and align it with the object in the image.
Existing methods rely on supervised training with images and pose annotations, which limits them to a narrow set of object categories.
We propose a weakly supervised 9-DoF alignment method for inexact 3D models that requires no pose annotations and generalizes to unseen categories.
arXiv Detail & Related papers (2025-07-04T04:46:59Z)
- DSOcc: Leveraging Depth Awareness and Semantic Aid to Boost Camera-Based 3D Semantic Occupancy Prediction [51.42817309112156]
We propose leveraging Depth awareness and Semantic aid to boost camera-based 3D semantic Occupancy prediction (DSOcc).
We jointly perform occupancy-state and occupancy-class inference, where soft occupancy confidence is calculated by a non-learning method and multiplied with image features to make voxels depth-aware.
Instead of enhancing feature learning, we directly utilize well-trained image semantic segmentation and fuse multiple frames with their occupancy probabilities to aid occupancy-class inference.
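The confidence-times-features step is in the spirit of the Lift-Splat trick; a hedged tensor sketch (the shapes and the outer-product form are our reading of the abstract):

```python
import torch

def depth_weighted_lift(feat_2d: torch.Tensor, depth_conf: torch.Tensor) -> torch.Tensor:
    """feat_2d:    (B, C, H, W) image features
    depth_conf: (B, D, H, W) soft occupancy confidence per depth bin
    returns:    (B, C, D, H, W) depth-aware features, one copy per bin scaled by confidence."""
    return feat_2d.unsqueeze(2) * depth_conf.unsqueeze(1)

# Toy usage: 64-channel features lifted over a 48-bin depth axis.
out = depth_weighted_lift(torch.randn(1, 64, 32, 32),
                          torch.softmax(torch.randn(1, 48, 32, 32), dim=1))
print(out.shape)  # torch.Size([1, 64, 48, 32, 32])
```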
arXiv Detail & Related papers (2025-05-27T09:45:00Z)
- ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our architecture, named ALOcc, achieves an optimal trade-off between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z)
- OccLoff: Learning Optimized Feature Fusion for 3D Occupancy Prediction [5.285847977231642]
3D semantic occupancy prediction is crucial for ensuring safety in autonomous driving.
Existing fusion-based occupancy methods typically involve performing a 2D-to-3D view transformation on image features.
We propose OccLoff, a framework that Learns to optimize Feature Fusion for 3D occupancy prediction.
arXiv Detail & Related papers (2024-11-06T06:34:27Z)
- ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera [53.20087549782785]
We introduce ET-Former, a novel end-to-end algorithm for semantic scene completion using a single monocular camera.
Our approach generates a semantic occupancy map from a single RGB observation while simultaneously providing uncertainty estimates for semantic predictions.
arXiv Detail & Related papers (2024-10-14T19:14:49Z)
- SGCCNet: Single-Stage 3D Object Detector With Saliency-Guided Data Augmentation and Confidence Correction Mechanism [7.631190617438259]
Single-stage point-based 3D object detectors face challenges such as inadequate learning of low-quality objects (ILQ) and misalignment between localization accuracy and classification confidence (MLC).
For ILQ, SGCCNet adopts a Saliency-Guided Data Augmentation (SGDA) strategy to enhance the robustness of the model on low-quality objects.
For MLC, we design a Confidence Correction Mechanism (CCM) specifically for point-based multi-class detectors.
arXiv Detail & Related papers (2024-07-01T12:36:01Z)
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding [55.32861154245772]
Calib3D is a pioneering effort to benchmark and scrutinize the reliability of 3D scene understanding models.
We evaluate 28 state-of-the-art models across 10 diverse 3D datasets.
We introduce DeptS, a novel depth-aware scaling approach aimed at enhancing 3D model calibration.
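The abstract does not detail DeptS, but depth-aware scaling presumably extends standard temperature scaling; a minimal sketch of the standard method for reference (the depth conditioning itself is left to the paper):

```python
import torch

def fit_temperature(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Standard post-hoc temperature scaling on held-out logits/labels.
    DeptS presumably conditions the scale on depth; here T is a single scalar."""
    T = torch.ones(1, requires_grad=True)
    opt = torch.optim.LBFGS([T], lr=0.01, max_iter=50)
    nll = torch.nn.CrossEntropyLoss()

    def closure():
        opt.zero_grad()
        loss = nll(logits / T.clamp(min=1e-3), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return T.detach()

# Toy usage on random held-out predictions.
T = fit_temperature(torch.randn(256, 10), torch.randint(0, 10, (256,)))
print(T.item())
```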
arXiv Detail & Related papers (2024-03-25T17:59:59Z)
- PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness [38.802781781863196]
The Panoptic Scene Completion (PSC) task extends the popular Semantic Scene Completion (SSC) task with instance-level information.
Our PSC proposal utilizes a hybrid mask-based technique on the non-empty voxels from sparse multi-scale completions.
Our method surpasses all baselines in both Panoptic Scene Completion and uncertainty estimation on three large-scale autonomous driving datasets.
arXiv Detail & Related papers (2023-12-04T18:59:59Z)
- COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction [60.87168562615171]
The autonomous driving community has shown significant interest in 3D occupancy prediction.
We propose Compact Occupancy TRansformer (COTR) with a geometry-aware occupancy encoder and a semantic-aware group decoder.
COTR outperforms baselines with a relative improvement of 8%-15%.
arXiv Detail & Related papers (2023-12-04T14:23:18Z)
- On the Calibration of Human Pose Estimation [39.15814732856338]
Calibrated ConfidenceNet (CCNet) is a lightweight post-hoc addition that improves AP by up to 1.4% on off-the-shelf pose estimation frameworks.
Applied to the downstream task of mesh recovery, CCNet facilitates a further 1.0 mm decrease in 3D keypoint error.
arXiv Detail & Related papers (2023-11-28T09:31:09Z)
- Enhancing Few-shot CLIP with Semantic-Aware Fine-Tuning [61.902254546858465]
Methods based on Contrastive Language-Image Pre-training have exhibited promising performance in few-shot adaptation tasks.
We propose fine-tuning the parameters of the attention pooling layer during the training process to encourage the model to focus on task-specific semantics.
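Training only the attention-pooling parameters amounts to freezing everything else; a hedged sketch (the name "attnpool" matches OpenAI's CLIP ResNet implementation, but the paper's exact setup may differ):

```python
import torch

def freeze_all_but_attention_pool(model: torch.nn.Module):
    """Freeze every parameter except those of the attention pooling layer."""
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = "attnpool" in name  # layer name per OpenAI CLIP ResNet
        if param.requires_grad:
            trainable.append(param)
    return trainable  # hand these to the optimizer

# optimizer = torch.optim.AdamW(freeze_all_but_attention_pool(clip_model), lr=1e-4)
```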
arXiv Detail & Related papers (2023-11-08T05:18:57Z)
- CPPF++: Uncertainty-Aware Sim2Real Object Pose Estimation by Vote Aggregation [67.12857074801731]
We introduce a novel method, CPPF++, designed for sim-to-real pose estimation.
To address the challenge posed by vote collision, we propose a novel approach that involves modeling the voting uncertainty.
We incorporate several innovative modules, including noisy pair filtering, online alignment optimization, and a feature ensemble.
arXiv Detail & Related papers (2022-11-24T03:27:00Z)
- Uncertainty-Aware Camera Pose Estimation from Points and Lines [101.03675842534415]
Perspective-n-Point-and-Line (PnPL) aims at fast, accurate, and robust camera localization with respect to a 3D model from 2D-3D feature coordinates.
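For orientation, the point-only special case is solvable with OpenCV's off-the-shelf PnP; the synthetic data below is ours, and handling line correspondences requires a dedicated PnPL solver:

```python
import cv2
import numpy as np

# Known 3D model points plus a ground-truth pose (synthetic example).
obj_pts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 1],
                    [0.5, 0.5, 2], [2, 1, 0]], dtype=np.float32)
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float32)
rvec_true = np.array([0.1, -0.2, 0.05], dtype=np.float32)
tvec_true = np.array([0.3, -0.1, 5.0], dtype=np.float32)
img_pts, _ = cv2.projectPoints(obj_pts, rvec_true, tvec_true, K, None)

# Recover the pose from the 2D-3D point correspondences.
ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, None)
print(ok, rvec.ravel(), tvec.ravel())  # should match rvec_true, tvec_true
```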
arXiv Detail & Related papers (2021-07-08T15:19:36Z)