GaussianOcc3D: A Gaussian-Based Adaptive Multi-modal 3D Occupancy Prediction
- URL: http://arxiv.org/abs/2601.22729v1
- Date: Fri, 30 Jan 2026 09:05:30 GMT
- Title: GaussianOcc3D: A Gaussian-Based Adaptive Multi-modal 3D Occupancy Prediction
- Authors: A. Enes Doruk, Hasan F. Ates,
- Abstract summary: We present a memory-efficient, continuous 3D Gaussian representation framework for semantic occupancy prediction.<n>GaussianOcc3D exhibits superior robustness across challenging rainy and nighttime conditions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D semantic occupancy prediction is a pivotal task in autonomous driving, providing a dense and fine-grained understanding of the surrounding environment, yet single-modality methods face trade-offs between camera semantics and LiDAR geometry. Existing multi-modal frameworks often struggle with modality heterogeneity, spatial misalignment, and the representation crisis--where voxels are computationally heavy and BEV alternatives are lossy. We present GaussianOcc3D, a multi-modal framework bridging camera and LiDAR through a memory-efficient, continuous 3D Gaussian representation. We introduce four modules: (1) LiDAR Depth Feature Aggregation (LDFA), using depth-wise deformable sampling to lift sparse signals onto Gaussian primitives; (2) Entropy-Based Feature Smoothing (EBFS) to mitigate domain noise; (3) Adaptive Camera-LiDAR Fusion (ACLF) with uncertainty-aware reweighting for sensor reliability; and (4) a Gauss-Mamba Head leveraging Selective State Space Models for global context with linear complexity. Evaluations on Occ3D, SurroundOcc, and SemanticKITTI benchmarks demonstrate state-of-the-art performance, achieving mIoU scores of 49.4%, 28.9%, and 25.2% respectively. GaussianOcc3D exhibits superior robustness across challenging rainy and nighttime conditions.
Related papers
- Gaussian Based Adaptive Multi-Modal 3D Semantic Occupancy Prediction [0.0]
This research work enhances a novel adaptive camera-LiDAR multimodal 3D occupancy prediction model.<n>It seamlessly bridges the semantic strengths of camera modality with the geometric strengths of LiDAR modality.
arXiv Detail & Related papers (2026-01-20T20:11:09Z) - ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene Understanding [7.610505486431266]
We introduce ShelfGaussian, an open-vocabulary multi-modal Gaussian-based 3D scene understanding framework supervised by off-the-shelf vision foundation models.<n>Existing methods either model objects as closed-set semantic Gaussians supervised by annotated 3D labels, neglecting their rendering ability, or learn open-set Gaussian representations via purely 2D self-supervision.
arXiv Detail & Related papers (2025-12-03T02:06:09Z) - GauSSmart: Enhanced 3D Reconstruction through 2D Foundation Models and Geometric Filtering [50.675710727721786]
We propose GauSSmart, a hybrid method that bridges 2D foundational models and 3D Gaussian Splatting reconstruction.<n>Our approach integrates established 2D computer vision techniques, including convex filtering and semantic feature supervision.<n>We validate our approach across three datasets, where GauSSmart consistently outperforms existing Gaussian Splatting.
arXiv Detail & Related papers (2025-10-16T03:38:26Z) - D$^2$GS: Depth-and-Density Guided Gaussian Splatting for Stable and Accurate Sparse-View Reconstruction [73.61056394880733]
3D Gaussian Splatting (3DGS) enables real-time, high-fidelity novel view synthesis (NVS) with explicit 3D representations.<n>We identify two key failure modes under sparse-view conditions: overfitting in regions with excessive Gaussian density near the camera, and underfitting in distant areas with insufficient Gaussian coverage.<n>We propose a unified framework D$2$GS, comprising two key components: a Depth-and-Density Guided Dropout strategy, and a Distance-Aware Fidelity Enhancement module.
arXiv Detail & Related papers (2025-10-09T17:59:49Z) - Metropolis-Hastings Sampling for 3D Gaussian Reconstruction [31.840492077537018]
We propose an adaptive sampling framework for 3D Gaussian Splatting (3DGS)<n>Our framework overcomes limitations by reformulating densification and pruning as a probabilistic sampling process.<n>Our approach achieves faster convergence while matching or modestly surpassing the view-synthesis quality of state-of-the-art models.
arXiv Detail & Related papers (2025-06-15T19:12:37Z) - ProBA: Probabilistic Bundle Adjustment with the Bhattacharyya Coefficient [43.75661586211106]
ProBA explicitly models and propagates uncertainty in the 2D observations and the 3D scene structure.<n>Our method uses 3D Gaussians instead of point-like landmarks.<n>ProBA enhances the practicality of SLAM systems deployed in unstructured environments.
arXiv Detail & Related papers (2025-05-27T08:07:00Z) - GaussianFormer3D: Multi-Modal Gaussian-based Semantic Occupancy Prediction with 3D Deformable Attention [15.890744831541452]
3D semantic occupancy prediction is critical for achieving safe and reliable autonomous driving.<n>We propose a multi-modal Gaussian-based semantic occupancy prediction framework utilizing 3D deformable attention.
arXiv Detail & Related papers (2025-05-15T20:05:08Z) - econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians [56.85804719947]
We propose econSG for open-vocabulary semantic segmentation with 3DGS.<n>Our econSG shows state-of-the-art performance on four benchmark datasets compared to the existing methods.
arXiv Detail & Related papers (2025-04-08T13:12:31Z) - GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction [55.60972844777044]
3D semantic occupancy prediction is an important task for robust vision-centric autonomous driving.<n>Most existing methods leverage dense grid-based scene representations, overlooking the spatial sparsity of the driving scenes.<n>We propose a probabilistic Gaussian superposition model which interprets each Gaussian as a probability distribution of its neighborhood being occupied.
arXiv Detail & Related papers (2024-12-05T17:59:58Z) - DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes [71.61083731844282]
We present DeSiRe-GS, a self-supervised gaussian splatting representation.<n>It enables effective static-dynamic decomposition and high-fidelity surface reconstruction in complex driving scenarios.
arXiv Detail & Related papers (2024-11-18T05:49:16Z) - GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction [70.65250036489128]
3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene.
We propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians.
GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8% - 24.8% of their memory consumption.
arXiv Detail & Related papers (2024-05-27T17:59:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.