Related papers: GaussianOcc3D: A Gaussian-Based Adaptive Multi-modal 3D Occupancy Prediction

URL: http://arxiv.org/abs/2601.22729v1
Date: Fri, 30 Jan 2026 09:05:30 GMT
Title: GaussianOcc3D: A Gaussian-Based Adaptive Multi-modal 3D Occupancy Prediction
Authors: A. Enes Doruk, Hasan F. Ates
Abstract summary: We present a memory-efficient, continuous 3D Gaussian representation framework for semantic occupancy prediction. GaussianOcc3D exhibits superior robustness across challenging rainy and nighttime conditions.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: 3D semantic occupancy prediction is a pivotal task in autonomous driving, providing a dense and fine-grained understanding of the surrounding environment, yet single-modality methods face trade-offs between camera semantics and LiDAR geometry. Existing multi-modal frameworks often struggle with modality heterogeneity, spatial misalignment, and the representation crisis--where voxels are computationally heavy and BEV alternatives are lossy. We present GaussianOcc3D, a multi-modal framework bridging camera and LiDAR through a memory-efficient, continuous 3D Gaussian representation. We introduce four modules: (1) LiDAR Depth Feature Aggregation (LDFA), using depth-wise deformable sampling to lift sparse signals onto Gaussian primitives; (2) Entropy-Based Feature Smoothing (EBFS) to mitigate domain noise; (3) Adaptive Camera-LiDAR Fusion (ACLF) with uncertainty-aware reweighting for sensor reliability; and (4) a Gauss-Mamba Head leveraging Selective State Space Models for global context with linear complexity. Evaluations on Occ3D, SurroundOcc, and SemanticKITTI benchmarks demonstrate state-of-the-art performance, achieving mIoU scores of 49.4%, 28.9%, and 25.2% respectively. GaussianOcc3D exhibits superior robustness across challenging rainy and nighttime conditions.

Related papers

Gaussian Based Adaptive Multi-Modal 3D Semantic Occupancy Prediction [0.0]
This research work enhances a novel adaptive camera-LiDAR multimodal 3D occupancy prediction model. It seamlessly bridges the semantic strengths of camera modality with the geometric strengths of LiDAR modality.
arXiv Detail & Related papers (2026-01-20T20:11:09Z)
ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene Understanding [7.610505486431266]
We introduce ShelfGaussian, an open-vocabulary multi-modal Gaussian-based 3D scene understanding framework supervised by off-the-shelf vision foundation models. Existing methods either model objects as closed-set semantic Gaussians supervised by annotated 3D labels, neglecting their rendering ability, or learn open-set Gaussian representations via purely 2D self-supervision.
arXiv Detail & Related papers (2025-12-03T02:06:09Z)
GauSSmart: Enhanced 3D Reconstruction through 2D Foundation Models and Geometric Filtering [50.675710727721786]
We propose GauSSmart, a hybrid method that bridges 2D foundational models and 3D Gaussian Splatting reconstruction. Our approach integrates established 2D computer vision techniques, including convex filtering and semantic feature supervision. We validate our approach across three datasets, where GauSSmart consistently outperforms existing Gaussian Splatting.
arXiv Detail & Related papers (2025-10-16T03:38:26Z)
D$^2$GS: Depth-and-Density Guided Gaussian Splatting for Stable and Accurate Sparse-View Reconstruction [73.61056394880733]
3D Gaussian Splatting (3DGS) enables real-time, high-fidelity novel view synthesis (NVS) with explicit 3D representations. We identify two key failure modes under sparse-view conditions: overfitting in regions with excessive Gaussian density near the camera, and underfitting in distant areas with insufficient Gaussian coverage. We propose a unified framework D$^2$GS, comprising two key components: a Depth-and-Density Guided Dropout strategy, and a Distance-Aware Fidelity Enhancement module.
arXiv Detail & Related papers (2025-10-09T17:59:49Z)
Metropolis-Hastings Sampling for 3D Gaussian Reconstruction [31.840492077537018]
We propose an adaptive sampling framework for 3D Gaussian Splatting (3DGS). Our framework overcomes limitations by reformulating densification and pruning as a probabilistic sampling process. Our approach achieves faster convergence while matching or modestly surpassing the view-synthesis quality of state-of-the-art models.
arXiv Detail & Related papers (2025-06-15T19:12:37Z)
ProBA: Probabilistic Bundle Adjustment with the Bhattacharyya Coefficient [43.75661586211106]
ProBA explicitly models and propagates uncertainty in the 2D observations and the 3D scene structure. Our method uses 3D Gaussians instead of point-like landmarks. ProBA enhances the practicality of SLAM systems deployed in unstructured environments.
arXiv Detail & Related papers (2025-05-27T08:07:00Z)
GaussianFormer3D: Multi-Modal Gaussian-based Semantic Occupancy Prediction with 3D Deformable Attention [15.890744831541452]
3D semantic occupancy prediction is critical for achieving safe and reliable autonomous driving. We propose a multi-modal Gaussian-based semantic occupancy prediction framework utilizing 3D deformable attention.
arXiv Detail & Related papers (2025-05-15T20:05:08Z)
econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians [56.85804719947]
We propose econSG for open-vocabulary semantic segmentation with 3DGS. Our econSG shows state-of-the-art performance on four benchmark datasets compared to the existing methods.
arXiv Detail & Related papers (2025-04-08T13:12:31Z)
GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction [55.60972844777044]
3D semantic occupancy prediction is an important task for robust vision-centric autonomous driving. Most existing methods leverage dense grid-based scene representations, overlooking the spatial sparsity of the driving scenes. We propose a probabilistic Gaussian superposition model which interprets each Gaussian as a probability distribution of its neighborhood being occupied.
arXiv Detail & Related papers (2024-12-05T17:59:58Z)
DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes [71.61083731844282]
We present DeSiRe-GS, a self-supervised Gaussian splatting representation. It enables effective static-dynamic decomposition and high-fidelity surface reconstruction in complex driving scenarios.
arXiv Detail & Related papers (2024-11-18T05:49:16Z)
GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction [70.65250036489128]
3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene. We propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians. GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8% - 24.8% of their memory consumption.
arXiv Detail & Related papers (2024-05-27T17:59:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.