GaussianFusionOcc: A Seamless Sensor Fusion Approach for 3D Occupancy Prediction Using 3D Gaussians
- URL: http://arxiv.org/abs/2507.18522v1
- Date: Thu, 24 Jul 2025 15:46:38 GMT
- Title: GaussianFusionOcc: A Seamless Sensor Fusion Approach for 3D Occupancy Prediction Using 3D Gaussians
- Authors: Tomislav Pavković, Mohammad-Ali Nikouei Mahani, Johannes Niedermayer, Johannes Betz
- Abstract summary: 3D semantic occupancy prediction is one of the crucial tasks of autonomous driving. We propose a new approach to predict 3D semantic occupancy in complex environments. We use semantic 3D Gaussians alongside an innovative sensor fusion mechanism.
- Score: 4.635245015125757
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: 3D semantic occupancy prediction is one of the crucial tasks of autonomous driving. It enables precise and safe interpretation and navigation in complex environments. Reliable predictions rely on effective sensor fusion, as different modalities can contain complementary information. Unlike conventional methods that depend on dense grid representations, our approach, GaussianFusionOcc, uses semantic 3D Gaussians alongside an innovative sensor fusion mechanism. Seamless integration of data from camera, LiDAR, and radar sensors enables more precise and scalable occupancy prediction, while 3D Gaussian representation significantly improves memory efficiency and inference speed. GaussianFusionOcc employs modality-agnostic deformable attention to extract essential features from each sensor type, which are then used to refine Gaussian properties, resulting in a more accurate representation of the environment. Extensive testing with various sensor combinations demonstrates the versatility of our approach. By leveraging the robustness of multi-modal fusion and the efficiency of Gaussian representation, GaussianFusionOcc outperforms current state-of-the-art models.
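The abstract describes the fusion mechanism only at a high level. A minimal sketch of how modality-agnostic deformable attention could refine per-Gaussian properties, assuming each sensor has already been encoded into a 2D feature map and Gaussian centers can be projected into it (the module name, shapes, and the residual-update head are all hypothetical):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAgnosticDeformableAttention(nn.Module):
    """Sketch: one shared attention module queries any sensor's feature map.

    Each 3D Gaussian is projected into the sensor's feature plane; learned
    offsets sample K nearby locations, and the weighted samples predict
    residual updates to the Gaussian's properties (mean, scale, semantics).
    """
    def __init__(self, dim: int, num_points: int = 4, num_classes: int = 17):
        super().__init__()
        self.num_points = num_points
        self.offsets = nn.Linear(dim, num_points * 2)      # 2D sampling offsets
        self.weights = nn.Linear(dim, num_points)          # per-point attention
        self.refine = nn.Linear(dim, 3 + 3 + num_classes)  # dmean, dscale, logits

    def forward(self, queries, feat_map, ref_uv):
        # queries: (G, C) per-Gaussian embeddings
        # feat_map: (C, H, W) features from one sensor (camera/LiDAR/radar)
        # ref_uv: (G, 2) projected Gaussian centers, normalized to [-1, 1]
        G, _ = queries.shape
        offs = self.offsets(queries).view(G, self.num_points, 2) * 0.05
        w = self.weights(queries).softmax(dim=-1)                 # (G, K)
        grid = (ref_uv[:, None, :] + offs).view(1, G, self.num_points, 2)
        sampled = F.grid_sample(feat_map[None], grid, align_corners=False)
        sampled = sampled[0].permute(1, 2, 0)                     # (G, K, C)
        agg = (w[..., None] * sampled).sum(dim=1)                 # (G, C)
        return self.refine(agg)  # residual updates to Gaussian properties

attn = ModalityAgnosticDeformableAttention(dim=64)
q = torch.randn(100, 64)            # 100 Gaussian queries
fmap = torch.randn(64, 32, 88)      # e.g. one camera feature map
uv = torch.rand(100, 2) * 2 - 1     # projected centers in [-1, 1]
updates = attn(q, fmap, uv)         # (100, 23) property residuals
```

Because the module assumes nothing about sensor geometry beyond the projected reference points, the same weights can in principle query camera, LiDAR, and radar feature maps, which is one way to read the "modality-agnostic" claim.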
Related papers
- SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction [8.723840755505817]
We propose a novel multimodal occupancy prediction network called SDG-OCC. It incorporates a joint semantic and depth-guided view transformation and a fusion-to-occupancy-driven active distillation. Our method achieves state-of-the-art (SOTA) performance with real-time processing on the Occ3D-nuScenes dataset.
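The summary names the view transformation without detail. One plausible reading is a depth-guided lift in the lift-splat style, where per-pixel image features are weighted by a predicted depth distribution before being splatted into BEV (a sketch under that assumption, not SDG-OCC's actual code):

```python
import torch

def depth_guided_lift(img_feat, depth_prob):
    """Lift 2D image features into a 3D frustum, weighted by depth.

    img_feat:   (C, H, W)  per-pixel image features
    depth_prob: (D, H, W)  softmax over D discrete depth bins per pixel
    returns:    (C, D, H, W) frustum features; known camera geometry
                would then splat these into BEV/voxel cells.
    """
    return img_feat[:, None] * depth_prob[None]  # outer product over depth

feat = torch.randn(64, 32, 88)
depth = torch.randn(48, 32, 88).softmax(dim=0)
frustum = depth_guided_lift(feat, depth)  # (64, 48, 32, 88)
```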
arXiv Detail & Related papers (2025-07-22T23:49:40Z)
- GaussianFusion: Gaussian-Based Multi-Sensor Fusion for End-to-End Autonomous Driving [7.989953129185359]
We introduce a Gaussian-based multi-sensor fusion framework for end-to-end autonomous driving. Our method employs intuitive and compact Gaussian representations as intermediate carriers to aggregate information from diverse sensors. The explicit features capture rich semantic and spatial information about the traffic scene, while the implicit features provide complementary cues for trajectory planning.
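Taking the summary literally, each Gaussian acts as a carrier holding both explicit (interpretable) and implicit (learned) attributes; a hypothetical container illustrating the split:

```python
from dataclasses import dataclass
import torch

@dataclass
class GaussianCarrier:
    """Sketch of a Gaussian 'carrier' aggregating multi-sensor information.

    Explicit attributes are physically interpretable; the implicit feature
    is a learned embedding a planner could consume for trajectory cues.
    """
    mean: torch.Tensor       # (3,)  position in the ego frame
    scale: torch.Tensor      # (3,)  per-axis extent
    rotation: torch.Tensor   # (4,)  quaternion
    semantics: torch.Tensor  # (num_classes,) class logits   (explicit)
    implicit: torch.Tensor   # (C,)  fused sensor embedding   (implicit)

g = GaussianCarrier(
    mean=torch.zeros(3), scale=torch.ones(3),
    rotation=torch.tensor([1.0, 0.0, 0.0, 0.0]),
    semantics=torch.zeros(17), implicit=torch.zeros(128),
)
```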
arXiv Detail & Related papers (2025-05-27T01:43:02Z)
- GaussianFormer3D: Multi-Modal Gaussian-based Semantic Occupancy Prediction with 3D Deformable Attention [15.890744831541452]
3D semantic occupancy prediction is critical for achieving safe and reliable autonomous driving. We propose a multi-modal Gaussian-based semantic occupancy prediction framework utilizing 3D deformable attention.
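The sampling step at the heart of 3D deformable attention can be written compactly with trilinear interpolation; a minimal sketch assuming a dense voxel feature volume (shapes hypothetical, not the paper's implementation):

```python
import torch
import torch.nn.functional as F

def sample_voxel_features(vol, points):
    """Trilinearly sample a voxel feature volume at 3D query points.

    vol:    (C, D, H, W) voxel features (e.g. a fused LiDAR/camera volume)
    points: (N, K, 3)    sampling locations in [-1, 1], (x, y, z) order
    returns (N, K, C)    features at each deformable sampling point
    """
    grid = points.view(1, points.shape[0], points.shape[1], 1, 3)
    out = F.grid_sample(vol[None], grid, align_corners=False)  # (1, C, N, K, 1)
    return out[0, :, :, :, 0].permute(1, 2, 0)

vol = torch.randn(64, 16, 100, 100)
pts = torch.rand(200, 4, 3) * 2 - 1      # 4 sampling points per query
feats = sample_voxel_features(vol, pts)  # (200, 4, 64)
```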
arXiv Detail & Related papers (2025-05-15T20:05:08Z)
- OccCylindrical: Multi-Modal Fusion with Cylindrical Representation for 3D Semantic Occupancy Prediction [9.099401529072324]
We propose OccCylindrical, which merges and refines the different modality features under cylindrical coordinates. Our method preserves finer-grained geometric detail, leading to better performance. Experiments conducted on the nuScenes dataset, including challenging rainy and nighttime scenarios, confirm our approach's effectiveness and state-of-the-art performance.
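The coordinate change itself is simple; a small sketch converting ego-centered Cartesian points to cylindrical (radius, azimuth, height) coordinates before any feature merging:

```python
import numpy as np

def to_cylindrical(points):
    """Convert (N, 3) Cartesian points to cylindrical (rho, phi, z).

    Near the ego vehicle, cylindrical bins cover less Cartesian volume,
    which is one motivation for this representation.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.hypot(x, y)      # radial distance from the ego vehicle
    phi = np.arctan2(y, x)    # azimuth in [-pi, pi]
    return np.stack([rho, phi, z], axis=1)

pts = np.random.randn(1000, 3) * 20.0
cyl = to_cylindrical(pts)
```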
arXiv Detail & Related papers (2025-05-06T08:12:31Z)
- Distilling Multi-view Diffusion Models into 3D Generators [4.3238419212557115]
We introduce DD3G, a formulation that Distills a multi-view Diffusion model (MV-DM) into a 3D Generator using Gaussian splatting. DD3G compresses and integrates extensive visual and spatial knowledge from the MV-DM. We propose PEPD, a generator consisting of Pattern Extraction and Progressive Decoding phases, which enables efficient fusion of probabilistic flow.
arXiv Detail & Related papers (2025-04-01T06:32:48Z)
- GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction [55.60972844777044]
3D semantic occupancy prediction is an important task for robust vision-centric autonomous driving. Most existing methods leverage dense grid-based scene representations, overlooking the spatial sparsity of driving scenes. We propose a probabilistic Gaussian superposition model which interprets each Gaussian as a probability distribution of its neighborhood being occupied.
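The stated interpretation, each Gaussian as a probability that its neighborhood is occupied, suggests combining Gaussians as independent evidence rather than summing densities; a numpy sketch under that assumption:

```python
import numpy as np

def occupancy_probability(x, means, inv_covs, alphas):
    """Probabilistic superposition: P(occupied at x) under independence.

    Each Gaussian i contributes p_i(x) = alpha_i * exp(-0.5 * d_i^2) with
    Mahalanobis distance d_i; combining as 1 - prod(1 - p_i) avoids the
    over-counting a plain sum of densities would produce.
    """
    diff = x[None, :] - means                       # (G, 3)
    d2 = np.einsum('gi,gij,gj->g', diff, inv_covs, diff)
    p = alphas * np.exp(-0.5 * d2)                  # (G,)
    return 1.0 - np.prod(1.0 - p)

G = 8
means = np.random.randn(G, 3)
inv_covs = np.stack([np.eye(3)] * G)
alphas = np.full(G, 0.3)
print(occupancy_probability(np.zeros(3), means, inv_covs, alphas))
```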
arXiv Detail & Related papers (2024-12-05T17:59:58Z)
- L3DG: Latent 3D Gaussian Diffusion [74.36431175937285]
L3DG is the first approach for generative 3D modeling of 3D Gaussians through a latent 3D Gaussian diffusion formulation.
We employ a sparse convolutional architecture to efficiently operate on room-scale scenes.
By leveraging the 3D Gaussian representation, the generated scenes can be rendered from arbitrary viewpoints in real-time.
arXiv Detail & Related papers (2024-10-17T13:19:32Z)
- GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction [70.65250036489128]
3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene.
We propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians.
GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8% - 24.8% of their memory consumption.
arXiv Detail & Related papers (2024-05-27T17:59:51Z)
- CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting [68.94594215660473]
We propose an efficient 3D scene representation, named Compressed Gaussian Splatting (CompGS).
We exploit a small set of anchor primitives for prediction, allowing the majority of primitives to be encapsulated into highly compact residual forms.
Experimental results show that the proposed CompGS significantly outperforms existing methods, achieving superior compactness in 3D scene representation without compromising model accuracy and rendering quality.
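As a toy illustration of the anchor-plus-residual idea from the summary (names and shapes hypothetical), the reconstruction side reduces to indexed addition:

```python
import numpy as np

def reconstruct_primitives(anchors, assign, residuals):
    """Rebuild full primitive attributes from anchors plus compact residuals.

    anchors:   (A, F)  attributes of a small anchor set
    assign:    (P,)    index of the anchor predicting each primitive
    residuals: (P, F)  small corrections, cheap to code and quantize
    """
    return anchors[assign] + residuals

anchors = np.random.randn(16, 10)             # few anchors, full precision
assign = np.random.randint(0, 16, size=500)
residuals = 0.05 * np.random.randn(500, 10)   # low-entropy residual stream
prims = reconstruct_primitives(anchors, assign, residuals)  # (500, 10)
```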
arXiv Detail & Related papers (2024-04-15T04:50:39Z)
- GaussianCube: A Structured and Explicit Radiance Representation for 3D Generative Modeling [55.05713977022407]
We introduce a radiance representation that is both structured and fully explicit and thus greatly facilitates 3D generative modeling.
We derive GaussianCube by first using a novel densification-constrained Gaussian fitting algorithm, which yields high-accuracy fitting, and then rearranging the Gaussians into a predefined voxel grid via Optimal Transport.
Experiments conducted on unconditional and class-conditioned object generation, digital avatar creation, and text-to-3D synthesis all show that our model achieves state-of-the-art generation results.
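The rearrangement into a grid is a transport problem; a toy sketch using bipartite matching over voxel centers (a simplification of the paper's Optimal Transport step, with hypothetical sizes):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def structure_gaussians(positions, grid_res=4, extent=1.0):
    """Assign N fitted Gaussians to the N cells of a regular grid.

    positions: (N, 3) Gaussian means with N == grid_res**3.
    Returns the voxel index for each Gaussian, minimizing total movement.
    """
    lin = (np.arange(grid_res) + 0.5) / grid_res * 2 * extent - extent
    gx, gy, gz = np.meshgrid(lin, lin, lin, indexing='ij')
    centers = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    cost = np.linalg.norm(positions[:, None] - centers[None], axis=-1)
    rows, cols = linear_sum_assignment(cost)  # minimal-cost matching
    return cols

pos = np.random.uniform(-1, 1, size=(64, 3))  # 4**3 fitted Gaussians
voxel_of = structure_gaussians(pos)
```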
arXiv Detail & Related papers (2024-03-28T17:59:50Z)
- SemGauss-SLAM: Dense Semantic Gaussian Splatting SLAM [14.126704753481972]
SemGauss-SLAM is a dense semantic SLAM system that enables accurate 3D semantic mapping, robust camera tracking, and high-quality rendering simultaneously. We incorporate semantic feature embedding into the 3D Gaussian representation, which effectively encodes semantic information within the spatial layout of the environment. By leveraging multi-frame semantic associations, this strategy enables joint optimization of the 3D Gaussian representation and camera poses, resulting in low-drift tracking and accurate semantic mapping.
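A small sketch of the embedding idea: per-Gaussian semantic features composited along a ray with the usual volume-rendering weights (toy shapes, not the system's renderer):

```python
import numpy as np

def composite_semantics(alphas, embeddings):
    """Alpha-composite per-Gaussian semantic embeddings along one ray.

    alphas:     (K,)   opacities of Gaussians sorted front-to-back
    embeddings: (K, C) semantic feature per Gaussian
    returns     (C,)   rendered semantic feature for this pixel
    """
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas          # standard volume-rendering weights
    return weights @ embeddings

a = np.array([0.3, 0.5, 0.2])
e = np.random.randn(3, 8)
pixel_sem = composite_semantics(a, e)
```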
arXiv Detail & Related papers (2024-03-12T10:33:26Z)
- GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering [112.16239342037714]
GES (Generalized Exponential Splatting) is a novel representation that employs the Generalized Exponential Function (GEF) to model 3D scenes.
With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks.
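The GEF itself is a one-liner: a shape parameter beta interpolates between sharp peaks and flat plateaus, and beta = 2 recovers the Gaussian profile:

```python
import numpy as np

def gef(x, mu=0.0, alpha=1.0, beta=2.0):
    """Generalized Exponential Function: exp(-(|x - mu| / alpha)^beta).

    beta = 2 recovers a Gaussian profile; smaller beta gives sharper
    peaks (useful for edges), larger beta gives flatter plateaus.
    """
    return np.exp(-np.abs((x - mu) / alpha) ** beta)

x = np.linspace(-3, 3, 7)
assert np.allclose(gef(x, beta=2.0), np.exp(-x ** 2))  # Gaussian special case
```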
arXiv Detail & Related papers (2024-02-15T17:32:50Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation in many applications, such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
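A minimal sketch of a residual-based fusion block merging the two streams (hypothetical layer sizes, not the paper's exact module):

```python
import torch
import torch.nn as nn

class ResidualFusion(nn.Module):
    """Sketch: fuse camera-stream and LiDAR-stream features residually.

    The fused correction is added back to the LiDAR stream, so fusion
    refines rather than replaces the primary modality's features.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.mix = nn.Sequential(
            nn.Conv2d(2 * dim, dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, lidar_feat, cam_feat):
        fused = self.mix(torch.cat([lidar_feat, cam_feat], dim=1))
        return lidar_feat + fused  # residual connection

block = ResidualFusion(64)
out = block(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
```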
arXiv Detail & Related papers (2021-06-21T10:47:26Z)