Related papers: GaussianFormer3D: Multi-Modal Gaussian-based Semantic Occupancy Prediction with 3D Deformable Attention

URL: http://arxiv.org/abs/2505.10685v1
Date: Thu, 15 May 2025 20:05:08 GMT
Title: GaussianFormer3D: Multi-Modal Gaussian-based Semantic Occupancy Prediction with 3D Deformable Attention
Authors: Lingjun Zhao, Sizhe Wei, James Hays, Lu Gan
Abstract summary: 3D semantic occupancy prediction is critical for achieving safe and reliable autonomous driving. We propose a multi-modal Gaussian-based semantic occupancy prediction framework utilizing 3D deformable attention.
Score: 15.890744831541452
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: 3D semantic occupancy prediction is critical for achieving safe and reliable autonomous driving. Compared to camera-only perception systems, multi-modal pipelines, especially LiDAR-camera fusion methods, can produce more accurate and detailed predictions. Although most existing works utilize a dense grid-based representation, in which the entire 3D space is uniformly divided into discrete voxels, the emergence of 3D Gaussians provides a compact and continuous object-centric representation. In this work, we propose a multi-modal Gaussian-based semantic occupancy prediction framework utilizing 3D deformable attention, named GaussianFormer3D. We introduce a voxel-to-Gaussian initialization strategy to provide 3D Gaussians with geometry priors from LiDAR data, and design a LiDAR-guided 3D deformable attention mechanism for refining 3D Gaussians with LiDAR-camera fusion features in a lifted 3D space. We conducted extensive experiments on both on-road and off-road datasets, demonstrating that GaussianFormer3D achieves prediction accuracy comparable to state-of-the-art multi-modal fusion-based methods with reduced memory consumption and improved efficiency.
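
The abstract mentions a voxel-to-Gaussian initialization that gives the 3D Gaussians geometry priors from LiDAR data, but it does not spell out the procedure. The following is only a rough sketch of one way such an initialization could work, not the authors' implementation: it assumes that LiDAR points are binned into voxels, that the centroids of the densest voxels become Gaussian means, and that each Gaussian starts with an isotropic scale of half a voxel. The function name `voxel_to_gaussian_init` and the parameters `voxel_size` and `num_gaussians` are hypothetical.

```python
import numpy as np


def voxel_to_gaussian_init(points, voxel_size=0.5, num_gaussians=4):
    """Hypothetical sketch: seed Gaussian means and scales from LiDAR voxels.

    This is an assumed procedure for illustration, not the paper's method.
    """
    points = np.asarray(points, dtype=np.float64)

    # Integer voxel index of every LiDAR point.
    idx = np.floor(points / voxel_size).astype(np.int64)

    # Unique occupied voxels; `inverse` maps each point to its voxel row.
    voxels, inverse, counts = np.unique(
        idx, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)

    # Centroid of the points that fall inside each occupied voxel.
    sums = np.zeros((len(voxels), 3))
    np.add.at(sums, inverse, points)
    centroids = sums / counts[:, None]

    # Assumption: keep the voxels with the most points as Gaussian seeds.
    order = np.argsort(-counts, kind="stable")[:num_gaussians]
    means = centroids[order]

    # Assumption: isotropic initial scale of half a voxel edge.
    scales = np.full((len(order), 3), voxel_size / 2.0)
    return means, scales


if __name__ == "__main__":
    pts = [[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [1.1, 0.1, 0.1]]
    means, scales = voxel_to_gaussian_init(pts, voxel_size=0.5, num_gaussians=1)
    print(means, scales)
```

In this sketch the Gaussians start where LiDAR returns are densest, which is one plausible reading of "geometry priors"; the rotation, opacity, and semantic parameters that a full Gaussian representation would need are left out.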

Related papers

GaussianFusionOcc: A Seamless Sensor Fusion Approach for 3D Occupancy Prediction Using 3D Gaussians [4.635245015125757]
3D semantic occupancy prediction is one of the crucial tasks of autonomous driving. We propose a new approach to predict 3D semantic occupancy in complex environments. We use semantic 3D Gaussians alongside an innovative sensor fusion mechanism.
arXiv Detail & Related papers (2025-07-24T15:46:38Z)
Stereo-GS: Multi-View Stereo Vision Model for Generalizable 3D Gaussian Splatting Reconstruction [30.518107360632488]
Generalizable 3D Gaussian Splatting reconstruction showcases advanced Image-to-3D content creation. The method provides an efficient, scalable solution for real-world 3D content generation.
arXiv Detail & Related papers (2025-07-20T11:33:13Z)
3DGEER: Exact and Efficient Volumetric Rendering with 3D Gaussians [15.776720879897345]
We introduce 3DGEER, an Exact and Efficient Volumetric Gaussian Rendering method. Our method consistently outperforms prior methods, establishing a new state-of-the-art in real-time neural rendering.
arXiv Detail & Related papers (2025-05-29T22:52:51Z)
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding [44.68350305790145]
GaussTR is a novel Transformer framework that unifies sparse 3D modeling with foundation model alignment through Gaussian representations to advance 3D spatial understanding. Experiments on the Occ3D-nuScenes dataset demonstrate GaussTR's state-of-the-art zero-shot performance of 12.27 mIoU, along with a 40% reduction in training time. These results highlight the efficacy of GaussTR for scalable and holistic 3D spatial understanding, with promising implications in autonomous driving and embodied agents.
arXiv Detail & Related papers (2024-12-17T18:59:46Z)
GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction [55.60972844777044]
3D semantic occupancy prediction is an important task for robust vision-centric autonomous driving. Most existing methods leverage dense grid-based scene representations, overlooking the spatial sparsity of the driving scenes. We propose a probabilistic Gaussian superposition model which interprets each Gaussian as a probability distribution of its neighborhood being occupied.
arXiv Detail & Related papers (2024-12-05T17:59:58Z)
Neural Signed Distance Function Inference through Splatting 3D Gaussians Pulled on Zero-Level Set [49.780302894956776]
It is vital to infer a signed distance function (SDF) in multi-view based surface reconstruction. We propose a method that seamlessly merges 3DGS with the learning of neural SDFs. Our numerical and visual comparisons show our superiority over the state-of-the-art results on the widely used benchmarks.
arXiv Detail & Related papers (2024-10-18T05:48:06Z)
L3DG: Latent 3D Gaussian Diffusion [74.36431175937285]
L3DG is the first approach for generative 3D modeling of 3D Gaussians through a latent 3D Gaussian diffusion formulation. We employ a sparse convolutional architecture to efficiently operate on room-scale scenes. By leveraging the 3D Gaussian representation, the generated scenes can be rendered from arbitrary viewpoints in real-time.
arXiv Detail & Related papers (2024-10-17T13:19:32Z)
GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction [70.65250036489128]
3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene. We propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians. GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8% - 24.8% of their memory consumption.
arXiv Detail & Related papers (2024-05-27T17:59:51Z)
3DGSR: Implicit Surface Reconstruction with 3D Gaussian Splatting [58.95801720309658]
In this paper, we present an implicit surface reconstruction method with 3D Gaussian Splatting (3DGS), namely 3DGSR. The key insight is incorporating an implicit signed distance field (SDF) within 3D Gaussians to enable them to be aligned and jointly optimized. Our experimental results demonstrate that our 3DGSR method enables high-quality 3D surface reconstruction while preserving the efficiency and rendering quality of 3DGS.
arXiv Detail & Related papers (2024-03-30T16:35:38Z)
GaussianCube: A Structured and Explicit Radiance Representation for 3D Generative Modeling [55.05713977022407]
We introduce a radiance representation that is both structured and fully explicit and thus greatly facilitates 3D generative modeling. We derive GaussianCube by first using a novel densification-constrained Gaussian fitting algorithm, which yields high-accuracy fitting. Experiments conducted on unconditional and class-conditioned object generation, digital avatar creation, and text-to-3D all show that our model synthesis achieves state-of-the-art generation results.
arXiv Detail & Related papers (2024-03-28T17:59:50Z)
Gaussian Splatting SLAM [16.3858380078553]
We present the first application of 3D Gaussian Splatting in monocular SLAM. Our method runs live at 3 fps, unifying the required representation for accurate tracking, mapping, and high-quality rendering. Several innovations are required to continuously reconstruct 3D scenes with high fidelity from a live camera.
arXiv Detail & Related papers (2023-12-11T18:19:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.