GaussianFormer3D: Multi-Modal Gaussian-based Semantic Occupancy Prediction with 3D Deformable Attention
- URL: http://arxiv.org/abs/2505.10685v1
- Date: Thu, 15 May 2025 20:05:08 GMT
- Title: GaussianFormer3D: Multi-Modal Gaussian-based Semantic Occupancy Prediction with 3D Deformable Attention
- Authors: Lingjun Zhao, Sizhe Wei, James Hays, Lu Gan,
- Abstract summary: 3D semantic occupancy prediction is critical for achieving safe and reliable autonomous driving.<n>We propose a multi-modal Gaussian-based semantic occupancy prediction framework utilizing 3D deformable attention.
- Score: 15.890744831541452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D semantic occupancy prediction is critical for achieving safe and reliable autonomous driving. Compared to camera-only perception systems, multi-modal pipelines, especially LiDAR-camera fusion methods, can produce more accurate and detailed predictions. Although most existing works utilize a dense grid-based representation, in which the entire 3D space is uniformly divided into discrete voxels, the emergence of 3D Gaussians provides a compact and continuous object-centric representation. In this work, we propose a multi-modal Gaussian-based semantic occupancy prediction framework utilizing 3D deformable attention, named as GaussianFormer3D. We introduce a voxel-to-Gaussian initialization strategy to provide 3D Gaussians with geometry priors from LiDAR data, and design a LiDAR-guided 3D deformable attention mechanism for refining 3D Gaussians with LiDAR-camera fusion features in a lifted 3D space. We conducted extensive experiments on both on-road and off-road datasets, demonstrating that our GaussianFormer3D achieves high prediction accuracy that is comparable to state-of-the-art multi-modal fusion-based methods with reduced memory consumption and improved efficiency.
Related papers
- GaussianFusionOcc: A Seamless Sensor Fusion Approach for 3D Occupancy Prediction Using 3D Gaussians [4.635245015125757]
3D semantic occupancy prediction is one of the crucial tasks of autonomous driving.<n>We propose a new approach to predict 3D semantic occupancy in complex environments.<n>We use semantic 3D Gaussians alongside an innovative sensor fusion mechanism.
arXiv Detail & Related papers (2025-07-24T15:46:38Z) - Stereo-GS: Multi-View Stereo Vision Model for Generalizable 3D Gaussian Splatting Reconstruction [30.518107360632488]
Generalizable 3D Gaussian Splatting reconstruction showcases advanced Image-to-3D content creation.<n>method provides an efficient, scalable solution for real-world 3D content generation.
arXiv Detail & Related papers (2025-07-20T11:33:13Z) - 3DGEER: Exact and Efficient Volumetric Rendering with 3D Gaussians [15.776720879897345]
We introduce 3DGEER, an Exact and Efficient Volumetric Gaussian Rendering method.<n>Our method consistently outperforms prior methods, establishing a new state-of-the-art in real-time neural rendering.
arXiv Detail & Related papers (2025-05-29T22:52:51Z) - GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding [44.68350305790145]
GaussTR is a novel Transformer framework that unifies sparse 3D modeling with foundation model alignment through Gaussian representations to advance 3D spatial understanding.<n>Experiments on the Occ3D-nuScenes dataset demonstrate GaussTR's state-of-the-art zero-shot performance of 12.27 mIoU, along with a 40% reduction in training time.<n>These results highlight the efficacy of GaussTR for scalable and holistic 3D spatial understanding, with promising implications in autonomous driving and embodied agents.
arXiv Detail & Related papers (2024-12-17T18:59:46Z) - GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction [55.60972844777044]
3D semantic occupancy prediction is an important task for robust vision-centric autonomous driving.<n>Most existing methods leverage dense grid-based scene representations, overlooking the spatial sparsity of the driving scenes.<n>We propose a probabilistic Gaussian superposition model which interprets each Gaussian as a probability distribution of its neighborhood being occupied.
arXiv Detail & Related papers (2024-12-05T17:59:58Z) - Neural Signed Distance Function Inference through Splatting 3D Gaussians Pulled on Zero-Level Set [49.780302894956776]
It is vital to infer a signed distance function (SDF) in multi-view based surface reconstruction.
We propose a method that seamlessly merge 3DGS with the learning of neural SDFs.
Our numerical and visual comparisons show our superiority over the state-of-the-art results on the widely used benchmarks.
arXiv Detail & Related papers (2024-10-18T05:48:06Z) - L3DG: Latent 3D Gaussian Diffusion [74.36431175937285]
L3DG is the first approach for generative 3D modeling of 3D Gaussians through a latent 3D Gaussian diffusion formulation.
We employ a sparse convolutional architecture to efficiently operate on room-scale scenes.
By leveraging the 3D Gaussian representation, the generated scenes can be rendered from arbitrary viewpoints in real-time.
arXiv Detail & Related papers (2024-10-17T13:19:32Z) - GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction [70.65250036489128]
3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene.
We propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians.
GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8% - 24.8% of their memory consumption.
arXiv Detail & Related papers (2024-05-27T17:59:51Z) - 3DGSR: Implicit Surface Reconstruction with 3D Gaussian Splatting [58.95801720309658]
In this paper, we present an implicit surface reconstruction method with 3D Gaussian Splatting (3DGS), namely 3DGSR.<n>The key insight is incorporating an implicit signed distance field (SDF) within 3D Gaussians to enable them to be aligned and jointly optimized.<n>Our experimental results demonstrate that our 3DGSR method enables high-quality 3D surface reconstruction while preserving the efficiency and rendering quality of 3DGS.
arXiv Detail & Related papers (2024-03-30T16:35:38Z) - GaussianCube: A Structured and Explicit Radiance Representation for 3D Generative Modeling [55.05713977022407]
We introduce a radiance representation that is both structured and fully explicit and thus greatly facilitates 3D generative modeling.
We derive GaussianCube by first using a novel densification-constrained Gaussian fitting algorithm, which yields high-accuracy fitting.
Experiments conducted on unconditional and class-conditioned object generation, digital avatar creation, and text-to-3D all show that our model synthesis achieves state-of-the-art generation results.
arXiv Detail & Related papers (2024-03-28T17:59:50Z) - Gaussian Splatting SLAM [16.3858380078553]
We present the first application of 3D Gaussian Splatting in monocular SLAM.
Our method runs live at 3fps, unifying the required representation for accurate tracking, mapping, and high-quality rendering.
Several innovations are required to continuously reconstruct 3D scenes with high fidelity from a live camera.
arXiv Detail & Related papers (2023-12-11T18:19:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.