GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction
- URL: http://arxiv.org/abs/2412.04384v2
- Date: Fri, 06 Dec 2024 15:43:40 GMT
- Title: GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction
- Authors: Yuanhui Huang, Amonnut Thammatadatrakoon, Wenzhao Zheng, Yunpeng Zhang, Dalong Du, Jiwen Lu,
- Abstract summary: 3D semantic occupancy prediction is an important task for robust vision-centric autonomous driving.
Most existing methods leverage dense grid-based scene representations, overlooking the spatial sparsity of the driving scenes.
We propose a probabilistic Gaussian superposition model which interprets each Gaussian as a probability distribution of its neighborhood being occupied.
- Score: 55.60972844777044
- License:
- Abstract: 3D semantic occupancy prediction is an important task for robust vision-centric autonomous driving, which predicts fine-grained geometry and semantics of the surrounding scene. Most existing methods leverage dense grid-based scene representations, overlooking the spatial sparsity of the driving scenes. Although 3D semantic Gaussian serves as an object-centric sparse alternative, most of the Gaussians still describe the empty region with low efficiency. To address this, we propose a probabilistic Gaussian superposition model which interprets each Gaussian as a probability distribution of its neighborhood being occupied and conforms to probabilistic multiplication to derive the overall geometry. Furthermore, we adopt the exact Gaussian mixture model for semantics calculation to avoid unnecessary overlapping of Gaussians. To effectively initialize Gaussians in non-empty region, we design a distribution-based initialization module which learns the pixel-aligned occupancy distribution instead of the depth of surfaces. We conduct extensive experiments on nuScenes and KITTI-360 datasets and our GaussianFormer-2 achieves state-of-the-art performance with high efficiency. Code: https://github.com/huang-yh/GaussianFormer.
Related papers
- GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding [44.68350305790145]
We introduce GaussTR, a novel Gaussian Transformer to advance self-supervised 3D spatial understanding.
GaussTR adopts a Transformer architecture to predict sparse sets of 3D Gaussians that represent scenes in a feed-forward manner.
Empirical evaluations on the Occ3D-nuScenes dataset showcase GaussTR's state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2024-12-17T18:59:46Z) - GaussianAD: Gaussian-Centric End-to-End Autonomous Driving [23.71316979650116]
Vision-based autonomous driving shows great potential due to its satisfactory performance and low costs.
Most existing methods adopt dense representations (e.g., bird's eye view) or sparse representations (e.g., instance boxes) for decision-making.
This paper explores a Gaussian-centric end-to-end autonomous driving framework and exploits 3D semantic Gaussians to extensively yet sparsely describe the scene.
arXiv Detail & Related papers (2024-12-13T18:59:30Z) - PixelGaussian: Generalizable 3D Gaussian Reconstruction from Arbitrary Views [116.10577967146762]
PixelGaussian is an efficient framework for learning generalizable 3D Gaussian reconstruction from arbitrary views.
Our method achieves state-of-the-art performance with good generalization to various numbers of views.
arXiv Detail & Related papers (2024-10-24T17:59:58Z) - GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting [16.480043962212566]
We introduce a systematic method that investigates the two usages of Gaussian splatting for fully self-supervised and efficient 3D occupancy estimation in surround views.
As a result, the proposed GaussianOcc method enables fully self-supervised (no ground truth pose) 3D occupancy estimation in competitive performance with low computational cost.
arXiv Detail & Related papers (2024-08-21T09:06:30Z) - ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining [104.34751911174196]
We build a large-scale dataset of 3DGS using ShapeNet and ModelNet datasets.
Our dataset ShapeSplat consists of 65K objects from 87 unique categories.
We introduce textbftextitGaussian-MAE, which highlights the unique benefits of representation learning from Gaussian parameters.
arXiv Detail & Related papers (2024-08-20T14:49:14Z) - Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos [58.22272760132996]
We show that existing 4D Gaussian methods dramatically fail in this setup because the monocular setting is underconstrained.
We propose Dynamic Gaussian Marbles, which consist of three core modifications that target the difficulties of the monocular setting.
We evaluate on the Nvidia Dynamic Scenes dataset and the DyCheck iPhone dataset, and show that Gaussian Marbles significantly outperforms other Gaussian baselines in quality.
arXiv Detail & Related papers (2024-06-26T19:37:07Z) - GaussianForest: Hierarchical-Hybrid 3D Gaussian Splatting for Compressed Scene Modeling [40.743135560583816]
We introduce the Gaussian-Forest modeling framework, which hierarchically represents a scene as a forest of hybrid 3D Gaussians.
Experiments demonstrate that Gaussian-Forest not only maintains comparable speed and quality but also achieves a compression rate surpassing 10 times.
arXiv Detail & Related papers (2024-06-13T02:41:11Z) - GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction [70.65250036489128]
3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene.
We propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians.
GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8% - 24.8% of their memory consumption.
arXiv Detail & Related papers (2024-05-27T17:59:51Z) - GaussianCube: A Structured and Explicit Radiance Representation for 3D Generative Modeling [55.05713977022407]
We introduce a radiance representation that is both structured and fully explicit and thus greatly facilitates 3D generative modeling.
We derive GaussianCube by first using a novel densification-constrained Gaussian fitting algorithm, which yields high-accuracy fitting.
Experiments conducted on unconditional and class-conditioned object generation, digital avatar creation, and text-to-3D all show that our model synthesis achieves state-of-the-art generation results.
arXiv Detail & Related papers (2024-03-28T17:59:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.