ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining
- URL: http://arxiv.org/abs/2408.10906v1
- Date: Tue, 20 Aug 2024 14:49:14 GMT
- Title: ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining
- Authors: Qi Ma, Yue Li, Bin Ren, Nicu Sebe, Ender Konukoglu, Theo Gevers, Luc Van Gool, Danda Pani Paudel,
- Abstract summary: We build a large-scale dataset of 3DGS using ShapeNet and ModelNet datasets.
Our dataset ShapeSplat consists of 65K objects from 87 unique categories.
We introduce textbftextitGaussian-MAE, which highlights the unique benefits of representation learning from Gaussian parameters.
- Score: 104.34751911174196
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D Gaussian Splatting (3DGS) has become the de facto method of 3D representation in many vision tasks. This calls for the 3D understanding directly in this representation space. To facilitate the research in this direction, we first build a large-scale dataset of 3DGS using the commonly used ShapeNet and ModelNet datasets. Our dataset ShapeSplat consists of 65K objects from 87 unique categories, whose labels are in accordance with the respective datasets. The creation of this dataset utilized the compute equivalent of 2 GPU years on a TITAN XP GPU. We utilize our dataset for unsupervised pretraining and supervised finetuning for classification and segmentation tasks. To this end, we introduce \textbf{\textit{Gaussian-MAE}}, which highlights the unique benefits of representation learning from Gaussian parameters. Through exhaustive experiments, we provide several valuable insights. In particular, we show that (1) the distribution of the optimized GS centroids significantly differs from the uniformly sampled point cloud (used for initialization) counterpart; (2) this change in distribution results in degradation in classification but improvement in segmentation tasks when using only the centroids; (3) to leverage additional Gaussian parameters, we propose Gaussian feature grouping in a normalized feature space, along with splats pooling layer, offering a tailored solution to effectively group and embed similar Gaussians, which leads to notable improvement in finetuning tasks.
Related papers
- TSGaussian: Semantic and Depth-Guided Target-Specific Gaussian Splatting from Sparse Views [18.050257821756148]
TSGaussian is a novel framework that combines semantic constraints with depth priors to avoid geometry degradation in novel view synthesis tasks.
Our approach prioritizes computational resources on designated targets while minimizing background allocation.
Extensive experiments demonstrate that TSGaussian outperforms state-of-the-art methods on three standard datasets.
arXiv Detail & Related papers (2024-12-13T11:26:38Z) - GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction [55.60972844777044]
3D semantic occupancy prediction is an important task for robust vision-centric autonomous driving.
Most existing methods leverage dense grid-based scene representations, overlooking the spatial sparsity of the driving scenes.
We propose a probabilistic Gaussian superposition model which interprets each Gaussian as a probability distribution of its neighborhood being occupied.
arXiv Detail & Related papers (2024-12-05T17:59:58Z) - Planar Gaussian Splatting [42.74999794635269]
Planar Gaussian Splatting (PGS) is a novel neural rendering approach to learn the 3D geometry and parse the 3D planes of a scene.
The PGS achieves state-of-the-art performance in 3D planar reconstruction without requiring either 3D plane labels or depth supervision.
arXiv Detail & Related papers (2024-12-02T19:46:43Z) - HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction [46.269350101349715]
HiSplat is a novel framework for generalizable 3D Gaussian Splatting.
It generates hierarchical 3D Gaussians via a coarse-to-fine strategy.
It significantly enhances reconstruction quality and cross-dataset generalization.
arXiv Detail & Related papers (2024-10-08T17:59:32Z) - GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction [70.65250036489128]
3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene.
We propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians.
GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8% - 24.8% of their memory consumption.
arXiv Detail & Related papers (2024-05-27T17:59:51Z) - CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting [68.94594215660473]
We propose an efficient 3D scene representation, named Compressed Gaussian Splatting (CompGS)
We exploit a small set of anchor primitives for prediction, allowing the majority of primitives to be encapsulated into highly compact residual forms.
Experimental results show that the proposed CompGS significantly outperforms existing methods, achieving superior compactness in 3D scene representation without compromising model accuracy and rendering quality.
arXiv Detail & Related papers (2024-04-15T04:50:39Z) - GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering [112.16239342037714]
GES (Generalized Exponential Splatting) is a novel representation that employs Generalized Exponential Function (GEF) to model 3D scenes.
With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks.
arXiv Detail & Related papers (2024-02-15T17:32:50Z) - Learning Segmented 3D Gaussians via Efficient Feature Unprojection for Zero-shot Neural Scene Segmentation [16.57158278095853]
Zero-shot neural scene segmentation serves as an effective way for scene understanding.
Existing models, especially the efficient 3D Gaussian-based methods, struggle to produce compact segmentation results.
Our work proposes the Feature Unprojection and Fusion module as the segmentation field.
We show that our model surpasses baselines on zero-shot semantic segmentation task, improving by 10% mIoU over the best baseline.
arXiv Detail & Related papers (2024-01-11T14:05:01Z) - GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting [51.96353586773191]
We introduce textbfGS-SLAM that first utilizes 3D Gaussian representation in the Simultaneous Localization and Mapping system.
Our method utilizes a real-time differentiable splatting rendering pipeline that offers significant speedup to map optimization and RGB-D rendering.
Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica, TUM-RGBD datasets.
arXiv Detail & Related papers (2023-11-20T12:08:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.