Related papers: ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining

ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining

URL: http://arxiv.org/abs/2408.10906v1
Date: Tue, 20 Aug 2024 14:49:14 GMT
Title: ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining
Authors: Qi Ma, Yue Li, Bin Ren, Nicu Sebe, Ender Konukoglu, Theo Gevers, Luc Van Gool, Danda Pani Paudel,
Abstract summary: We build a large-scale dataset of 3DGS using ShapeNet and ModelNet datasets. Our dataset ShapeSplat consists of 65K objects from 87 unique categories. We introduce textbftextitGaussian-MAE, which highlights the unique benefits of representation learning from Gaussian parameters.
Score: 104.34751911174196
License: http://creativecommons.org/licenses/by/4.0/
Abstract: 3D Gaussian Splatting (3DGS) has become the de facto method of 3D representation in many vision tasks. This calls for the 3D understanding directly in this representation space. To facilitate the research in this direction, we first build a large-scale dataset of 3DGS using the commonly used ShapeNet and ModelNet datasets. Our dataset ShapeSplat consists of 65K objects from 87 unique categories, whose labels are in accordance with the respective datasets. The creation of this dataset utilized the compute equivalent of 2 GPU years on a TITAN XP GPU. We utilize our dataset for unsupervised pretraining and supervised finetuning for classification and segmentation tasks. To this end, we introduce \textbf{\textit{Gaussian-MAE}}, which highlights the unique benefits of representation learning from Gaussian parameters. Through exhaustive experiments, we provide several valuable insights. In particular, we show that (1) the distribution of the optimized GS centroids significantly differs from the uniformly sampled point cloud (used for initialization) counterpart; (2) this change in distribution results in degradation in classification but improvement in segmentation tasks when using only the centroids; (3) to leverage additional Gaussian parameters, we propose Gaussian feature grouping in a normalized feature space, along with splats pooling layer, offering a tailored solution to effectively group and embed similar Gaussians, which leads to notable improvement in finetuning tasks.

Related papers

GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting [16.179607149692398]
We present GaussianCross, a novel cross-modal self-supervised 3D representation learning architecture.<n>GaussianCross seamlessly converts scale-inconsistent 3D point clouds into a unified cuboid-normalized Gaussian representation.<n>It achieves superior performance through linear probing (0.1% parameters) and limited data training (1% of scenes) compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-08-04T08:12:44Z)
ProtoGS: Efficient and High-Quality Rendering with 3D Gaussian Prototypes [81.48624894781257]
3D Gaussian Splatting (3DGS) has made significant strides in novel view synthesis but is limited by the substantial number of Gaussian primitives required. Recent methods address this issue by compressing the storage size of densified Gaussians, yet fail to preserve rendering quality and efficiency. We propose ProtoGS to learn Gaussian prototypes to represent Gaussian primitives, significantly reducing the total Gaussian amount without sacrificing visual quality.
arXiv Detail & Related papers (2025-03-21T18:55:14Z)
Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting [86.15347226865826]
We design a new end-to-end object-aware lifting approach, named Unified-Lift. We augment each Gaussian point with an additional Gaussian-level feature learned using a contrastive loss to encode instance information. We conduct experiments on three benchmarks: LERF-Masked, Replica, and Messy Rooms.
arXiv Detail & Related papers (2025-03-18T08:42:23Z)
TSGaussian: Semantic and Depth-Guided Target-Specific Gaussian Splatting from Sparse Views [18.050257821756148]
TSGaussian is a novel framework that combines semantic constraints with depth priors to avoid geometry degradation in novel view synthesis tasks. Our approach prioritizes computational resources on designated targets while minimizing background allocation. Extensive experiments demonstrate that TSGaussian outperforms state-of-the-art methods on three standard datasets.
arXiv Detail & Related papers (2024-12-13T11:26:38Z)
GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction [55.60972844777044]
3D semantic occupancy prediction is an important task for robust vision-centric autonomous driving. Most existing methods leverage dense grid-based scene representations, overlooking the spatial sparsity of the driving scenes. We propose a probabilistic Gaussian superposition model which interprets each Gaussian as a probability distribution of its neighborhood being occupied.
arXiv Detail & Related papers (2024-12-05T17:59:58Z)
Planar Gaussian Splatting [42.74999794635269]
Planar Gaussian Splatting (PGS) is a novel neural rendering approach to learn the 3D geometry and parse the 3D planes of a scene. The PGS achieves state-of-the-art performance in 3D planar reconstruction without requiring either 3D plane labels or depth supervision.
arXiv Detail & Related papers (2024-12-02T19:46:43Z)
HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction [46.269350101349715]
HiSplat is a novel framework for generalizable 3D Gaussian Splatting. It generates hierarchical 3D Gaussians via a coarse-to-fine strategy. It significantly enhances reconstruction quality and cross-dataset generalization.
arXiv Detail & Related papers (2024-10-08T17:59:32Z)
Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
3D segmentation is a core problem in computer vision. densely labeling 3D point clouds to employ fully-supervised training remains too labor intensive and expensive. Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set.
arXiv Detail & Related papers (2024-09-12T14:54:31Z)
Effective Rank Analysis and Regularization for Enhanced 3D Gaussian Splatting [33.01987451251659]
3D Gaussian Splatting (3DGS) has emerged as a promising technique capable of real-time rendering with high-quality 3D reconstruction. Despite its potential, 3DGS encounters challenges, including needle-like artifacts, suboptimal geometries, and inaccurate normals. We introduce effective rank as a regularization, which constrains the structure of the Gaussians.
arXiv Detail & Related papers (2024-06-17T15:51:59Z)
GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction [70.65250036489128]
3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene. We propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians. GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8% - 24.8% of their memory consumption.
arXiv Detail & Related papers (2024-05-27T17:59:51Z)
SA-GS: Semantic-Aware Gaussian Splatting for Large Scene Reconstruction with Geometry Constrain [43.80789481557894]
We propose a novel method, named SA-GS, for fine-grained 3D geometry reconstruction using semantic-aware 3D Gaussian Splats. We leverage prior information stored in large vision models such as SAM and DINO to generate semantic masks. We extract the point cloud using a novel probability density-based extraction method, transforming Gaussian Splats into a point cloud crucial for downstream tasks.
arXiv Detail & Related papers (2024-05-27T08:15:10Z)
CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting [68.94594215660473]
We propose an efficient 3D scene representation, named Compressed Gaussian Splatting (CompGS) We exploit a small set of anchor primitives for prediction, allowing the majority of primitives to be encapsulated into highly compact residual forms. Experimental results show that the proposed CompGS significantly outperforms existing methods, achieving superior compactness in 3D scene representation without compromising model accuracy and rendering quality.
arXiv Detail & Related papers (2024-04-15T04:50:39Z)
GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering [112.16239342037714]
GES (Generalized Exponential Splatting) is a novel representation that employs Generalized Exponential Function (GEF) to model 3D scenes. With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks.
arXiv Detail & Related papers (2024-02-15T17:32:50Z)
Learning Segmented 3D Gaussians via Efficient Feature Unprojection for Zero-shot Neural Scene Segmentation [16.57158278095853]
Zero-shot neural scene segmentation serves as an effective way for scene understanding. Existing models, especially the efficient 3D Gaussian-based methods, struggle to produce compact segmentation results. Our work proposes the Feature Unprojection and Fusion module as the segmentation field. We show that our model surpasses baselines on zero-shot semantic segmentation task, improving by 10% mIoU over the best baseline.
arXiv Detail & Related papers (2024-01-11T14:05:01Z)
GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting [51.96353586773191]
We introduce textbfGS-SLAM that first utilizes 3D Gaussian representation in the Simultaneous Localization and Mapping system. Our method utilizes a real-time differentiable splatting rendering pipeline that offers significant speedup to map optimization and RGB-D rendering. Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica, TUM-RGBD datasets.
arXiv Detail & Related papers (2023-11-20T12:08:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.