RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance Fields
- URL: http://arxiv.org/abs/2405.18033v1
- Date: Tue, 28 May 2024 10:34:28 GMT
- Title: RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance Fields
- Authors: Mihnea-Bogdan Jurca, Remco Royen, Ion Giosan, Adrian Munteanu,
- Abstract summary: We introduce RT-GS2, the first generalizable semantic segmentation method employing Gaussian Splatting.
Our method achieves real-time performance of 27.03 FPS, marking an astonishing 901 times speedup compared to existing approaches.
- Score: 6.071025178912125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gaussian Splatting has revolutionized the world of novel view synthesis by achieving high rendering performance in real-time. Recently, studies have focused on enriching these 3D representations with semantic information for downstream tasks. In this paper, we introduce RT-GS2, the first generalizable semantic segmentation method employing Gaussian Splatting. While existing Gaussian Splatting-based approaches rely on scene-specific training, RT-GS2 demonstrates the ability to generalize to unseen scenes. Our method adopts a new approach by first extracting view-independent 3D Gaussian features in a self-supervised manner, followed by a novel View-Dependent / View-Independent (VDVI) feature fusion to enhance semantic consistency over different views. Extensive experimentation on three different datasets showcases RT-GS2's superiority over the state-of-the-art methods in semantic segmentation quality, exemplified by a 8.01% increase in mIoU on the Replica dataset. Moreover, our method achieves real-time performance of 27.03 FPS, marking an astonishing 901 times speedup compared to existing approaches. This work represents a significant advancement in the field by introducing, to the best of our knowledge, the first real-time generalizable semantic segmentation method for 3D Gaussian representations of radiance fields.
Related papers
- MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo [54.00987996368157]
We present MVSGaussian, a new generalizable 3D Gaussian representation approach derived from Multi-View Stereo (MVS)
MVSGaussian achieves real-time rendering with better synthesis quality for each scene.
arXiv Detail & Related papers (2024-05-20T17:59:30Z) - CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding [32.76277160013881]
We present CLIP-GS, which integrates semantics from Contrastive Language-Image Pre-Training (CLIP) into Gaussian Splatting.
SAC exploits the inherent unified semantics within objects to learn compact yet effective semantic representations of 3D Gaussians.
We also introduce a 3D Coherent Self-training (3DCS) strategy, resorting to the multi-view consistency originated from the 3D model.
arXiv Detail & Related papers (2024-04-22T15:01:32Z) - GaussianPro: 3D Gaussian Splatting with Progressive Propagation [49.918797726059545]
3DGS relies heavily on the point cloud produced by Structure-from-Motion (SfM) techniques.
We propose a novel method that applies a progressive propagation strategy to guide the densification of the 3D Gaussians.
Our method significantly surpasses 3DGS on the dataset, exhibiting an improvement of 1.15dB in terms of PSNR.
arXiv Detail & Related papers (2024-02-22T16:00:20Z) - S^2Former-OR: Single-Stage Bimodal Transformer for Scene Graph
Generation in OR [52.964721233679406]
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistically cognitive intelligence in the operating room (OR)
Previous works have primarily relied on the multi-stage learning that generates semantic scene graphs dependent on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bimodal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z) - GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering [112.16239342037714]
GES (Generalized Exponential Splatting) is a novel representation that employs Generalized Exponential Function (GEF) to model 3D scenes.
With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks.
arXiv Detail & Related papers (2024-02-15T17:32:50Z) - GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis [70.24111297192057]
We present a new approach, termed GPS-Gaussian, for synthesizing novel views of a character in a real-time manner.
The proposed method enables 2K-resolution rendering under a sparse-view camera setting.
arXiv Detail & Related papers (2023-12-04T18:59:55Z) - GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting [51.96353586773191]
We introduce textbfGS-SLAM that first utilizes 3D Gaussian representation in the Simultaneous Localization and Mapping system.
Our method utilizes a real-time differentiable splatting rendering pipeline that offers significant speedup to map optimization and RGB-D rendering.
Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica, TUM-RGBD datasets.
arXiv Detail & Related papers (2023-11-20T12:08:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.