Related papers: RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance Fields

URL: http://arxiv.org/abs/2405.18033v2
Date: Fri, 30 Aug 2024 16:14:57 GMT
Title: RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance Fields
Authors: Mihnea-Bogdan Jurca, Remco Royen, Ion Giosan, Adrian Munteanu
Abstract summary: We introduce RT-GS2, the first generalizable semantic segmentation method employing Gaussian Splatting. Our method achieves real-time performance of 27.03 FPS, marking an astonishing 901 times speedup compared to existing approaches.
Score: 6.071025178912125
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Gaussian Splatting has revolutionized the world of novel view synthesis by achieving high rendering performance in real-time. Recently, studies have focused on enriching these 3D representations with semantic information for downstream tasks. In this paper, we introduce RT-GS2, the first generalizable semantic segmentation method employing Gaussian Splatting. While existing Gaussian Splatting-based approaches rely on scene-specific training, RT-GS2 demonstrates the ability to generalize to unseen scenes. Our method adopts a new approach by first extracting view-independent 3D Gaussian features in a self-supervised manner, followed by a novel View-Dependent / View-Independent (VDVI) feature fusion to enhance semantic consistency over different views. Extensive experimentation on three different datasets showcases RT-GS2's superiority over the state-of-the-art methods in semantic segmentation quality, exemplified by an 8.01% increase in mIoU on the Replica dataset. Moreover, our method achieves real-time performance of 27.03 FPS, marking an astonishing 901 times speedup compared to existing approaches. This work represents a significant advancement in the field by introducing, to the best of our knowledge, the first real-time generalizable semantic segmentation method for 3D Gaussian representations of radiance fields.

Related papers

ODG: Occupancy Prediction Using Dual Gaussians [38.9869091446875]
Occupancy prediction infers fine-grained 3D geometry and semantics from camera images of the surrounding environment. Existing methods either adopt dense grids as scene representation, or learn the entire scene using a single set of sparse queries. We present ODG, a hierarchical dual sparse Gaussian representation to effectively capture complex scene dynamics.
arXiv Detail & Related papers (2025-06-11T06:03:03Z)
G2SDF: Surface Reconstruction from Explicit Gaussians with Implicit SDFs [84.07233691641193]
We introduce G2SDF, a novel approach that integrates a neural implicit Signed Distance Field into the Gaussian Splatting framework. G2SDF achieves higher quality than prior works while maintaining the efficiency of 3DGS.
arXiv Detail & Related papers (2024-11-25T20:07:07Z)
GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views [67.34073368933814]
We propose a generalizable Gaussian Splatting approach for high-resolution image rendering under a sparse-view camera setting. We train our Gaussian parameter regression module on human-only data or human-scene data, jointly with a depth estimation module to lift 2D parameter maps to 3D space. Experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed.
arXiv Detail & Related papers (2024-11-18T08:18:44Z)
LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes [39.687526103092445]
We show that a simple yet effective aggregation technique yields excellent results. We extend this method to generic DINOv2 features, integrating 3D scene geometry through graph diffusion, and achieve competitive segmentation results.
arXiv Detail & Related papers (2024-10-18T13:44:29Z)
ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining [104.34751911174196]
We build a large-scale dataset of 3DGS using ShapeNet and ModelNet datasets. Our dataset ShapeSplat consists of 65K objects from 87 unique categories. We introduce Gaussian-MAE, which highlights the unique benefits of representation learning from Gaussian parameters.
arXiv Detail & Related papers (2024-08-20T14:49:14Z)
MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo [54.00987996368157]
We present MVSGaussian, a new generalizable 3D Gaussian representation approach derived from Multi-View Stereo (MVS). MVSGaussian achieves real-time rendering with better synthesis quality for each scene.
arXiv Detail & Related papers (2024-05-20T17:59:30Z)
CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding [32.76277160013881]
We present CLIP-GS, which integrates semantics from Contrastive Language-Image Pre-Training (CLIP) into Gaussian Splatting. SAC exploits the inherent unified semantics within objects to learn compact yet effective semantic representations of 3D Gaussians. We also introduce a 3D Coherent Self-training (3DCS) strategy, resorting to the multi-view consistency originated from the 3D model.
arXiv Detail & Related papers (2024-04-22T15:01:32Z)
S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistically cognitive intelligence in the operating room (OR). Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection. In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z)
GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering [112.16239342037714]
GES (Generalized Exponential Splatting) is a novel representation that employs Generalized Exponential Function (GEF) to model 3D scenes. With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks.
arXiv Detail & Related papers (2024-02-15T17:32:50Z)
GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis [70.24111297192057]
We present a new approach, termed GPS-Gaussian, for synthesizing novel views of a character in a real-time manner. The proposed method enables 2K-resolution rendering under a sparse-view camera setting.
arXiv Detail & Related papers (2023-12-04T18:59:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.