Related papers: LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes

econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians [56.85804719947]
We propose econSG for open-vocabulary semantic segmentation with 3DGS. Our econSG shows state-of-the-art performance on four benchmark datasets compared to the existing methods.
arXiv Detail & Related papers (2025-04-08T13:12:31Z)
A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision [65.33043028101471]
We introduce a diffusion model for Gaussian Splats, SplatDiffusion, to enable generation of three-dimensional structures from single images. Existing methods rely on deterministic, feed-forward predictions, which limit their ability to handle the inherent ambiguity of 3D inference from 2D data.
arXiv Detail & Related papers (2024-12-01T00:29:57Z)
Gradient-Weighted Feature Back-Projection: A Fast Alternative to Feature Distillation in 3D Gaussian Splatting [6.647959476396794]
Our approach back-projects 2D features into pre-trained 3D Gaussians, using a weighted sum based on each Gaussian's influence in the final rendering. While most training-based feature field rendering methods excel at 2D segmentation but perform poorly at 3D segmentation without post-processing, our method achieves high-quality results in both 2D and 3D segmentation.
arXiv Detail & Related papers (2024-11-19T12:17:15Z)
Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models [57.37244894146089]
We propose Diff2Scene, which leverages frozen representations from text-image generative models, along with salient-aware and geometric-aware masks, for open-vocabulary 3D semantic segmentation and visual grounding tasks. We show that it outperforms competitive baselines and achieves significant improvements over state-of-the-art methods.
arXiv Detail & Related papers (2024-07-18T16:20:56Z)
RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance Fields [6.071025178912125]
We introduce RT-GS2, the first generalizable semantic segmentation method employing Gaussian Splatting. Our method achieves real-time performance of 27.03 FPS, marking an astonishing 901 times speedup compared to existing approaches.
arXiv Detail & Related papers (2024-05-28T10:34:28Z)
CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding [32.76277160013881]
We present CLIP-GS, which integrates semantics from Contrastive Language-Image Pre-Training (CLIP) into Gaussian Splatting. Its Semantic Attribute Compactness (SAC) approach exploits the inherent unified semantics within objects to learn compact yet effective semantic representations of 3D Gaussians. We also introduce a 3D Coherent Self-training (3DCS) strategy, which relies on the multi-view consistency originating from the 3D model.
arXiv Detail & Related papers (2024-04-22T15:01:32Z)
Contrastive Gaussian Clustering: Weakly Supervised 3D Scene Segmentation [14.967600484476385]
We introduce Contrastive Gaussian Clustering, a novel approach capable of providing segmentation masks from any viewpoint. Our method can be trained on inconsistent 2D segmentation masks, and still learn to generate segmentation masks consistent across all views. The resulting model is extremely accurate, improving the IoU accuracy of the predicted masks by +8% over the state of the art.
arXiv Detail & Related papers (2024-04-19T10:47:53Z)
Segment Any 3D Object with Language [58.471327490684295]
We introduce Segment any 3D Object with LanguagE (SOLE), a semantic and geometric-aware visual-language learning framework with strong generalizability. Specifically, we propose a multimodal fusion network to incorporate multimodal semantics in both backbone and decoder. Our SOLE outperforms previous methods by a large margin on ScanNetv2, ScanNet200, and Replica benchmarks.
arXiv Detail & Related papers (2024-04-02T17:59:10Z)
Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting [27.974762304763694]
We introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting. Unlike existing methods, we design a versatile projection approach that maps various 2D semantic features into a novel semantic component of 3D Gaussians. We build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference.
arXiv Detail & Related papers (2024-03-22T21:28:19Z)
Segment Any 3D Gaussians [85.93694310363325]
This paper presents SAGA, a highly efficient 3D promptable segmentation method based on 3D Gaussian Splatting (3D-GS). Given 2D visual prompts as input, SAGA can segment the corresponding 3D target represented by 3D Gaussians within 4 ms. We show that SAGA achieves real-time multi-granularity segmentation with quality comparable to state-of-the-art methods.
arXiv Detail & Related papers (2023-12-01T17:15:24Z)
GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting [51.96353586773191]
We introduce GS-SLAM, the first method to utilize a 3D Gaussian representation in a Simultaneous Localization and Mapping (SLAM) system. Our method utilizes a real-time differentiable splatting rendering pipeline that offers significant speedup to map optimization and RGB-D rendering. Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica and TUM-RGBD datasets.
arXiv Detail & Related papers (2023-11-20T12:08:23Z)
M$^{3}$3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding [5.989397492717352]
We present M$^{3}$3D (Multi-Modal Masked 3D), built on multi-modal masked autoencoders. We integrate two major self-supervised learning frameworks: Masked Image Modeling (MIM) and contrastive learning. Experiments show that M$^{3}$3D outperforms the existing state-of-the-art approaches on ScanNet, NYUv2, UCF-101 and OR-AR.
arXiv Detail & Related papers (2023-09-26T23:52:09Z)
Scene-Generalizable Interactive Segmentation of Radiance Fields [64.37093918762]
We make the first attempt at Scene-Generalizable Interactive Segmentation in Radiance Fields (SGISRF). We propose a novel SGISRF method, which can perform 3D object segmentation for novel (unseen) scenes represented by radiance fields, guided by only a few interactive user clicks in a given set of multi-view 2D images. Experiments on two challenging real-world benchmarks covering diverse scenes demonstrate 1) the effectiveness and scene-generalizability of the proposed method, and 2) favorable performance compared to classical methods requiring scene-specific optimization.
arXiv Detail & Related papers (2023-08-09T17:55:50Z)
Gait Recognition in the Wild with Multi-hop Temporal Switch [81.35245014397759]
Gait recognition in the wild is a more practical problem that has attracted the attention of the multimedia and computer vision communities. This paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes.
arXiv Detail & Related papers (2022-09-01T10:46:09Z)
MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks. We render a 3D shape from multiple views, and set up a dense correspondence learning task within the contrastive learning framework. As a result, the learned 2D representations are view-invariant and geometrically consistent.
arXiv Detail & Related papers (2022-08-18T00:48:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.