LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes
- URL: http://arxiv.org/abs/2410.14462v1
- Date: Fri, 18 Oct 2024 13:44:29 GMT
- Title: LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes
- Authors: Juliette Marrie, Romain Ménégaux, Michael Arbel, Diane Larlus, Julien Mairal
- Abstract summary: We show that a simple yet effective aggregation technique yields excellent results.
We extend this method to generic DINOv2 features, integrating 3D scene geometry through graph diffusion, and achieve competitive segmentation results.
- Score: 39.687526103092445
- Abstract: We address the task of uplifting visual features or semantic masks from 2D vision models to 3D scenes represented by Gaussian Splatting. Whereas common approaches rely on iterative optimization-based procedures, we show that a simple yet effective aggregation technique yields excellent results. Applied to semantic masks from Segment Anything (SAM), our uplifting approach leads to segmentation quality comparable to the state of the art. We then extend this method to generic DINOv2 features, integrating 3D scene geometry through graph diffusion, and achieve competitive segmentation results despite DINOv2 not being trained on millions of annotated masks like SAM.
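The abstract does not spell out the aggregation rule. As a rough illustration only, one learning-free way to uplift 2D features is a weighted average: each Gaussian receives the mean of the pixel features it contributes to, weighted by its rendering contribution at that pixel. The sketch below assumes those contribution weights are already available from the rasterizer. The function name `uplift_features` and the `(gaussian_id, weight, pixel_feature)` triple format are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def uplift_features(contributions, feature_dim):
    """Aggregate 2D pixel features into one feature vector per Gaussian.

    contributions: iterable of (gaussian_id, weight, pixel_feature) triples,
        one per (view, pixel, Gaussian) combination. `weight` is assumed to be
        the Gaussian's rendering contribution to that pixel (hypothetical
        input format, not the paper's API).
    feature_dim: length of every pixel_feature vector.

    Returns a dict mapping gaussian_id to its weighted-average feature.
    """
    weighted_sums = defaultdict(lambda: [0.0] * feature_dim)
    weight_totals = defaultdict(float)

    for gaussian_id, weight, pixel_feature in contributions:
        if weight <= 0.0:
            continue  # a Gaussian that does not contribute is ignored
        acc = weighted_sums[gaussian_id]
        for k in range(feature_dim):
            acc[k] += weight * pixel_feature[k]
        weight_totals[gaussian_id] += weight

    # Normalize by the total weight so each result is a weighted average.
    return {
        gaussian_id: [value / weight_totals[gaussian_id] for value in acc]
        for gaussian_id, acc in weighted_sums.items()
    }


if __name__ == "__main__":
    # Toy input: Gaussian 0 is seen at two pixels, Gaussian 1 at one pixel.
    toy = [
        (0, 0.75, [1.0, 0.0]),
        (0, 0.25, [0.0, 1.0]),
        (1, 0.5, [2.0, 2.0]),
    ]
    print(uplift_features(toy, feature_dim=2))
```

Because this is a single pass over the contributions with no gradient steps, it shows what "learning-free" can mean in practice, in contrast to the iterative optimization-based procedures the abstract mentions. The graph diffusion step used for DINOv2 features is not covered here.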
Related papers
- Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models [57.37244894146089]
We propose Diff2Scene, which leverages frozen representations from text-image generative models, along with salient-aware and geometric-aware masks, for open-vocabulary 3D semantic segmentation and visual grounding tasks.
We show that it outperforms competitive baselines and achieves significant improvements over state-of-the-art methods.
arXiv Detail & Related papers (2024-07-18T16:20:56Z)
- RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance Fields [6.071025178912125]
We introduce RT-GS2, the first generalizable semantic segmentation method employing Gaussian Splatting.
Our method achieves real-time performance of 27.03 FPS, marking an astonishing 901 times speedup compared to existing approaches.
arXiv Detail & Related papers (2024-05-28T10:34:28Z)
- CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding [32.76277160013881]
We present CLIP-GS, which integrates semantics from Contrastive Language-Image Pre-Training (CLIP) into Gaussian Splatting.
The Semantic Attribute Compactness (SAC) approach exploits the inherent unified semantics within objects to learn compact yet effective semantic representations of 3D Gaussians.
We also introduce a 3D Coherent Self-training (3DCS) strategy, resorting to the multi-view consistency originated from the 3D model.
arXiv Detail & Related papers (2024-04-22T15:01:32Z)
- Contrastive Gaussian Clustering: Weakly Supervised 3D Scene Segmentation [14.967600484476385]
We introduce Contrastive Gaussian Clustering, a novel approach capable of providing segmentation masks from any viewpoint.
Our method can be trained on inconsistent 2D segmentation masks, and still learn to generate segmentation masks consistent across all views.
The resulting model is extremely accurate, improving the IoU accuracy of the predicted masks by +8% over the state of the art.
arXiv Detail & Related papers (2024-04-19T10:47:53Z)
- Segment Any 3D Object with Language [58.471327490684295]
We introduce Segment any 3D Object with LanguagE (SOLE), a semantic and geometric-aware visual-language learning framework with strong generalizability.
Specifically, we propose a multimodal fusion network to incorporate multimodal semantics in both backbone and decoder.
Our SOLE outperforms previous methods by a large margin on ScanNetv2, ScanNet200, and Replica benchmarks.
arXiv Detail & Related papers (2024-04-02T17:59:10Z)
- Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting [27.974762304763694]
We introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting.
Unlike existing methods, we design a versatile projection approach that maps various 2D semantic features into a novel semantic component of 3D Gaussians.
We build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference.
arXiv Detail & Related papers (2024-03-22T21:28:19Z)
- Segment Any 3D Gaussians [85.93694310363325]
This paper presents SAGA, a highly efficient 3D promptable segmentation method based on 3D Gaussian Splatting (3D-GS).
Given 2D visual prompts as input, SAGA can segment the corresponding 3D target represented by 3D Gaussians within 4 ms.
We show that SAGA achieves real-time multi-granularity segmentation with quality comparable to state-of-the-art methods.
arXiv Detail & Related papers (2023-12-01T17:15:24Z)
- Scene-Generalizable Interactive Segmentation of Radiance Fields [64.37093918762]
We make the first attempt at Scene-Generalizable Interactive Segmentation in Radiance Fields (SGISRF).
We propose a novel SGISRF method, which can perform 3D object segmentation for novel (unseen) scenes represented by radiance fields, guided by only a few interactive user clicks in a given set of multi-view 2D images.
Experiments on two challenging real-world benchmarks covering diverse scenes demonstrate 1) the effectiveness and scene-generalizability of the proposed method, and 2) favorable performance compared to classical methods requiring scene-specific optimization.
arXiv Detail & Related papers (2023-08-09T17:55:50Z)
- MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks.
We render a 3D shape from multiple views, and set up a dense correspondence learning task within the contrastive learning framework.
As a result, the learned 2D representations are view-invariant and geometrically consistent.
arXiv Detail & Related papers (2022-08-18T00:48:15Z)