Gaga: Group Any Gaussians via 3D-aware Memory Bank
- URL: http://arxiv.org/abs/2404.07977v1
- Date: Thu, 11 Apr 2024 17:57:19 GMT
- Title: Gaga: Group Any Gaussians via 3D-aware Memory Bank
- Authors: Weijie Lyu, Xueting Li, Abhijit Kundu, Yi-Hsuan Tsai, Ming-Hsuan Yang,
- Abstract summary: Gaga reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models.
By eliminating the assumption of continuous view changes in training images, Gaga demonstrates robustness to variations in camera poses.
Gaga performs favorably against state-of-the-art methods, emphasizing its potential for real-world applications.
- Score: 66.54280093684427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Gaga, a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models. Contrasted to prior 3D scene segmentation approaches that heavily rely on video object tracking, Gaga utilizes spatial information and effectively associates object masks across diverse camera poses. By eliminating the assumption of continuous view changes in training images, Gaga demonstrates robustness to variations in camera poses, particularly beneficial for sparsely sampled images, ensuring precise mask label consistency. Furthermore, Gaga accommodates 2D segmentation masks from diverse sources and demonstrates robust performance with different open-world zero-shot segmentation models, enhancing its versatility. Extensive qualitative and quantitative evaluations demonstrate that Gaga performs favorably against state-of-the-art methods, emphasizing its potential for real-world applications such as scene understanding and manipulation.
Related papers
- Visibility-Uncertainty-guided 3D Gaussian Inpainting via Scene Conceptional Learning [63.94919846010485]
3D Gaussian inpainting (3DGI) is challenging in effectively leveraging complementary visual and semantic cues from multiple input views.
We propose a method that measures the visibility uncertainties of 3D points across different input views and uses them to guide 3DGI.
We build a novel 3DGI framework, VISTA, by integrating VISibility-uncerTainty-guided 3DGI with scene conceptuAl learning.
arXiv Detail & Related papers (2025-04-23T06:21:11Z) - NVSMask3D: Hard Visual Prompting with Camera Pose Interpolation for 3D Open Vocabulary Instance Segmentation [14.046423852723615]
We introduce a novel 3D Gaussian Splatting based hard visual prompting approach to generate diverse viewpoints around target objects.
Our method simulates realistic 3D perspectives, effectively augmenting existing hard visual prompts.
This training-free strategy integrates seamlessly with prior hard visual prompts, enriching object-descriptive features.
arXiv Detail & Related papers (2025-04-20T14:39:27Z) - MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation [87.30919771444117]
Reasoning segmentation aims to segment target objects in complex scenes based on human intent and spatial reasoning.
Recent multimodal large language models (MLLMs) have demonstrated impressive 2D image reasoning segmentation.
We introduce MLLM-For3D, a framework that transfers knowledge from 2D MLLMs to 3D scene understanding.
arXiv Detail & Related papers (2025-03-23T16:40:20Z) - WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images [16.107027445270887]
We introduce WildSeg3D, an efficient approach that enables the segmentation of arbitrary 3D objects across diverse environments.
A key challenge of this feed-forward approach lies in the accumulation of 3D alignment errors across multiple 2D views.
We also propose Dynamic Global Aligning (DGA) and Multi-view Group Mapping (MGM) for real-time interactive segmentation.
arXiv Detail & Related papers (2025-03-11T13:10:41Z) - NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model [57.92709692193132]
NovelGS is a diffusion model for Gaussian Splatting given sparse-view images.
We leverage the novel view denoising through a transformer-based network to generate 3D Gaussians.
arXiv Detail & Related papers (2024-11-25T07:57:17Z) - LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes [39.687526103092445]
We show that a simple yet effective aggregation technique yields excellent results.
We extend this method to generic DINOv2 features, integrating 3D scene geometry through graph diffusion, and achieve competitive segmentation results.
arXiv Detail & Related papers (2024-10-18T13:44:29Z) - OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation [15.833273340802311]
It is difficult to precisely reconstruct specific objects from large scenes.
Current scene reconstruction techniques frequently result in the loss of object detail textures.
We propose a framework termed OMEGAS: Object Extraction from Large Scenes Guided by Gaussian.
We demonstrate that our method can accurately reconstruct specific targets from large scenes, both quantitatively and qualitatively.
arXiv Detail & Related papers (2024-04-24T14:29:26Z) - One Noise to Rule Them All: Multi-View Adversarial Attacks with Universal Perturbation [1.4680035572775534]
This paper presents a novel universal perturbation method for generating robust multi-view adversarial examples in 3D object recognition.
Unlike conventional attacks limited to single views, our approach operates on multiple 2D images, offering a practical and scalable solution.
arXiv Detail & Related papers (2024-04-02T20:29:59Z) - Zero-Shot Multi-Object Scene Completion [59.325611678171974]
We present a 3D scene completion method that recovers the complete geometry of multiple unseen objects in complex scenes from a single RGB-D image.
Our method outperforms the current state-of-the-art on both synthetic and real-world datasets.
arXiv Detail & Related papers (2024-03-21T17:59:59Z) - Weakly Supervised Monocular 3D Detection with a Single-View Image [58.57978772009438]
Monocular 3D detection aims for precise 3D object localization from a single-view image.
We propose SKD-WM3D, a weakly supervised monocular 3D detection framework.
We show that SKD-WM3D surpasses the state-of-the-art clearly and is even on par with many fully supervised methods.
arXiv Detail & Related papers (2024-02-29T13:26:47Z) - Gaussian Grouping: Segment and Edit Anything in 3D Scenes [65.49196142146292]
We propose Gaussian Grouping, which extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes.
Compared to the implicit NeRF representation, we show that the grouped 3D Gaussians can reconstruct, segment and edit anything in 3D with high visual quality, fine granularity and efficiency.
arXiv Detail & Related papers (2023-12-01T17:09:31Z) - Leveraging Large-Scale Pretrained Vision Foundation Models for
Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
arXiv Detail & Related papers (2023-11-03T15:41:15Z) - MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with
Informative-Preserved Reconstruction and Self-Distilled Consistency [120.9499803967496]
We propose a novel informative-preserved reconstruction, which explores local statistics to discover and preserve the representative structured points.
Our method can concentrate on modeling regional geometry and enjoy less ambiguity for masked reconstruction.
By combining informative-preserved reconstruction on masked areas and consistency self-distillation from unmasked areas, a unified framework called MM-3DScene is yielded.
arXiv Detail & Related papers (2022-12-20T01:53:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.