Gaga: Group Any Gaussians via 3D-aware Memory Bank
- URL: http://arxiv.org/abs/2404.07977v1
- Date: Thu, 11 Apr 2024 17:57:19 GMT
- Title: Gaga: Group Any Gaussians via 3D-aware Memory Bank
- Authors: Weijie Lyu, Xueting Li, Abhijit Kundu, Yi-Hsuan Tsai, Ming-Hsuan Yang
- Abstract summary: Gaga reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models.
By eliminating the assumption of continuous view changes in training images, Gaga demonstrates robustness to variations in camera poses.
Gaga performs favorably against state-of-the-art methods, emphasizing its potential for real-world applications.
- Score: 66.54280093684427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Gaga, a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models. In contrast to prior 3D scene segmentation approaches that heavily rely on video object tracking, Gaga utilizes spatial information and effectively associates object masks across diverse camera poses. By eliminating the assumption of continuous view changes in training images, Gaga demonstrates robustness to variations in camera poses, which is particularly beneficial for sparsely sampled images and ensures precise mask label consistency. Furthermore, Gaga accommodates 2D segmentation masks from diverse sources and demonstrates robust performance with different open-world zero-shot segmentation models, enhancing its versatility. Extensive qualitative and quantitative evaluations demonstrate that Gaga performs favorably against state-of-the-art methods, emphasizing its potential for real-world applications such as scene understanding and manipulation.
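The cross-view association the abstract describes, where each view's zero-shot masks carry inconsistent per-view IDs that must be mapped to consistent global labels, can be illustrated in a heavily simplified form as greedy overlap voting on shared 3D points. This is only an illustrative sketch, not Gaga's actual 3D-aware memory bank; all function and variable names here are hypothetical:

```python
from collections import Counter, defaultdict

def associate_masks(views):
    """Greedily associate per-view 2D mask labels into global group IDs.

    views: list of dicts, each mapping a 3D point ID to that view's
           (view-specific, inconsistent) 2D mask label.
    Returns a dict mapping each 3D point ID to a global group ID.
    Illustrative sketch only, not the paper's method.
    """
    global_label = {}   # 3D point ID -> global group ID
    next_group = 0
    for view in views:
        # Collect this view's points by their per-view mask label.
        by_mask = defaultdict(list)
        for pt, mask in view.items():
            by_mask[mask].append(pt)
        for pts in by_mask.values():
            # Vote: which existing global group overlaps this mask most?
            votes = Counter(global_label[p] for p in pts if p in global_label)
            if votes:
                gid = votes.most_common(1)[0][0]
            else:
                gid, next_group = next_group, next_group + 1
            for p in pts:
                global_label.setdefault(p, gid)
    return global_label
```

Even with disjoint per-view label sets (e.g. `'a'`/`'b'` in one view, `'x'` in another), points observed in both views anchor the vote, so masks covering the same object across views collapse into one global group.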
Related papers
- Multi-View Pose-Agnostic Change Localization with Zero Labels [4.997375878454274]
We propose a label-free, pose-agnostic change detection method that integrates information from multiple viewpoints.
With as few as 5 images of the post-change scene, our approach can learn additional change channels in a 3DGS.
Our change-aware 3D scene representation additionally enables the generation of accurate change masks for unseen viewpoints.
arXiv Detail & Related papers (2024-12-05T06:28:54Z)
- NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model [57.92709692193132]
NovelGS is a diffusion model for Gaussian Splatting given sparse-view images.
We leverage the novel view denoising through a transformer-based network to generate 3D Gaussians.
arXiv Detail & Related papers (2024-11-25T07:57:17Z)
- OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation [15.833273340802311]
It is difficult to precisely reconstruct specific objects from large scenes.
Current scene reconstruction techniques frequently result in the loss of object detail textures.
We propose a framework termed OMEGAS: Object Extraction from Large Scenes Guided by Gaussian.
We demonstrate that our method can accurately reconstruct specific targets from large scenes, both quantitatively and qualitatively.
arXiv Detail & Related papers (2024-04-24T14:29:26Z)
- One Noise to Rule Them All: Multi-View Adversarial Attacks with Universal Perturbation [1.4680035572775534]
This paper presents a novel universal perturbation method for generating robust multi-view adversarial examples in 3D object recognition.
Unlike conventional attacks limited to single views, our approach operates on multiple 2D images, offering a practical and scalable solution.
arXiv Detail & Related papers (2024-04-02T20:29:59Z)
- Zero-Shot Multi-Object Scene Completion [59.325611678171974]
We present a 3D scene completion method that recovers the complete geometry of multiple unseen objects in complex scenes from a single RGB-D image.
Our method outperforms the current state-of-the-art on both synthetic and real-world datasets.
arXiv Detail & Related papers (2024-03-21T17:59:59Z)
- Gaussian Grouping: Segment and Edit Anything in 3D Scenes [65.49196142146292]
We propose Gaussian Grouping, which extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes.
Compared to the implicit NeRF representation, we show that the grouped 3D Gaussians can reconstruct, segment and edit anything in 3D with high visual quality, fine granularity and efficiency.
arXiv Detail & Related papers (2023-12-01T17:09:31Z)
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
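The voting-based label fusion this summary mentions could look roughly like the following sketch, where several 2D models each predict a semantic label per 3D point and the fused pseudo label is the majority vote. The function and data layout are assumptions for illustration, not the paper's implementation:

```python
from collections import Counter

def fuse_labels(predictions):
    """Fuse per-point semantic labels from multiple models by majority vote.

    predictions: list of dicts, each mapping a 3D point ID to the label
                 predicted by one vision model.
    Returns a dict mapping each point ID to its fused pseudo label.
    Illustrative sketch only.
    """
    votes = {}
    for pred in predictions:
        for pt, label in pred.items():
            # Tally one vote per model for this point's label.
            votes.setdefault(pt, Counter())[label] += 1
    # Keep the most-voted label per point as the pseudo label.
    return {pt: counts.most_common(1)[0][0] for pt, counts in votes.items()}
```

A confidence-weighted vote (adding each model's score instead of 1) would be a natural extension when the 2D models expose per-mask confidences.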
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
- MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency [120.9499803967496]
We propose a novel informative-preserved reconstruction, which explores local statistics to discover and preserve the representative structured points.
Our method can concentrate on modeling regional geometry, with less ambiguity in masked reconstruction.
By combining informative-preserved reconstruction on masked areas and consistency self-distillation from unmasked areas, a unified framework called MM-3DScene is yielded.
arXiv Detail & Related papers (2022-12-20T01:53:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.