GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting
- URL: http://arxiv.org/abs/2411.07555v1
- Date: Tue, 12 Nov 2024 05:09:42 GMT
- Title: GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting
- Authors: Umangi Jain, Ashkan Mirzaei, Igor Gilitschenski
- Abstract summary: We introduce GaussianCut, a new method for interactive multiview segmentation of scenes represented as 3D Gaussians.
Our approach allows for selecting the objects to be segmented by interacting with a single view.
It accepts intuitive user input, such as point clicks, coarse scribbles, or text.
- Abstract: We introduce GaussianCut, a new method for interactive multiview segmentation of scenes represented as 3D Gaussians. Our approach allows for selecting the objects to be segmented by interacting with a single view. It accepts intuitive user input, such as point clicks, coarse scribbles, or text. Using 3D Gaussian Splatting (3DGS) as the underlying scene representation simplifies the extraction of objects of interest which are considered to be a subset of the scene's Gaussians. Our key idea is to represent the scene as a graph and use the graph-cut algorithm to minimize an energy function to effectively partition the Gaussians into foreground and background. To achieve this, we construct a graph based on scene Gaussians and devise a segmentation-aligned energy function on the graph to combine user inputs with scene properties. To obtain an initial coarse segmentation, we leverage 2D image/video segmentation models and further refine these coarse estimates using our graph construction. Our empirical evaluations show the adaptability of GaussianCut across a diverse set of scenes. GaussianCut achieves competitive performance with state-of-the-art approaches for 3D segmentation without requiring any additional segmentation-aware training.
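Below is a minimal sketch of the graph-cut step described in the abstract, assuming the PyMaxflow library for min-cut and a k-nearest-neighbor graph over Gaussian centers. The neighborhood size, the distance-based edge weights, and the helper name segment_gaussians are illustrative assumptions; the paper's actual energy combines user inputs with scene properties not reproduced here.

```python
# Sketch only: partitions Gaussians into foreground/background with a
# standard min-cut energy. Requires `pip install PyMaxflow scipy numpy`.
import numpy as np
import maxflow
from scipy.spatial import cKDTree

def segment_gaussians(centers, p_fg, k=8, lam=1.0, sigma=0.1):
    """centers: (N, 3) Gaussian means.
    p_fg: (N,) coarse foreground probabilities, e.g. lifted from a 2D
          image/video segmentation model (assumed given here).
    Returns a boolean foreground mask of shape (N,)."""
    n = centers.shape[0]
    g = maxflow.Graph[float]()
    nodes = g.add_nodes(n)
    eps = 1e-6

    # Unary terms: source capacity is the cost of a background label,
    # sink capacity the cost of a foreground label.
    for i in range(n):
        g.add_tedge(nodes[i],
                    -np.log(1.0 - p_fg[i] + eps),  # paid if i ends up background
                    -np.log(p_fg[i] + eps))        # paid if i ends up foreground

    # Pairwise terms on a k-NN graph: nearby Gaussians should agree.
    # Mutual neighbors may add an edge twice; acceptable for a sketch.
    _, nbrs = cKDTree(centers).query(centers, k=k + 1)
    for i in range(n):
        for j in nbrs[i, 1:]:  # index 0 is the point itself
            d2 = np.sum((centers[i] - centers[j]) ** 2)
            w = lam * np.exp(-d2 / (2.0 * sigma ** 2))
            g.add_edge(nodes[i], nodes[j], w, w)

    g.maxflow()
    # Source segment (0) is foreground under the convention above.
    return np.array([g.get_segment(nodes[i]) == 0 for i in range(n)])
```

The pure-Python loops are for readability; a practical implementation would batch the edge construction, and the paper's graph encodes more than center distance alone.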
Related papers
- ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining [104.34751911174196]
We build a large-scale dataset of 3DGS using ShapeNet and ModelNet datasets.
Our dataset ShapeSplat consists of 65K objects from 87 unique categories.
We introduce Gaussian-MAE, which highlights the unique benefits of representation learning from Gaussian parameters.
arXiv Detail & Related papers (2024-08-20T14:49:14Z)
- Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos [58.22272760132996]
We show that existing 4D Gaussian methods dramatically fail in this setup because the monocular setting is underconstrained.
We propose Dynamic Gaussian Marbles, which consist of three core modifications that target the difficulties of the monocular setting.
We evaluate on the Nvidia Dynamic Scenes dataset and the DyCheck iPhone dataset, and show that Gaussian Marbles significantly outperforms other Gaussian baselines in quality.
arXiv Detail & Related papers (2024-06-26T19:37:07Z)
- GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction [70.65250036489128]
3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene.
We propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians.
GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8% - 24.8% of their memory consumption.
arXiv Detail & Related papers (2024-05-27T17:59:51Z)
- Contrastive Gaussian Clustering: Weakly Supervised 3D Scene Segmentation [14.967600484476385]
We introduce Contrastive Gaussian Clustering, a novel approach capable of providing segmentation masks from any viewpoint.
Our method can be trained on inconsistent 2D segmentation masks, and still learn to generate segmentation masks consistent across all views.
The resulting model is extremely accurate, improving the IoU accuracy of the predicted masks by +8% over the state of the art.
arXiv Detail & Related papers (2024-04-19T10:47:53Z)
- Learning Segmented 3D Gaussians via Efficient Feature Unprojection for Zero-shot Neural Scene Segmentation [16.57158278095853]
Zero-shot neural scene segmentation serves as an effective approach to scene understanding.
Existing models, especially the efficient 3D Gaussian-based methods, struggle to produce compact segmentation results.
Our work proposes the Feature Unprojection and Fusion module as the segmentation field.
We show that our model surpasses baselines on zero-shot semantic segmentation task, improving by 10% mIoU over the best baseline.
arXiv Detail & Related papers (2024-01-11T14:05:01Z)
- 2D-Guided 3D Gaussian Segmentation [15.139488857163064]
This paper introduces a 3D Gaussian segmentation method implemented with 2D segmentation as supervision.
This approach uses input 2D segmentation maps to guide the learning of the added 3D Gaussian semantic information.
Experiments show that our method can achieve comparable performance on mIoU and mAcc for multi-object segmentation.
arXiv Detail & Related papers (2023-12-26T13:28:21Z)
- Segment Any 3D Gaussians [85.93694310363325]
This paper presents SAGA, a highly efficient 3D promptable segmentation method based on 3D Gaussian Splatting (3D-GS).
Given 2D visual prompts as input, SAGA can segment the corresponding 3D target represented by 3D Gaussians within 4 ms.
We show that SAGA achieves real-time multi-granularity segmentation with quality comparable to state-of-the-art methods.
arXiv Detail & Related papers (2023-12-01T17:15:24Z)
- Gaussian Grouping: Segment and Edit Anything in 3D Scenes [65.49196142146292]
We propose Gaussian Grouping, which extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes.
Compared to the implicit NeRF representation, we show that the grouped 3D Gaussians can reconstruct, segment and edit anything in 3D with high visual quality, fine granularity and efficiency.
arXiv Detail & Related papers (2023-12-01T17:09:31Z)
- SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences [76.28527350263012]
We propose a method to incrementally build up semantic scene graphs from a 3D environment given a sequence of RGB-D frames.
We aggregate PointNet features from primitive scene components by means of a graph neural network.
Our approach outperforms 3D scene graph prediction methods by a large margin and its accuracy is on par with other 3D semantic and panoptic segmentation methods while running at 35 Hz.
arXiv Detail & Related papers (2021-03-27T13:00:36Z)