MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation
- URL: http://arxiv.org/abs/2401.07745v2
- Date: Wed, 10 Apr 2024 15:30:23 GMT
- Title: MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation
- Authors: Mi Yan, Jiazhao Zhang, Yan Zhu, He Wang
- Abstract summary: Open-vocabulary 3D instance segmentation is cutting-edge for its ability to segment 3D instances without predefined categories.
Recent works first generate 2D open-vocabulary masks through 2D models and then merge them into 3D instances based on metrics calculated between two neighboring frames.
We propose a novel metric, view consensus rate, to enhance the utilization of multi-view observations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open-vocabulary 3D instance segmentation is cutting-edge for its ability to segment 3D instances without predefined categories. However, progress in 3D lags behind its 2D counterpart due to limited annotated 3D data. To address this, recent works first generate 2D open-vocabulary masks through 2D models and then merge them into 3D instances based on metrics calculated between two neighboring frames. In contrast to these local metrics, we propose a novel metric, view consensus rate, to enhance the utilization of multi-view observations. The key insight is that two 2D masks should be deemed part of the same 3D instance if a significant number of other 2D masks from different views contain both these two masks. Using this metric as edge weight, we construct a global mask graph where each mask is a node. Through iterative clustering of masks showing high view consensus, we generate a series of clusters, each representing a distinct 3D instance. Notably, our model is training-free. Through extensive experiments on publicly available datasets, including ScanNet++, ScanNet200 and MatterPort3D, we demonstrate that our method achieves state-of-the-art performance in open-vocabulary 3D instance segmentation. Our project page is at https://pku-epic.github.io/MaskClustering.
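The view-consensus idea from the abstract can be sketched in a few lines. The snippet below is a toy illustration, not the paper's implementation: masks are modeled as frozensets of already back-projected 3D point IDs, the `contains` helper, the 0.8 coverage and 0.9 consensus thresholds are hypothetical simplifications, and the paper's iterative clustering on a global mask graph is replaced by a single union-find pass over all mask pairs.

```python
from itertools import combinations

def contains(observer, mask, cover=0.8):
    """An observer mask 'contains' a mask if it covers >= `cover` of its points."""
    return len(observer & mask) / len(mask) >= cover

def view_consensus_rate(a, b, others, cover=0.8):
    """Fraction of other-view masks that see part of a or b and contain both.

    Masks are sets of global 3D point IDs; the paper computes containment
    per frame on visible points, which this toy version ignores.
    """
    relevant = [m for m in others if m & (a | b)]
    if not relevant:
        return 0.0
    agree = sum(1 for m in relevant if contains(m, a, cover) and contains(m, b, cover))
    return agree / len(relevant)

def cluster_masks(masks, tau=0.9, cover=0.8):
    """Union masks whose pairwise consensus rate reaches tau (simplified:
    one greedy pass with union-find instead of the paper's iterative scheme)."""
    parent = list(range(len(masks)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in combinations(range(len(masks)), 2):
        others = [masks[k] for k in range(len(masks)) if k not in (i, j)]
        if view_consensus_rate(masks[i], masks[j], others, cover) >= tau:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(masks)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Toy scene: one object seen as two half-masks (a, b), two full views of it
# (m1, m2), and an unrelated object's mask (c).
a, b = frozenset(range(1, 6)), frozenset(range(6, 11))
m1, m2 = frozenset(range(1, 11)), frozenset(range(1, 11))
c = frozenset(range(11, 16))
print(view_consensus_rate(a, b, [m1, m2, c]))  # 1.0: both full views contain a and b
print(view_consensus_rate(a, c, [b, m1, m2]))  # 0.0: no view contains both
print(cluster_masks([a, b, m1, m2, c]))        # a and b land in the same cluster
```

The key property the toy example shows is that `a` and `b` never overlap each other, yet they are merged because other views vouch for them jointly; a purely local, neighboring-frame metric would miss this.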
Related papers
- EmbodiedSAM: Online Segment Any 3D Thing in Real Time [61.2321497708998]
Embodied tasks require the agent to fully understand 3D scenes simultaneously with its exploration.
An online, real-time, fine-grained and highly-generalized 3D perception model is desperately needed.
arXiv Detail & Related papers (2024-08-21T17:57:06Z)
- SAI3D: Segment Any Instance in 3D Scenes [68.57002591841034]
We introduce SAI3D, a novel zero-shot 3D instance segmentation approach.
Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations.
Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach.
arXiv Detail & Related papers (2023-12-17T09:05:47Z)
- SAM-guided Graph Cut for 3D Instance Segmentation [60.75119991853605]
This paper addresses the challenge of 3D instance segmentation by simultaneously leveraging 3D geometric and multi-view image information.
We introduce a novel 3D-to-2D query framework to effectively exploit 2D segmentation models for 3D instance segmentation.
Our method achieves robust segmentation performance and can generalize across different types of scenes.
arXiv Detail & Related papers (2023-12-13T18:59:58Z)
- OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation [32.508069732371105]
OpenIns3D is a new 3D-input-only framework for 3D open-vocabulary scene understanding.
It achieves state-of-the-art performance across a wide range of 3D open-vocabulary tasks.
arXiv Detail & Related papers (2023-09-01T17:59:56Z)
- OpenMask3D: Open-Vocabulary 3D Instance Segmentation [84.58747201179654]
OpenMask3D is a zero-shot approach for open-vocabulary 3D instance segmentation.
Our model aggregates per-mask features via multi-view fusion of CLIP-based image embeddings.
arXiv Detail & Related papers (2023-06-23T17:36:44Z)
- Segment Anything in 3D with Radiance Fields [83.14130158502493]
This paper generalizes the Segment Anything Model (SAM) to segment 3D objects.
We refer to the proposed solution as SA3D, short for Segment Anything in 3D.
We show in experiments that SA3D adapts to various scenes and achieves 3D segmentation within seconds.
arXiv Detail & Related papers (2023-04-24T17:57:15Z)
- Mask3D: Mask Transformer for 3D Semantic Instance Segmentation [89.41640045953378]
We show that we can leverage generic Transformer building blocks to directly predict instance masks from 3D point clouds.
Using Transformer decoders, the instance queries are learned by iteratively attending to point cloud features at multiple scales.
Mask3D sets a new state-of-the-art on ScanNet test (+6.2 mAP), S3DIS 6-fold (+10.1 mAP), STPLS3D (+11.2 mAP) and ScanNet200 test (+12.4 mAP).
arXiv Detail & Related papers (2022-10-06T17:55:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.