Related papers: SGS-3D: High-Fidelity 3D Instance Segmentation via Reliable Semantic Mask Splitting and Growing

URL: http://arxiv.org/abs/2509.05144v1
Date: Fri, 05 Sep 2025 14:37:31 GMT
Title: SGS-3D: High-Fidelity 3D Instance Segmentation via Reliable Semantic Mask Splitting and Growing
Authors: Chaolei Wang, Yang Luo, Jing Du, Siyu Chen, Yiping Chen, Ting Han
Abstract summary: We propose splitting and growing reliable semantic masks for high-fidelity 3D instance segmentation (SGS-3D). For semantic guidance, we introduce a mask filtering strategy that leverages the co-occurrence of 3D geometry primitives. For the geometric refinement, we construct fine-grained object instances by exploiting both spatial continuity and high-level features.
Score: 20.383892902000976
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Accurate 3D instance segmentation is crucial for high-quality scene understanding in the 3D vision domain. However, 2D-to-3D lifting approaches to 3D instance segmentation struggle to produce precise instance-level segmentation, because errors accumulate during the lifting process from ambiguous semantic guidance and insufficient depth constraints. To tackle these challenges, we propose splitting and growing reliable semantic masks for high-fidelity 3D instance segmentation (SGS-3D), a novel "split-then-grow" framework that first purifies and splits ambiguous lifted masks using geometric primitives, and then grows them into complete instances within the scene. Unlike existing approaches that rely directly on raw lifted masks and sacrifice segmentation accuracy, SGS-3D serves as a training-free refinement method that jointly fuses semantic and geometric information, enabling effective cooperation between the two levels of representation. Specifically, for semantic guidance, we introduce a mask filtering strategy that leverages the co-occurrence of 3D geometry primitives to identify and remove ambiguous masks, thereby ensuring more reliable semantic consistency with the 3D object instances. For the geometric refinement, we construct fine-grained object instances by exploiting both spatial continuity and high-level features, particularly in the case of semantic ambiguity between distinct objects. Experimental results on ScanNet200, ScanNet++, and KITTI-360 demonstrate that SGS-3D substantially improves segmentation accuracy and robustness against inaccurate masks from pre-trained models, yielding high-fidelity object instances while maintaining strong generalization across diverse indoor and outdoor environments. Code is available in the supplementary materials.
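
Illustrative sketch (not the authors' code): the abstract does not specify how primitive co-occurrence is measured, so the following toy Python example only shows one plausible way such a filter could look. It assumes each 3D point carries a geometric-primitive id and each lifted mask is a boolean array over points. The function names, the coverage-based purity score, and both thresholds are hypothetical choices made for illustration.

```python
import numpy as np


def primitive_coverage(mask, primitive_ids, n_primitives):
    """Fraction of each primitive's points that fall inside the mask."""
    totals = np.bincount(primitive_ids, minlength=n_primitives)
    inside = np.bincount(primitive_ids[mask], minlength=n_primitives)
    # Avoid division by zero for primitive ids that have no points.
    return np.divide(inside, totals,
                     out=np.zeros(n_primitives, dtype=float),
                     where=totals > 0)


def is_reliable(mask, primitive_ids, n_primitives,
                min_cover=0.5, min_purity=0.8):
    """Hypothetical reliability test for a lifted mask.

    A mask counts as reliable when most of its points lie in primitives
    that the mask covers well. Masks that clip small pieces of many
    primitives are treated as ambiguous. This is an assumed criterion,
    not the one used in SGS-3D.
    """
    inside = np.bincount(primitive_ids[mask], minlength=n_primitives)
    if inside.sum() == 0:
        return False  # an empty mask carries no usable guidance
    cover = primitive_coverage(mask, primitive_ids, n_primitives)
    well_covered = cover >= min_cover
    purity = inside[well_covered].sum() / inside.sum()
    return bool(purity >= min_purity)


# Toy scene: 12 points grouped into 3 primitives of 4 points each.
primitive_ids = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])

# mask_a covers primitives 0 and 1 completely.
mask_a = np.zeros(12, dtype=bool)
mask_a[:8] = True

# mask_b clips a single point from each of the three primitives.
mask_b = np.zeros(12, dtype=bool)
mask_b[[0, 4, 8]] = True

reliable_a = is_reliable(mask_a, primitive_ids, 3)
reliable_b = is_reliable(mask_b, primitive_ids, 3)
```

In this toy setup mask_a is kept because it aligns with whole primitives, while mask_b is discarded because it only grazes each primitive. A real pipeline would derive primitives from an over-segmentation of the point cloud and would tune the thresholds per dataset.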

Related papers

IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction [82.53307702809606]
Humans naturally perceive the geometric structure and semantic content of a 3D world as intertwined dimensions. We propose Instance-Grounded Geometry Transformer (IGGT) to unify the knowledge for both spatial reconstruction and instance-level contextual understanding.
arXiv Detail & Related papers (2025-10-26T14:57:44Z)
CORE-3D: Context-aware Open-vocabulary Retrieval by Embeddings in 3D [0.0]
3D scene understanding is fundamental for embodied AI and robotics, supporting reliable perception for interaction and navigation. Recent approaches achieve zero-shot, open-vocabulary 3D semantic mapping by assigning embedding vectors to 2D class-agnostic masks generated via vision-language models (VLMs). We leverage SemanticSAM with progressive granularity refinement to generate more accurate and numerous object-level masks.
arXiv Detail & Related papers (2025-09-29T09:43:00Z)
SeqAffordSplat: Scene-level Sequential Affordance Reasoning on 3D Gaussian Splatting [85.87902260102652]
We introduce the novel task of Sequential 3D Gaussian Affordance Reasoning. We then propose SeqSplatNet, an end-to-end framework that directly maps an instruction to a sequence of 3D affordance masks. Our method sets a new state-of-the-art on our challenging benchmark, effectively advancing affordance reasoning from single-step interactions to complex, sequential tasks at the scene level.
arXiv Detail & Related papers (2025-07-31T17:56:55Z)
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation [87.30919771444117]
Reasoning segmentation aims to segment target objects in complex scenes based on human intent and spatial reasoning. Recent multimodal large language models (MLLMs) have demonstrated impressive 2D image reasoning segmentation. We introduce MLLM-For3D, a framework that transfers knowledge from 2D MLLMs to 3D scene understanding.
arXiv Detail & Related papers (2025-03-23T16:40:20Z)
Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding [59.51535163599723]
FreeGS is an unsupervised semantic-embedded 3DGS framework that achieves view-consistent 3D scene understanding without the need for 2D labels. FreeGS performs comparably to state-of-the-art methods while avoiding the complex data preprocessing workload.
arXiv Detail & Related papers (2024-11-29T08:52:32Z)
XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation [72.12250272218792]
We propose a more meticulous mask-level alignment between 3D features and the 2D-text embedding space through a cross-modal mask reasoning framework, XMask3D. We integrate 3D global features as implicit conditions into the pre-trained 2D denoising UNet, enabling the generation of segmentation masks. The generated 2D masks are employed to align mask-level 3D representations with the vision-language feature space, thereby augmenting the open vocabulary capability of 3D geometry embeddings.
arXiv Detail & Related papers (2024-11-20T12:02:12Z)
SAI3D: Segment Any Instance in 3D Scenes [68.57002591841034]
We introduce SAI3D, a novel zero-shot 3D instance segmentation approach. Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations. Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach.
arXiv Detail & Related papers (2023-12-17T09:05:47Z)
