GeoSAM2: Unleashing the Power of SAM2 for 3D Part Segmentation
- URL: http://arxiv.org/abs/2508.14036v2
- Date: Wed, 27 Aug 2025 17:10:30 GMT
- Title: GeoSAM2: Unleashing the Power of SAM2 for 3D Part Segmentation
- Authors: Ken Deng, Yunhan Yang, Jingxiang Sun, Xihui Liu, Yebin Liu, Ding Liang, Yan-Pei Cao
- Abstract summary: We introduce GeoSAM2, a prompt-controllable framework for 3D part segmentation. Given a textureless object, we render normal and point maps from predefined viewpoints. We accept simple 2D prompts - clicks or boxes - to guide part selection. The predicted masks are back-projected to the object and aggregated across views.
- Score: 81.0871900167463
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce GeoSAM2, a prompt-controllable framework for 3D part segmentation that casts the task as multi-view 2D mask prediction. Given a textureless object, we render normal and point maps from predefined viewpoints and accept simple 2D prompts - clicks or boxes - to guide part selection. These prompts are processed by a shared SAM2 backbone augmented with LoRA and residual geometry fusion, enabling view-specific reasoning while preserving pretrained priors. The predicted masks are back-projected to the object and aggregated across views. Our method enables fine-grained, part-specific control without requiring text prompts, per-shape optimization, or full 3D labels. In contrast to global clustering or scale-based methods, prompts are explicit, spatially grounded, and interpretable. We achieve state-of-the-art class-agnostic performance on PartObjaverse-Tiny and PartNetE, outperforming both slow optimization-based pipelines and fast but coarse feedforward approaches. Our results highlight a new paradigm for 3D segmentation: aligning it with SAM2 and leveraging interactive 2D inputs to unlock controllability and precision in object-level part understanding.
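The aggregation step in the abstract (back-projecting per-view masks onto the object and fusing them across views) can be sketched roughly as below. The pixel-to-point index-map input format, the function name, and the majority-vote fusion rule are assumptions for illustration, not GeoSAM2's actual implementation:

```python
import numpy as np

def aggregate_multiview_masks(point_idx_maps, masks, num_points, min_ratio=0.5):
    """Fuse per-view 2D part masks onto a shared 3D point set by voting.

    point_idx_maps: list of (H, W) int arrays; each pixel holds the index of
        the 3D point it was rendered from, or -1 for background.
    masks: list of (H, W) bool arrays, the predicted 2D part masks per view.
    A point is assigned to the part if it is masked in at least `min_ratio`
    of the views in which it is visible.
    """
    votes = np.zeros(num_points)
    seen = np.zeros(num_points)
    for idx_map, mask in zip(point_idx_maps, masks):
        visible = idx_map >= 0
        pts = idx_map[visible]
        np.add.at(seen, pts, 1.0)                       # visibility count
        np.add.at(votes, pts, mask[visible].astype(float))  # mask votes
    # points never seen keep ratio 0 and fall below the threshold
    ratio = np.divide(votes, seen, out=np.zeros(num_points), where=seen > 0)
    return ratio >= min_ratio
```

In practice the index maps would come from the same renderer that produces the normal and point maps, so the back-projection is exact rather than a nearest-neighbor lookup.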
Related papers
- Evaluating SAM2 for Video Semantic Segmentation [60.157605818225186]
The Segment Anything Model 2 (SAM2) has proven to be a powerful foundation model for promptable visual object segmentation in both images and videos. This paper explores the extension of SAM2 to dense Video Semantic Segmentation (VSS). Our experiments suggest that leveraging SAM2 enhances overall performance in VSS, primarily due to its precise predictions of object boundaries.
arXiv Detail & Related papers (2025-12-01T15:15:16Z)
- PartSAM: A Scalable Promptable Part Segmentation Model Trained on Native 3D Data [47.60227259482637]
We present PartSAM, the first promptable part segmentation model trained on large-scale 3D data. PartSAM employs an encoder-decoder architecture in which a triplane-based dual-branch encoder produces spatially structured tokens. To enable large-scale supervision, we introduce a model-in-the-loop annotation pipeline that curates over five million 3D shape-part pairs.
arXiv Detail & Related papers (2025-09-26T06:52:35Z)
- 3D Part Segmentation via Geometric Aggregation of 2D Visual Features [57.20161517451834]
Supervised 3D part segmentation models are tailored for a fixed set of objects and parts, limiting their transferability to open-set, real-world scenarios. Recent works have explored vision-language models (VLMs) as a promising alternative, using multi-view rendering and textual prompting to identify object parts. To address these limitations, we propose COPS, a COmprehensive model for Parts that blends semantics extracted from visual concepts and 3D geometry to effectively identify object parts.
arXiv Detail & Related papers (2024-12-05T15:27:58Z)
- Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding [59.51535163599723]
FreeGS is an unsupervised semantic-embedded 3DGS framework that achieves view-consistent 3D scene understanding without the need for 2D labels. FreeGS performs comparably to state-of-the-art methods while avoiding the complex data preprocessing workload.
arXiv Detail & Related papers (2024-11-29T08:52:32Z)
- DetailGen3D: Generative 3D Geometry Enhancement via Data-Dependent Flow [44.72037991063735]
DetailGen3D is a generative approach specifically designed to enhance generated 3D shapes. Our key insight is to model the coarse-to-fine transformation directly through data-dependent flows in latent space. We introduce a token matching strategy that ensures accurate spatial correspondence during refinement.
arXiv Detail & Related papers (2024-11-25T17:08:17Z)
- Any3DIS: Class-Agnostic 3D Instance Segmentation by 2D Mask Tracking [6.599971425078935]
Existing 3D instance segmentation methods frequently encounter issues with over-segmentation, leading to redundant and inaccurate 3D proposals that complicate downstream tasks.
This challenge arises from their unsupervised merging approach, where dense 2D masks are lifted across frames into point clouds to form 3D candidate proposals without direct supervision.
We propose a 3D-Aware 2D Mask Tracking module that uses robust 3D priors from a 2D mask segmentation and tracking foundation model (SAM-2) to ensure consistent object masks across video frames.
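The idea of lifting per-frame 2D masks into a shared point cloud and associating them into consistent 3D proposals can be illustrated with a minimal greedy-tracking sketch. The input format (each mask already lifted to a set of point indices), the IoU threshold, and all names are hypothetical simplifications, not the Any3DIS module itself:

```python
def iou(a, b):
    """3D IoU between two sets of point indices."""
    a, b = set(a), set(b)
    return len(a & b) / max(len(a | b), 1)

def track_masks_in_3d(frames, iou_thresh=0.25):
    """Greedily associate lifted per-frame masks into 3D tracks.

    frames: list of frames; each frame is a list of point-index sets, one per
        2D mask back-projected into the shared point cloud.
    A mask joins the existing track with which its point set overlaps most
    (3D IoU); below `iou_thresh` it starts a new track.
    """
    tracks = []  # each track accumulates a set of 3D point indices
    for frame in frames:
        for lifted in frame:
            lifted = set(lifted)
            best, best_iou = None, 0.0
            for track in tracks:
                score = iou(lifted, track)
                if score > best_iou:
                    best, best_iou = track, score
            if best is not None and best_iou >= iou_thresh:
                best |= lifted  # grow the matched track
            else:
                tracks.append(lifted)
    return tracks
```

Anchoring the association in 3D point overlap, rather than 2D appearance, is what keeps the resulting proposals consistent across viewpoint changes.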
arXiv Detail & Related papers (2024-11-25T08:26:31Z)
- SA3DIP: Segment Any 3D Instance with Potential 3D Priors [41.907914881608995]
We propose SA3DIP, a novel method for Segmenting Any 3D Instances via exploiting potential 3D Priors.
Specifically, on one hand, we generate complementary 3D primitives based on both geometric and textural priors.
On the other hand, we introduce supplemental constraints from the 3D space by using a 3D detector to guide a further merging process.
arXiv Detail & Related papers (2024-11-06T10:39:00Z)
- LESS: Label-Efficient and Single-Stage Referring 3D Segmentation [55.06002976797879]
Referring 3D segmentation is a visual-language task that segments all points of the object specified by a natural-language query from a 3D point cloud.
We propose a novel referring 3D pipeline, Label-Efficient and Single-Stage, dubbed LESS, which is supervised only by efficient binary masks.
We achieve state-of-the-art performance on the ScanRefer dataset, surpassing previous methods by about 3.7% mIoU while using only binary labels.
arXiv Detail & Related papers (2024-10-17T07:47:41Z)
- NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation [52.772319840580074]
3D shape generation aims to produce innovative 3D content adhering to specific conditions and constraints.
Existing methods often decompose 3D shapes into a sequence of localized components, treating each element in isolation.
We introduce a novel spatial-aware 3D shape generation framework that leverages 2D plane representations for enhanced 3D shape modeling.
arXiv Detail & Related papers (2024-03-27T04:09:34Z)
- SAI3D: Segment Any Instance in 3D Scenes [68.57002591841034]
We introduce SAI3D, a novel zero-shot 3D instance segmentation approach.
Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations.
Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach.
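The "merge geometric primitives into instances" step common to entries like SAI3D can be sketched with a union-find pass over pairwise affinities. This is a deliberately simplified stand-in: SAI3D merges progressively and recomputes region-level affinities as regions grow, whereas the sketch below applies one fixed threshold to static primitive-pair scores, and all names are assumptions:

```python
class UnionFind:
    """Disjoint-set structure for grouping primitives."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def merge_primitives(num_primitives, affinities, thresh=0.5):
    """Merge primitives whose pairwise affinity passes `thresh`.

    affinities: dict mapping (i, j) primitive-index pairs to scores in [0, 1].
    Returns a list of consecutive instance ids, one per primitive.
    """
    uf = UnionFind(num_primitives)
    for (i, j), score in affinities.items():
        if score >= thresh:
            uf.union(i, j)
    labels = [uf.find(i) for i in range(num_primitives)]
    remap = {}  # relabel roots to consecutive instance ids
    return [remap.setdefault(label, len(remap)) for label in labels]
```

In the zero-shot setting, the affinity scores would typically come from agreement between 2D masks (e.g. SAM outputs) projected onto neighboring primitives, so no 3D instance labels are needed.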
arXiv Detail & Related papers (2023-12-17T09:05:47Z)
- 3D Guided Weakly Supervised Semantic Segmentation [27.269847900950943]
We propose a weakly supervised 2D semantic segmentation model by incorporating sparse bounding box labels with available 3D information.
We manually labeled a subset of the 2D-3D Semantics (2D-3D-S) dataset with bounding boxes, and introduce our 2D-3D inference module to generate accurate pixel-wise segment proposal masks.
arXiv Detail & Related papers (2020-12-01T03:34:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.