PartSAM: A Scalable Promptable Part Segmentation Model Trained on Native 3D Data
- URL: http://arxiv.org/abs/2509.21965v2
- Date: Mon, 29 Sep 2025 05:34:25 GMT
- Title: PartSAM: A Scalable Promptable Part Segmentation Model Trained on Native 3D Data
- Authors: Zhe Zhu, Le Wan, Rui Xu, Yiheng Zhang, Honghua Chen, Zhiyang Dou, Cheng Lin, Yuan Liu, Mingqiang Wei,
- Abstract summary: We present PartSAM, the first promptable part segmentation model trained on large-scale 3D data. PartSAM employs an encoder-decoder architecture in which a triplane-based dual-branch encoder produces spatially structured tokens. To enable large-scale supervision, we introduce a model-in-the-loop annotation pipeline that curates over five million 3D shape-part pairs.
- Score: 47.60227259482637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Segmenting 3D objects into parts is a long-standing challenge in computer vision. To overcome taxonomy constraints and generalize to unseen 3D objects, recent works turn to open-world part segmentation. These approaches typically transfer supervision from 2D foundation models, such as SAM, by lifting multi-view masks into 3D. However, this indirect paradigm fails to capture intrinsic geometry, leading to surface-only understanding, uncontrolled decomposition, and limited generalization. We present PartSAM, the first promptable part segmentation model trained natively on large-scale 3D data. Following the design philosophy of SAM, PartSAM employs an encoder-decoder architecture in which a triplane-based dual-branch encoder produces spatially structured tokens for scalable part-aware representation learning. To enable large-scale supervision, we further introduce a model-in-the-loop annotation pipeline that curates over five million 3D shape-part pairs from online assets, providing diverse and fine-grained labels. This combination of scalable architecture and diverse 3D data yields emergent open-world capabilities: with a single prompt, PartSAM achieves highly accurate part identification, and in a Segment-Every-Part mode, it automatically decomposes shapes into both surface and internal structures. Extensive experiments show that PartSAM outperforms state-of-the-art methods by large margins across multiple benchmarks, marking a decisive step toward foundation models for 3D part understanding.
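The single-prompt interaction the abstract describes (one click on a shape selects one part) can be illustrated with a deliberately simple baseline: Euclidean region growing from the prompted point. This is a toy sketch of the prompt-to-mask interface only, not PartSAM's learned triplane encoder or mask decoder; the function name and radius value are hypothetical.

```python
import numpy as np

def segment_from_prompt(points, prompt_idx, radius=0.15):
    """Toy point-prompted segmentation: grow a region outward from the
    prompted point by Euclidean proximity, returning a binary part mask.
    Illustrates the prompt -> mask interface only; PartSAM replaces this
    geometric criterion with learned part-aware features."""
    n = len(points)
    mask = np.zeros(n, dtype=bool)
    mask[prompt_idx] = True
    frontier = [prompt_idx]
    while frontier:
        i = frontier.pop()
        # Add every unvisited point within `radius` of the current point.
        near = np.linalg.norm(points - points[i], axis=1) < radius
        new = near & ~mask
        mask |= new
        frontier.extend(np.flatnonzero(new).tolist())
    return mask

# Two well-separated clusters standing in for two "parts" of an object.
rng = np.random.default_rng(0)
part_a = rng.normal(loc=[0.0, 0.0, 0.0], scale=0.03, size=(50, 3))
part_b = rng.normal(loc=[1.0, 1.0, 1.0], scale=0.03, size=(50, 3))
cloud = np.vstack([part_a, part_b])

mask = segment_from_prompt(cloud, prompt_idx=0)
print(mask[:50].all(), mask[50:].any())  # True False
```

A learned model replaces the fixed radius with part-aware representations, which is what lets a method like PartSAM separate touching or interlocking parts that a purely geometric criterion cannot.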
Related papers
- MV-SAM: Multi-view Promptable Segmentation using Pointmap Guidance [79.57732829495843]
We introduce MV-SAM, a framework for multi-view segmentation that achieves 3D consistency using pointmaps. MV-SAM lifts images and prompts into 3D space, eliminating the need for explicit 3D networks or annotated 3D data.
arXiv Detail & Related papers (2026-01-25T15:00:37Z)
- Hierarchical Image-Guided 3D Point Cloud Segmentation in Industrial Scenes via Multi-View Bayesian Fusion [4.679314646805623]
3D segmentation is critical for understanding complex scenes with dense layouts and multi-scale objects. Existing 3D point-based methods require costly annotations, while image-guided methods often suffer from semantic inconsistencies across views. We propose a hierarchical image-guided 3D segmentation framework that progressively refines segmentation from instance-level to part-level.
arXiv Detail & Related papers (2025-12-07T15:15:52Z)
- P3-SAM: Native 3D Part Segmentation [29.513191657051575]
We propose a native 3D point-promptable part segmentation model termed P$^3$-SAM. Inspired by SAM, P$^3$-SAM consists of a feature extractor, multiple segmentation heads, and an IoU predictor. Our model is trained on a newly built dataset containing nearly 3.7 million models with reasonable segmentation labels.
arXiv Detail & Related papers (2025-09-08T15:12:17Z)
- GeoSAM2: Unleashing the Power of SAM2 for 3D Part Segmentation [81.0871900167463]
We introduce GeoSAM2, a prompt-controllable framework for 3D part segmentation. Given a textureless object, we render normal and point maps from predefined viewpoints. We accept simple 2D prompts (clicks or boxes) to guide part selection. The predicted masks are back-projected to the object and aggregated across views.
arXiv Detail & Related papers (2025-08-19T17:58:51Z)
- PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers [29.52313100024294]
We introduce PartCrafter, the first structured 3D generative model that jointly synthesizes multiple semantically meaningful and geometrically distinct 3D meshes from a single RGB image. PartCrafter simultaneously denoises multiple 3D parts, enabling end-to-end part-aware generation of both individual objects and complex multi-object scenes. Experiments show that PartCrafter outperforms existing approaches in generating decomposable 3D meshes.
arXiv Detail & Related papers (2025-06-05T20:30:28Z)
- 3D Part Segmentation via Geometric Aggregation of 2D Visual Features [57.20161517451834]
Supervised 3D part segmentation models are tailored for a fixed set of objects and parts, limiting their transferability to open-set, real-world scenarios. Recent works have explored vision-language models (VLMs) as a promising alternative, using multi-view rendering and textual prompting to identify object parts. To address these limitations, we propose COPS, a COmprehensive model for Parts that blends semantics extracted from visual concepts with 3D geometry to effectively identify object parts.
arXiv Detail & Related papers (2024-12-05T15:27:58Z)
- SAMPart3D: Segment Any Part in 3D Objects [23.97392239910013]
3D part segmentation is a crucial and challenging task in 3D perception, playing a vital role in applications such as robotics, 3D generation, and 3D editing.
Recent methods harness the powerful Vision Language Models (VLMs) for 2D-to-3D knowledge distillation, achieving zero-shot 3D part segmentation.
In this work, we introduce SAMPart3D, a scalable zero-shot 3D part segmentation framework that segments any 3D object into semantic parts at multiple granularities.
arXiv Detail & Related papers (2024-11-11T17:59:10Z)
- Point-SAM: Promptable 3D Segmentation Model for Point Clouds [25.98791840584803]
We propose Point-SAM, a 3D promptable segmentation model focusing on point clouds. We employ an efficient transformer-based architecture tailored for point clouds, extending SAM to the 3D domain. We then distill the rich knowledge of 2D SAM for Point-SAM training by introducing a data engine that generates part-level and object-level pseudo-labels at scale from 2D SAM.
arXiv Detail & Related papers (2024-06-25T17:28:03Z)
- Part123: Part-aware 3D Reconstruction from a Single-view Image [54.589723979757515]
Part123 is a novel framework for part-aware 3D reconstruction from a single-view image.
We introduce contrastive learning into a neural rendering framework to learn a part-aware feature space.
A clustering-based algorithm is also developed to automatically derive 3D part segmentation results from the reconstructed models.
arXiv Detail & Related papers (2024-05-27T07:10:21Z)
- SAI3D: Segment Any Instance in 3D Scenes [68.57002591841034]
We introduce SAI3D, a novel zero-shot 3D instance segmentation approach.
Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations.
Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach.
arXiv Detail & Related papers (2023-12-17T09:05:47Z)
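The merge step described for SAI3D above (geometric primitives progressively combined into instance segmentations) is, at its core, connected-component grouping under an affinity threshold. A minimal union-find sketch of that idea, with made-up affinity scores standing in for the paper's multi-view mask agreement:

```python
class UnionFind:
    """Disjoint-set structure for grouping primitives into instances."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def merge_primitives(num_primitives, affinities, threshold=0.5):
    """affinities: dict mapping primitive pairs (i, j) to a score in [0, 1].
    Pairs above `threshold` are merged, strongest first ("progressively").
    Returns one instance label per primitive."""
    uf = UnionFind(num_primitives)
    for (i, j), score in sorted(affinities.items(), key=lambda kv: -kv[1]):
        if score >= threshold:
            uf.union(i, j)
    # Relabel each primitive by its root to get instance ids.
    return [uf.find(i) for i in range(num_primitives)]

# Four primitives: pairs 0-1 and 2-3 agree strongly, 1-2 only weakly,
# so the weak link is never merged and two instances survive.
labels = merge_primitives(4, {(0, 1): 0.9, (1, 2): 0.2, (2, 3): 0.8})
print(labels)  # two groups: primitives {0, 1} and {2, 3}
```

In the actual method the affinity comes from how consistently two primitives fall inside the same 2D masks across views; the thresholded merge shown here is only the skeleton of that aggregation.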
This list is automatically generated from the titles and abstracts of the papers in this site.