PartSAM: A Scalable Promptable Part Segmentation Model Trained on Native 3D Data
- URL: http://arxiv.org/abs/2509.21965v2
- Date: Mon, 29 Sep 2025 05:34:25 GMT
- Title: PartSAM: A Scalable Promptable Part Segmentation Model Trained on Native 3D Data
- Authors: Zhe Zhu, Le Wan, Rui Xu, Yiheng Zhang, Honghua Chen, Zhiyang Dou, Cheng Lin, Yuan Liu, Mingqiang Wei,
- Abstract summary: We present PartSAM, the first promptable part segmentation model trained on large-scale 3D data. PartSAM employs an encoder-decoder architecture in which a triplane-based dual-branch encoder produces spatially structured tokens. To enable large-scale supervision, we introduce a model-in-the-loop annotation pipeline that curates over five million 3D shape-part pairs.
- Score: 47.60227259482637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Segmenting 3D objects into parts is a long-standing challenge in computer vision. To overcome taxonomy constraints and generalize to unseen 3D objects, recent works turn to open-world part segmentation. These approaches typically transfer supervision from 2D foundation models, such as SAM, by lifting multi-view masks into 3D. However, this indirect paradigm fails to capture intrinsic geometry, leading to surface-only understanding, uncontrolled decomposition, and limited generalization. We present PartSAM, the first promptable part segmentation model trained natively on large-scale 3D data. Following the design philosophy of SAM, PartSAM employs an encoder-decoder architecture in which a triplane-based dual-branch encoder produces spatially structured tokens for scalable part-aware representation learning. To enable large-scale supervision, we further introduce a model-in-the-loop annotation pipeline that curates over five million 3D shape-part pairs from online assets, providing diverse and fine-grained labels. This combination of scalable architecture and diverse 3D data yields emergent open-world capabilities: with a single prompt, PartSAM achieves highly accurate part identification, and in a Segment-Every-Part mode, it automatically decomposes shapes into both surface and internal structures. Extensive experiments show that PartSAM outperforms state-of-the-art methods by large margins across multiple benchmarks, marking a decisive step toward foundation models for 3D part understanding.
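The single-prompt interaction the abstract describes (one click on a shape selects one part) can be illustrated with a deliberately simple baseline: Euclidean region growing from the prompted point. This is a toy sketch of the prompt-to-mask interface only, not PartSAM's learned triplane encoder or mask decoder; the function name and radius value are hypothetical.

```python
import numpy as np

def segment_from_prompt(points, prompt_idx, radius=0.15):
    """Toy point-prompted segmentation: grow a region outward from the
    prompted point by Euclidean proximity, returning a binary part mask.
    Illustrates the prompt -> mask interface only; PartSAM replaces this
    geometric criterion with learned part-aware features."""
    n = len(points)
    mask = np.zeros(n, dtype=bool)
    mask[prompt_idx] = True
    frontier = [prompt_idx]
    while frontier:
        i = frontier.pop()
        # Add every unvisited point within `radius` of the current point.
        near = np.linalg.norm(points - points[i], axis=1) < radius
        new = near & ~mask
        mask |= new
        frontier.extend(np.flatnonzero(new).tolist())
    return mask

# Two well-separated clusters standing in for two "parts" of an object.
rng = np.random.default_rng(0)
part_a = rng.normal(loc=[0.0, 0.0, 0.0], scale=0.03, size=(50, 3))
part_b = rng.normal(loc=[1.0, 1.0, 1.0], scale=0.03, size=(50, 3))
cloud = np.vstack([part_a, part_b])

mask = segment_from_prompt(cloud, prompt_idx=0)
print(mask[:50].all(), mask[50:].any())  # True False
```

A learned model replaces the fixed radius with part-aware representations, which is what lets a method like PartSAM separate touching or interlocking parts that a purely geometric criterion cannot.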
Related papers
- MV-SAM: Multi-view Promptable Segmentation using Pointmap Guidance [79.57732829495843]
We introduce MV-SAM, a framework for multi-view segmentation that achieves 3D consistency using pointmaps. MV-SAM lifts images and prompts into 3D space, eliminating the need for explicit 3D networks or annotated 3D data.
arXiv Detail & Related papers (2026-01-25T15:00:37Z)
- Hierarchical Image-Guided 3D Point Cloud Segmentation in Industrial Scenes via Multi-View Bayesian Fusion [4.679314646805623]
3D segmentation is critical for understanding complex scenes with dense layouts and multi-scale objects. Existing 3D point-based methods require costly annotations, while image-guided methods often suffer from semantic inconsistencies across views. We propose a hierarchical image-guided 3D segmentation framework that progressively refines segmentation from instance-level to part-level.
arXiv Detail & Related papers (2025-12-07T15:15:52Z)
- P3-SAM: Native 3D Part Segmentation [29.513191657051575]
We propose a native 3D point-promptable part segmentation model termed P$^3$-SAM. Inspired by SAM, P$^3$-SAM consists of a feature extractor, multiple segmentation heads, and an IoU predictor. Our model is trained on a newly built dataset containing nearly 3.7 million models with reasonable segmentation labels.
arXiv Detail & Related papers (2025-09-08T15:12:17Z)
- GeoSAM2: Unleashing the Power of SAM2 for 3D Part Segmentation [81.0871900167463]
We introduce GeoSAM2, a prompt-controllable framework for 3D part segmentation. Given a textureless object, we render normal and point maps from predefined viewpoints. We accept simple 2D prompts (clicks or boxes) to guide part selection. The predicted masks are back-projected to the object and aggregated across views.
arXiv Detail & Related papers (2025-08-19T17:58:51Z)
- PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers [29.52313100024294]
We introduce PartCrafter, the first structured 3D generative model that jointly synthesizes multiple semantically meaningful and geometrically distinct 3D meshes from a single RGB image. PartCrafter simultaneously denoises multiple 3D parts, enabling end-to-end part-aware generation of both individual objects and complex multi-object scenes. Experiments show that PartCrafter outperforms existing approaches in generating decomposable 3D meshes.
arXiv Detail & Related papers (2025-06-05T20:30:28Z)
- 3D Part Segmentation via Geometric Aggregation of 2D Visual Features [57.20161517451834]
Supervised 3D part segmentation models are tailored for a fixed set of objects and parts, limiting their transferability to open-set, real-world scenarios. Recent works have explored vision-language models (VLMs) as a promising alternative, using multi-view rendering and textual prompting to identify object parts. To address these limitations, we propose COPS, a COmprehensive model for Parts that blends semantics extracted from visual concepts with 3D geometry to effectively identify object parts.
arXiv Detail & Related papers (2024-12-05T15:27:58Z)
- SAMPart3D: Segment Any Part in 3D Objects [23.97392239910013]
3D part segmentation is a crucial and challenging task in 3D perception, playing a vital role in applications such as robotics, 3D generation, and 3D editing.
Recent methods harness the powerful Vision Language Models (VLMs) for 2D-to-3D knowledge distillation, achieving zero-shot 3D part segmentation.
In this work, we introduce SAMPart3D, a scalable zero-shot 3D part segmentation framework that segments any 3D object into semantic parts at multiple granularities.
arXiv Detail & Related papers (2024-11-11T17:59:10Z)
- Point-SAM: Promptable 3D Segmentation Model for Point Clouds [25.98791840584803]
We propose Point-SAM, a 3D promptable segmentation model focusing on point clouds. We employ an efficient transformer-based architecture tailored for point clouds, extending SAM to the 3D domain. We then distill the rich knowledge of 2D SAM for Point-SAM training by introducing a data engine that generates part-level and object-level pseudo-labels at scale from 2D SAM.
arXiv Detail & Related papers (2024-06-25T17:28:03Z)
- Part123: Part-aware 3D Reconstruction from a Single-view Image [54.589723979757515]
Part123 is a novel framework for part-aware 3D reconstruction from a single-view image.
We introduce contrastive learning into a neural rendering framework to learn a part-aware feature space.
A clustering-based algorithm is also developed to automatically derive 3D part segmentation results from the reconstructed models.
arXiv Detail & Related papers (2024-05-27T07:10:21Z)
- SAI3D: Segment Any Instance in 3D Scenes [68.57002591841034]
We introduce SAI3D, a novel zero-shot 3D instance segmentation approach.
Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations.
Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach.
arXiv Detail & Related papers (2023-12-17T09:05:47Z)
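The merge step described for SAI3D above (geometric primitives progressively combined into instance segmentations) is, at its core, connected-component grouping under an affinity threshold. A minimal union-find sketch of that idea, with made-up affinity scores standing in for the paper's multi-view mask agreement:

```python
class UnionFind:
    """Disjoint-set structure for grouping primitives into instances."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def merge_primitives(num_primitives, affinities, threshold=0.5):
    """affinities: dict mapping primitive pairs (i, j) to a score in [0, 1].
    Pairs above `threshold` are merged, strongest first ("progressively").
    Returns one instance label per primitive."""
    uf = UnionFind(num_primitives)
    for (i, j), score in sorted(affinities.items(), key=lambda kv: -kv[1]):
        if score >= threshold:
            uf.union(i, j)
    # Relabel each primitive by its root to get instance ids.
    return [uf.find(i) for i in range(num_primitives)]

# Four primitives: pairs 0-1 and 2-3 agree strongly, 1-2 only weakly,
# so the weak link is never merged and two instances survive.
labels = merge_primitives(4, {(0, 1): 0.9, (1, 2): 0.2, (2, 3): 0.8})
print(labels)  # two groups: primitives {0, 1} and {2, 3}
```

In the actual method the affinity comes from how consistently two primitives fall inside the same 2D masks across views; the thresholded merge shown here is only the skeleton of that aggregation.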
This list is automatically generated from the titles and abstracts of the papers in this site.