PinPoint3D: Fine-Grained 3D Part Segmentation from a Few Clicks
- URL: http://arxiv.org/abs/2509.25970v1
- Date: Tue, 30 Sep 2025 09:05:29 GMT
- Title: PinPoint3D: Fine-Grained 3D Part Segmentation from a Few Clicks
- Authors: Bojun Zhang, Hangjian Ye, Hao Zheng, Jianzheng Huang, Zhengyu Lin, Zhenhong Guo, Feng Zheng
- Abstract summary: PinPoint3D is a novel interactive framework for fine-grained, multi-granularity 3D segmentation. It generates precise part-level masks from only a few user point clicks. Our work represents a significant step towards more nuanced and precise machine perception and interaction.
- Score: 37.718136287542556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-grained 3D part segmentation is crucial for enabling embodied AI systems to perform complex manipulation tasks, such as interacting with specific functional components of an object. However, existing interactive segmentation methods are largely confined to coarse, instance-level targets, while non-interactive approaches struggle with sparse, real-world scans and suffer from a severe lack of annotated data. To address these limitations, we introduce PinPoint3D, a novel interactive framework for fine-grained, multi-granularity 3D segmentation, capable of generating precise part-level masks from only a few user point clicks. A key component of our work is a new 3D data synthesis pipeline that we developed to create a large-scale, scene-level dataset with dense part annotations, overcoming a critical bottleneck that has hindered progress in this field. Through comprehensive experiments and user studies, we demonstrate that our method significantly outperforms existing approaches, achieving an average IoU of around 55.8% on each object part under first-click settings and surpassing 71.3% IoU with only a few additional clicks. Compared to current state-of-the-art baselines, PinPoint3D yields up to a 16% improvement in IoU and precision, highlighting its effectiveness on challenging, sparse point clouds with high efficiency. Our work represents a significant step towards more nuanced and precise machine perception and interaction in complex 3D environments.
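The abstract reports part-level IoU figures (around 55.8% on the first click, over 71.3% with a few more). As an illustrative sketch only (not the paper's code), the IoU metric for a predicted point-level part mask against a ground-truth mask can be computed like this; the function name and the toy point cloud are assumptions for the example:

```python
import numpy as np

def part_iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """IoU between two boolean masks over the N points of a scan.

    An empty union (neither mask selects any point) is scored as a
    perfect match (1.0) by convention.
    """
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(intersection) / float(union) if union > 0 else 1.0

# Toy 6-point cloud: predicted part vs. annotated part.
pred = np.array([True, True, False, True, False, False])
gt   = np.array([True, True, True,  False, False, False])
print(part_iou(pred, gt))  # intersection = 2, union = 4 -> 0.5
```

Per-part scores like this would then be averaged across parts and scenes to produce the aggregate numbers quoted above.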
Related papers
- Language-guided 3D scene synthesis for fine-grained functionality understanding [64.148891566272]
We introduce SynthFun3D, the first method for task-based 3D scene synthesis. It generates a 3D indoor environment using a furniture asset database with part-level annotations. It reasons about the action to automatically identify and retrieve the 3D mask of the correct functional element.
arXiv Detail & Related papers (2025-11-28T14:40:03Z) - REACT3D: Recovering Articulations for Interactive Physical 3D Scenes [96.27769519526426]
REACT3D is a framework that converts static 3D scenes into simulation-ready interactive replicas with consistent geometry. We achieve state-of-the-art performance on detection/segmentation and articulation metrics across diverse indoor scenes.
arXiv Detail & Related papers (2025-10-13T12:37:59Z) - Easy3D: A Simple Yet Effective Method for 3D Interactive Segmentation [10.2138250640885]
We introduce a 3D interactive segmentation method that consistently surpasses previous state-of-the-art techniques on both in-domain and out-of-domain datasets. Our simple approach integrates a voxel-based sparse encoder with a lightweight transformer-based decoder that implements implicit click fusion. Our method demonstrates substantial improvements on benchmark datasets, including ScanNet, ScanNet++, S3DIS, and KITTI-360.
arXiv Detail & Related papers (2025-04-15T09:49:51Z) - IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments [56.85804719947]
We present IAAO, a framework that builds an explicit 3D model for intelligent agents to gain understanding of articulated objects in their environment through interaction. We first build hierarchical features and label fields for each object state using 3D Gaussian Splatting (3DGS) by distilling mask features and view-consistent labels from multi-view images. We then perform object- and part-level queries on the 3D Gaussian primitives to identify static and articulated elements, estimating global transformations and local articulation parameters along with affordances.
arXiv Detail & Related papers (2025-04-09T12:36:48Z) - FunGraph: Functionality Aware 3D Scene Graphs for Language-Prompted Scene Interaction [1.8124328823188356]
We aim to develop a representation that enables robots to directly interact with their environment. We detect and store objects at a finer resolution, focusing on affordance-relevant parts. We leverage currently available 3D resources to generate 2D data and train a detector, which is then used to augment the standard 3D scene graph generation pipeline.
arXiv Detail & Related papers (2025-03-10T23:13:35Z) - 3D-CDRGP: Towards Cross-Device Robotic Grasping Policy in 3D Open World [20.406334587479623]
Cross-device research has become an urgent issue that needs to be tackled. We pioneer the study of cross-device (cameras & robotics) grasping policies in the 3D open world. We introduce the SSGC-Seg module, which enables category-agnostic 3D object detection.
arXiv Detail & Related papers (2024-11-27T08:23:28Z) - 3D-Aware Instance Segmentation and Tracking in Egocentric Videos [107.10661490652822]
Egocentric videos present unique challenges for 3D scene understanding.
This paper introduces a novel approach to instance segmentation and tracking in first-person video.
By incorporating spatial and temporal cues, we achieve superior performance compared to state-of-the-art 2D approaches.
arXiv Detail & Related papers (2024-08-19T10:08:25Z) - iDet3D: Towards Efficient Interactive Object Detection for LiDAR Point Clouds [39.261055567560724]
We propose iDet3D, an efficient interactive 3D object detector.
iDet3D supports a user-friendly 2D interface, which can ease the cognitive burden of exploring 3D space.
We show that our method can construct precise annotations in a few clicks.
arXiv Detail & Related papers (2023-12-24T09:59:46Z) - Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion [110.84357383258818]
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
arXiv Detail & Related papers (2023-06-07T17:57:45Z) - Real3D-Aug: Point Cloud Augmentation by Placing Real Objects with Occlusion Handling for 3D Detection and Segmentation [0.0]
We propose a data augmentation method that takes advantage of already annotated data multiple times.
Our framework reuses real data and automatically finds suitable placements in the scene to be augmented.
The pipeline proves competitive in training top-performing models for 3D object detection and semantic segmentation.
arXiv Detail & Related papers (2022-06-15T16:25:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.