SAMPart3D: Segment Any Part in 3D Objects
- URL: http://arxiv.org/abs/2411.07184v2
- Date: Sat, 16 Nov 2024 07:04:42 GMT
- Title: SAMPart3D: Segment Any Part in 3D Objects
- Authors: Yunhan Yang, Yukun Huang, Yuan-Chen Guo, Liangjun Lu, Xiaoyang Wu, Edmund Y. Lam, Yan-Pei Cao, Xihui Liu,
- Abstract summary: 3D part segmentation is a crucial and challenging task in 3D perception, playing a vital role in applications such as robotics, 3D generation, and 3D editing.
Recent methods harness the powerful Vision Language Models (VLMs) for 2D-to-3D knowledge distillation, achieving zero-shot 3D part segmentation.
In this work, we introduce SAMPart3D, a scalable zero-shot 3D part segmentation framework that segments any 3D object into semantic parts at multiple granularities.
- Score: 23.97392239910013
- License:
- Abstract: 3D part segmentation is a crucial and challenging task in 3D perception, playing a vital role in applications such as robotics, 3D generation, and 3D editing. Recent methods harness the powerful Vision Language Models (VLMs) for 2D-to-3D knowledge distillation, achieving zero-shot 3D part segmentation. However, these methods are limited by their reliance on text prompts, which restricts the scalability to large-scale unlabeled datasets and the flexibility in handling part ambiguities. In this work, we introduce SAMPart3D, a scalable zero-shot 3D part segmentation framework that segments any 3D object into semantic parts at multiple granularities, without requiring predefined part label sets as text prompts. For scalability, we use text-agnostic vision foundation models to distill a 3D feature extraction backbone, allowing scaling to large unlabeled 3D datasets to learn rich 3D priors. For flexibility, we distill scale-conditioned part-aware 3D features for 3D part segmentation at multiple granularities. Once the segmented parts are obtained from the scale-conditioned part-aware 3D features, we use VLMs to assign semantic labels to each part based on the multi-view renderings. Compared to previous methods, our SAMPart3D can scale to the recent large-scale 3D object dataset Objaverse and handle complex, non-ordinary objects. Additionally, we contribute a new 3D part segmentation benchmark to address the lack of diversity and complexity of objects and parts in existing benchmarks. Experiments show that our SAMPart3D significantly outperforms existing zero-shot 3D part segmentation methods, and can facilitate various applications such as part-level editing and interactive segmentation.
Related papers
- 3x2: 3D Object Part Segmentation by 2D Semantic Correspondences [33.99493183183571]
We propose to leverage a few annotated 3D shapes or richly annotated 2D datasets to perform 3D object part segmentation.
We present our novel approach, termed 3-By-2 that achieves SOTA performance on different benchmarks with various granularity levels.
arXiv Detail & Related papers (2024-07-12T19:08:00Z) - Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models [20.277479473218513]
We introduce a new task: Zero-Shot 3D Reasoning for parts searching and localization for objects.
We design a simple baseline method, Reasoning3D, with the capability to understand and execute complex commands.
We show that Reasoning3D can effectively localize and highlight parts of 3D objects based on implicit textual queries.
arXiv Detail & Related papers (2024-05-29T17:56:07Z) - Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model [108.35777542298224]
This paper introduces Reason3D, a novel large language model for comprehensive 3D understanding.
We propose a hierarchical mask decoder to locate small objects within expansive scenes.
Experiments validate that Reason3D achieves remarkable results on large-scale ScanNet and Matterport3D datasets.
arXiv Detail & Related papers (2024-05-27T17:59:41Z) - PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model [19.333506797686695]
We introduce a novel segmentation task known as reasoning part segmentation for 3D objects.
We output a segmentation mask based on complex and implicit textual queries about specific parts of a 3D object.
We propose a model that is capable of segmenting parts of 3D objects based on implicit textual queries and generating natural language explanations.
arXiv Detail & Related papers (2024-04-04T23:38:45Z) - SAI3D: Segment Any Instance in 3D Scenes [68.57002591841034]
We introduce SAI3D, a novel zero-shot 3D instance segmentation approach.
Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations.
Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach.
arXiv Detail & Related papers (2023-12-17T09:05:47Z) - SAM-guided Graph Cut for 3D Instance Segmentation [60.75119991853605]
This paper addresses the challenge of 3D instance segmentation by simultaneously leveraging 3D geometric and multi-view image information.
We introduce a novel 3D-to-2D query framework to effectively exploit 2D segmentation models for 3D instance segmentation.
Our method achieves robust segmentation performance and can generalize across different types of scenes.
arXiv Detail & Related papers (2023-12-13T18:59:58Z) - DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [68.94868475824575]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z) - A One Stop 3D Target Reconstruction and multilevel Segmentation Method [0.0]
We propose an open-source one stop 3D target reconstruction and multilevel segmentation framework (OSTRA)
OSTRA performs segmentation on 2D images, tracks multiple instances with segmentation labels in the image sequence, and then reconstructs labelled 3D objects or multiple parts with Multi-View Stereo (MVS) or RGBD-based 3D reconstruction methods.
Our method opens up a new avenue for reconstructing 3D targets embedded with rich multi-scale segmentation information in complex scenes.
arXiv Detail & Related papers (2023-08-14T07:12:31Z) - PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained
Image-Language Models [56.324516906160234]
Generalizable 3D part segmentation is important but challenging in vision and robotics.
This paper explores an alternative way for low-shot part segmentation of 3D point clouds by leveraging a pretrained image-language model, GLIP.
We transfer the rich knowledge from 2D to 3D through GLIP-based part detection on point cloud rendering and a novel 2D-to-3D label lifting algorithm.
arXiv Detail & Related papers (2022-12-03T06:59:01Z) - ONeRF: Unsupervised 3D Object Segmentation from Multiple Views [59.445957699136564]
ONeRF is a method that automatically segments and reconstructs object instances in 3D from multi-view RGB images without any additional manual annotations.
The segmented 3D objects are represented using separate Neural Radiance Fields (NeRFs) which allow for various 3D scene editing and novel view rendering.
arXiv Detail & Related papers (2022-11-22T06:19:37Z) - Semi-supervised 3D shape segmentation with multilevel consistency and
part substitution [21.075426681857024]
We propose an effective semi-supervised method for learning 3D segmentations from a few labeled 3D shapes and a large amount of unlabeled 3D data.
For the unlabeled data, we present a novel multilevel consistency loss to enforce consistency of network predictions between perturbed copies of a 3D shape.
For the labeled data, we develop a simple yet effective part substitution scheme to augment the labeled 3D shapes with more structural variations to enhance training.
arXiv Detail & Related papers (2022-04-19T11:48:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.