PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model
- URL: http://arxiv.org/abs/2404.03836v1
- Date: Thu, 4 Apr 2024 23:38:45 GMT
- Title: PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model
- Authors: Amrin Kareem, Jean Lahoud, Hisham Cholakkal,
- Abstract summary: We introduce a novel segmentation task known as reasoning part segmentation for 3D objects.
We output a segmentation mask based on complex and implicit textual queries about specific parts of a 3D object.
We propose a model that is capable of segmenting parts of 3D objects based on implicit textual queries and generating natural language explanations.
- Score: 19.333506797686695
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in 3D perception systems have significantly improved their ability to perform visual recognition tasks such as segmentation. However, these systems still heavily rely on explicit human instruction to identify target objects or categories, lacking the capability to actively reason and comprehend implicit user intentions. We introduce a novel segmentation task known as reasoning part segmentation for 3D objects, aiming to output a segmentation mask based on complex and implicit textual queries about specific parts of a 3D object. To facilitate evaluation and benchmarking, we present a large 3D dataset comprising over 60k instructions paired with corresponding ground-truth part segmentation annotations specifically curated for reasoning-based 3D part segmentation. We propose a model that is capable of segmenting parts of 3D objects based on implicit textual queries and generating natural language explanations corresponding to 3D object segmentation requests. Experiments show that our method achieves competitive performance to models that use explicit queries, with the additional abilities to identify part concepts, reason about them, and complement them with world knowledge. Our source code, dataset, and trained models are available at https://github.com/AmrinKareem/PARIS3D.
Related papers
- SegPoint: Segment Any Point Cloud via Large Language Model [62.69797122055389]
We propose a model, called SegPoint, to produce point-wise segmentation masks across a diverse range of tasks.
SegPoint is the first model to address varied segmentation tasks within a single framework.
arXiv Detail & Related papers (2024-07-18T17:58:03Z) - 3x2: 3D Object Part Segmentation by 2D Semantic Correspondences [33.99493183183571]
We propose to leverage a few annotated 3D shapes or richly annotated 2D datasets to perform 3D object part segmentation.
We present our novel approach, termed 3-By-2 that achieves SOTA performance on different benchmarks with various granularity levels.
arXiv Detail & Related papers (2024-07-12T19:08:00Z) - 3D Instance Segmentation Using Deep Learning on RGB-D Indoor Data [0.0]
2D region based convolutional neural networks (Mask R-CNN) deep learning model with point based rending module is adapted to integrate with depth information to recognize and segment 3D instances of objects.
In order to generate 3D point cloud coordinates, segmented 2D pixels of recognized object regions in the RGB image are merged into (u, v) points of the depth image.
arXiv Detail & Related papers (2024-06-19T08:00:35Z) - Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models [20.277479473218513]
We introduce a new task: Zero-Shot 3D Reasoning for parts searching and localization for objects.
We design a simple baseline method, Reasoning3D, with the capability to understand and execute complex commands.
We show that Reasoning3D can effectively localize and highlight parts of 3D objects based on implicit textual queries.
arXiv Detail & Related papers (2024-05-29T17:56:07Z) - Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model [108.35777542298224]
This paper introduces Reason3D, a novel large language model for comprehensive 3D understanding.
We propose a hierarchical mask decoder to locate small objects within expansive scenes.
Experiments validate that Reason3D achieves remarkable results on large-scale ScanNet and Matterport3D datasets.
arXiv Detail & Related papers (2024-05-27T17:59:41Z) - Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without
Manual Labels [141.23836433191624]
Current 3D scene segmentation methods are heavily dependent on manually annotated 3D training datasets.
We propose Segment3D, a method for class-agnostic 3D scene segmentation that produces high-quality 3D segmentation masks.
arXiv Detail & Related papers (2023-12-28T18:57:11Z) - SAI3D: Segment Any Instance in 3D Scenes [68.57002591841034]
We introduce SAI3D, a novel zero-shot 3D instance segmentation approach.
Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations.
Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach.
arXiv Detail & Related papers (2023-12-17T09:05:47Z) - A One Stop 3D Target Reconstruction and multilevel Segmentation Method [0.0]
We propose an open-source one stop 3D target reconstruction and multilevel segmentation framework (OSTRA)
OSTRA performs segmentation on 2D images, tracks multiple instances with segmentation labels in the image sequence, and then reconstructs labelled 3D objects or multiple parts with Multi-View Stereo (MVS) or RGBD-based 3D reconstruction methods.
Our method opens up a new avenue for reconstructing 3D targets embedded with rich multi-scale segmentation information in complex scenes.
arXiv Detail & Related papers (2023-08-14T07:12:31Z) - CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z) - 3D Concept Grounding on Neural Fields [99.33215488324238]
Existing visual reasoning approaches typically utilize supervised methods to extract 2D segmentation masks on which concepts are grounded.
Humans are capable of grounding concepts on the underlying 3D representation of images.
We propose to leverage the continuous, differentiable nature of neural fields to segment and learn concepts.
arXiv Detail & Related papers (2022-07-13T17:59:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.