3DCoMPaT200: Language-Grounded Compositional Understanding of Parts and Materials of 3D Shapes
- URL: http://arxiv.org/abs/2501.06785v1
- Date: Sun, 12 Jan 2025 11:46:07 GMT
- Title: 3DCoMPaT200: Language-Grounded Compositional Understanding of Parts and Materials of 3D Shapes
- Authors: Mahmoud Ahmed, Xiang Li, Arpit Prajapati, Mohamed Elhoseiny
- Abstract summary: 3DCoMPaT200 is a large-scale dataset tailored for compositional understanding of object parts and materials.
It features 200 object categories, with an object vocabulary $\approx$5 times larger than 3DCoMPaT's and $\approx$4 times as many part categories.
To address the complexities of compositional 3D modeling, we propose a novel task of Compositional Part Shape Retrieval.
- Abstract: Understanding objects in 3D at the part level is essential for humans and robots to navigate and interact with the environment. Current datasets for part-level 3D object understanding encompass a limited range of categories. For instance, the ShapeNet-Part and PartNet datasets only include 16 and 24 object categories, respectively. The 3DCoMPaT dataset, specifically designed for compositional understanding of parts and materials, contains only 42 object categories. To foster richer and fine-grained part-level 3D understanding, we introduce 3DCoMPaT200, a large-scale dataset tailored for compositional understanding of object parts and materials, comprising 200 object categories, with an object vocabulary $\approx$5 times larger than 3DCoMPaT's and $\approx$4 times as many part categories. Concretely, 3DCoMPaT200 significantly expands upon 3DCoMPaT, featuring 1,031 fine-grained part categories and 293 distinct material classes for compositional application to 3D object parts. Additionally, to address the complexities of compositional 3D modeling, we propose a novel task of Compositional Part Shape Retrieval, using ULIP to provide a strong 3D foundation model for compositional understanding. This task evaluates a model's shape-retrieval performance given one, three, or six parts described in text. The results show that the model's performance improves with an increasing number of style compositions, highlighting the critical role of the compositional dataset. Such results underscore the dataset's effectiveness in enhancing models' capability to understand complex 3D shapes from a compositional perspective. Code and Data can be found at http://github.com/3DCoMPaT200/3DCoMPaT200
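The retrieval task described in the abstract ranks 3D shapes against a textual part-composition query in a joint embedding space (the paper uses ULIP). A minimal sketch of that ranking step, assuming embeddings have already been computed; the encoder, embedding size, and recall metric below are illustrative stand-ins, not the paper's implementation:

```python
import numpy as np

def retrieve_shapes(text_emb, shape_embs, k=5):
    """Rank shapes by cosine similarity to a text query embedding.

    text_emb:   (d,) query embedding (e.g. from a text encoder)
    shape_embs: (n, d) matrix of 3D shape embeddings
    Returns indices of the top-k shapes, best first.
    """
    t = text_emb / np.linalg.norm(text_emb)
    s = shape_embs / np.linalg.norm(shape_embs, axis=1, keepdims=True)
    sims = s @ t                      # cosine similarity per shape
    return np.argsort(-sims)[:k]

def recall_at_k(ranked, target, k):
    """1.0 if the ground-truth shape appears in the top-k, else 0.0."""
    return float(target in ranked[:k])

# Toy example with random embeddings; in practice these would come
# from a joint text/3D encoder such as ULIP.
rng = np.random.default_rng(0)
shapes = rng.normal(size=(100, 32))
query = shapes[42] + 0.01 * rng.normal(size=32)  # near-duplicate of shape 42
ranked = retrieve_shapes(query, shapes, k=5)
print(recall_at_k(ranked, 42, k=1))
```

Adding more part-material descriptions to the query (one, three, six parts) would sharpen the text embedding, which is the effect the paper measures.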
Related papers
- PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models [63.1432721793683]
We introduce PartGen, a novel approach that generates 3D objects composed of meaningful parts starting from text, an image, or an unstructured 3D object.
We evaluate our method on generated and real 3D assets and show that it outperforms segmentation and part-extraction baselines by a large margin.
arXiv Detail & Related papers (2024-12-24T18:59:43Z) - 3D Part Segmentation via Geometric Aggregation of 2D Visual Features [57.20161517451834]
Supervised 3D part segmentation models are tailored for a fixed set of objects and parts, limiting their transferability to open-set, real-world scenarios.
Recent works have explored vision-language models (VLMs) as a promising alternative, using multi-view rendering and textual prompting to identify object parts.
To address these limitations, we propose COPS, a COmprehensive model for Parts that blends semantics extracted from visual concepts and 3D geometry to effectively identify object parts.
arXiv Detail & Related papers (2024-12-05T15:27:58Z) - PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model [19.333506797686695]
We introduce a novel segmentation task known as reasoning part segmentation for 3D objects.
We output a segmentation mask based on complex and implicit textual queries about specific parts of a 3D object.
We propose a model that is capable of segmenting parts of 3D objects based on implicit textual queries and generating natural language explanations.
arXiv Detail & Related papers (2024-04-04T23:38:45Z) - 3DCoMPaT$^{++}$: An improved Large-scale 3D Vision Dataset for Compositional Recognition [53.97029821609132]
3DCoMPaT$^{++}$ is a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes.
We introduce a new task, called Grounded CoMPaT Recognition (GCR), to collectively recognize and ground compositions of materials on parts of 3D objects.
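The GCR task above is only named, not specified here; one simple way to score a predicted part-material composition against ground truth for a single shape could look like the following sketch (the per-shape exact-match metric and the `pred`/`gt` dictionaries are assumptions for illustration, not the benchmark's official protocol):

```python
def composition_accuracy(pred, gt):
    """Exact-match accuracy over (part, material) pairs for one shape.

    pred, gt: dicts mapping part name -> material name.
    A pair counts as correct only when the part is predicted
    and its material matches the ground truth.
    """
    if not gt:
        return 0.0
    correct = sum(1 for part, mat in gt.items() if pred.get(part) == mat)
    return correct / len(gt)

# Hypothetical chair with three styled parts.
gt   = {"seat": "leather", "leg": "wood",  "back": "leather"}
pred = {"seat": "leather", "leg": "metal", "back": "leather"}
print(round(composition_accuracy(pred, gt), 3))  # → 0.667 (2 of 3 pairs match)
```

Benchmark metrics for tasks like GCR typically also require the predicted part to be spatially grounded (segmented) correctly; this sketch scores only the symbolic part-material pairing.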
arXiv Detail & Related papers (2023-10-27T22:01:43Z) - Structure from Action: Learning Interactions for Articulated Object 3D Structure Discovery [18.96346371296251]
We introduce Structure from Action (SfA), a framework to discover 3D part geometry and joint parameters of unseen articulated objects.
By selecting informative interactions, SfA discovers parts and reveals occluded surfaces, like the inside of a closed drawer.
Empirically, SfA outperforms a pipeline of state-of-the-art components by 25.4 percentage points of 3D IoU on unseen categories.
arXiv Detail & Related papers (2022-07-19T00:27:36Z) - Scan2Part: Fine-grained and Hierarchical Part-level Understanding of Real-World 3D Scans [68.98085986594411]
We propose Scan2Part, a method to segment individual parts of objects in real-world, noisy indoor RGB-D scans.
We use a sparse U-Net-based architecture that captures the fine-scale detail of the underlying 3D scan geometry.
As output, we are able to predict fine-grained per-object part labels, even when the geometry is coarse or partially missing.
arXiv Detail & Related papers (2022-06-06T05:43:10Z) - Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction [7.013794773659423]
Common Objects in 3D is a large-scale dataset with real multi-view images of object categories annotated with camera poses and ground truth 3D point clouds.
The dataset contains a total of 1.5 million frames from nearly 19,000 videos capturing objects from 50 MS-COCO categories.
We exploit this new dataset to conduct one of the first large-scale "in-the-wild" evaluations of several new-view-synthesis and category-centric 3D reconstruction methods.
arXiv Detail & Related papers (2021-09-01T17:59:05Z) - DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension [71.71234436165255]
We contribute DensePose 3D, a method that can learn such reconstructions in a weakly supervised fashion from 2D image annotations only.
Because it does not require 3D scans, DensePose 3D can be used for learning a wide range of articulated categories such as different animal species.
We show significant improvements compared to state-of-the-art non-rigid structure-from-motion baselines on both synthetic and real data on categories of humans and animals.
arXiv Detail & Related papers (2021-08-31T18:33:55Z) - Fine-Grained 3D Shape Classification with Hierarchical Part-View Attentions [70.0171362989609]
We propose a novel fine-grained 3D shape classification method named FG3D-Net to capture the fine-grained local details of 3D shapes from multiple rendered views.
Our results under the fine-grained 3D shape dataset show that our method outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2020-05-26T06:53:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.