3DCoMPaT$^{++}$: An improved Large-scale 3D Vision Dataset for
Compositional Recognition
- URL: http://arxiv.org/abs/2310.18511v2
- Date: Tue, 12 Mar 2024 11:52:42 GMT
- Title: 3DCoMPaT$^{++}$: An improved Large-scale 3D Vision Dataset for
Compositional Recognition
- Authors: Habib Slim, Xiang Li, Yuchen Li, Mahmoud Ahmed, Mohamed Ayman, Ujjwal
Upadhyay, Ahmed Abdelreheem, Arpit Prajapati, Suhail Pothigara, Peter Wonka,
Mohamed Elhoseiny
- Abstract summary: 3DCoMPaT$^{++}$ is a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes.
We introduce a new task, called Grounded CoMPaT Recognition (GCR), to collectively recognize and ground compositions of materials on parts of 3D objects.
- Score: 53.97029821609132
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we present 3DCoMPaT$^{++}$, a multimodal 2D/3D dataset with 160
million rendered views of more than 10 million stylized 3D shapes carefully
annotated at the part-instance level, alongside matching RGB point clouds, 3D
textured meshes, depth maps, and segmentation masks. 3DCoMPaT$^{++}$ covers 41
shape categories, 275 fine-grained part categories, and 293 fine-grained
material classes that can be compositionally applied to parts of 3D objects. We
render a subset of one million stylized shapes from four equally spaced views
as well as four randomized views, leading to a total of 160 million renderings.
Parts are segmented at the instance level, with coarse-grained and fine-grained
semantic levels. We introduce a new task, called Grounded CoMPaT Recognition
(GCR), to collectively recognize and ground compositions of materials on parts
of 3D objects. Additionally, we report the outcomes of a data challenge
organized at CVPR2023, showcasing the winning method's utilization of a
modified PointNet$^{++}$ model trained on 6D inputs, and exploring alternative
techniques for GCR enhancement. We hope our work will help ease future research
on compositional 3D Vision.
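To make the composition structure of the GCR task and the challenge-winning setup concrete, below is a minimal, hypothetical Python sketch. It assumes a sample pairs an RGB point cloud with a shape category and per-part material labels, that the strictest GCR criterion counts a prediction correct only when the shape category and every (part, material) pair match, and that the "6D inputs" of the winning PointNet$^{++}$ variant are XYZ coordinates concatenated with RGB colors. All names and structures here are illustrative assumptions, not the dataset's actual API or official metric definitions.

```python
# Illustrative sketch only: field names and the exact-match rule below are
# assumptions, not the official 3DCoMPaT++ API or metric definitions.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class GCRSample:
    """A hypothetical stylized-shape sample for Grounded CoMPaT Recognition."""
    xyz: np.ndarray                      # (N, 3) point coordinates
    rgb: np.ndarray                      # (N, 3) point colors in [0, 1]
    shape_category: str                  # one of the 41 shape categories
    part_materials: dict = field(default_factory=dict)
    # maps each fine-grained part (275 classes) to its material (293 classes)


def make_6d_inputs(sample: GCRSample) -> np.ndarray:
    """Concatenate XYZ and RGB into (N, 6) per-point features for a
    PointNet++-style model (assumed meaning of the abstract's "6D inputs")."""
    xyz = sample.xyz - sample.xyz.mean(axis=0)        # center the cloud
    xyz /= np.linalg.norm(xyz, axis=1).max() + 1e-8   # scale to the unit sphere
    return np.concatenate([xyz, sample.rgb], axis=1).astype(np.float32)


def gcr_exact_match(pred: GCRSample, gt: GCRSample) -> bool:
    """Strictest plausible GCR criterion: the shape category and every
    (part, material) pair must match the ground truth."""
    return (pred.shape_category == gt.shape_category
            and pred.part_materials == gt.part_materials)


# Toy usage with a random 2048-point RGB cloud.
rng = np.random.default_rng(0)
gt = GCRSample(xyz=rng.random((2048, 3)), rgb=rng.random((2048, 3)),
               shape_category="chair",
               part_materials={"seat": "leather", "leg": "oak_wood"})
feats = make_6d_inputs(gt)
print(feats.shape)                      # (2048, 6)
print(gcr_exact_match(gt, gt))          # True
```

The paper's released tooling defines the official data loaders and GCR metrics; this sketch only illustrates how shape, part, and material labels compose in the task.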
Related papers
- PointSeg: A Training-Free Paradigm for 3D Scene Segmentation via Foundation Models [51.24979014650188]
We present PointSeg, a training-free paradigm that leverages off-the-shelf vision foundation models to address 3D scene perception tasks.
PointSeg can segment anything in a 3D scene by acquiring accurate 3D prompts and aligning their corresponding pixels across frames.
Our approach significantly surpasses the state-of-the-art specialist training-free model by 14.1%, 12.3%, and 12.6% mAP on the ScanNet, ScanNet++, and KITTI-360 datasets.
arXiv Detail & Related papers (2024-03-11T03:28:20Z)
- CC3D: Layout-Conditioned Generation of Compositional 3D Scenes [49.281006972028194]
We introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts.
Our evaluations on synthetic 3D-FRONT and real-world KITTI-360 datasets demonstrate that our model generates scenes of improved visual and geometric quality.
arXiv Detail & Related papers (2023-03-21T17:59:02Z)
- CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds [55.44204039410225]
We present a novel two-stage fully sparse convolutional 3D object detection framework, named CAGroup3D.
Our proposed method first generates some high-quality 3D proposals by leveraging the class-aware local group strategy on the object surface voxels.
To recover the features of missed voxels due to incorrect voxel-wise segmentation, we build a fully sparse convolutional RoI pooling module.
arXiv Detail & Related papers (2022-10-09T13:38:48Z)
- MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks.
We render a 3D shape from multiple views, and set up a dense correspondence learning task within the contrastive learning framework.
As a result, the learned 2D representations are view-invariant and geometrically consistent.
arXiv Detail & Related papers (2022-08-18T00:48:15Z)
- Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation [3.5939555573102853]
Recent works on 3D semantic segmentation propose to exploit the synergy between images and point clouds by processing each modality with a dedicated network.
We propose an end-to-end trainable multi-view aggregation model leveraging the viewing conditions of 3D points to merge features from images taken at arbitrary positions.
Our method can combine standard 2D and 3D networks and outperforms both 3D models operating on colorized point clouds and hybrid 2D/3D networks.
arXiv Detail & Related papers (2022-04-15T17:10:48Z)
- Fine-Grained 3D Shape Classification with Hierarchical Part-View Attentions [70.0171362989609]
We propose a novel fine-grained 3D shape classification method named FG3D-Net to capture the fine-grained local details of 3D shapes from multiple rendered views.
Our results on a fine-grained 3D shape dataset show that our method outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2020-05-26T06:53:19Z)
- Local Implicit Grid Representations for 3D Scenes [24.331110387905962]
We introduce Local Implicit Grid Representations, a new 3D shape representation designed for scalability and generality.
We train an autoencoder to learn an embedding of local crops of 3D shapes at a fixed crop size.
Then, we use the decoder as a component in a shape optimization that solves for a set of latent codes on a regular grid of overlapping crops.
arXiv Detail & Related papers (2020-03-19T18:58:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.