Model2Scene: Learning 3D Scene Representation via Contrastive
Language-CAD Models Pre-training
- URL: http://arxiv.org/abs/2309.16956v1
- Date: Fri, 29 Sep 2023 03:51:26 GMT
- Title: Model2Scene: Learning 3D Scene Representation via Contrastive
Language-CAD Models Pre-training
- Authors: Runnan Chen, Xinge Zhu, Nenglun Chen, Dawei Wang, Wei Li, Yuexin Ma,
Ruigang Yang, Tongliang Liu, Wenping Wang
- Abstract summary: Current successful methods of 3D scene perception rely on large-scale annotated point clouds.
We propose Model2Scene, a novel paradigm that learns free 3D scene representation from Computer-Aided Design (CAD) models and languages.
Model2Scene yields impressive label-free 3D object salient detection with an average mAP of 46.08% and 55.49% on the ScanNet and S3DIS datasets, respectively.
- Score: 105.3421541518582
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Current successful methods of 3D scene perception rely on large-scale
annotated point clouds, which are tedious and expensive to acquire. In this
paper, we propose Model2Scene, a novel paradigm that learns free 3D scene
representation from Computer-Aided Design (CAD) models and languages. The main
challenges are the domain gaps between the CAD models and the real scene's
objects, including model-to-scene (from a single model to the scene) and
synthetic-to-real (from synthetic model to real scene's object). To handle the
above challenges, Model2Scene first simulates a crowded scene by mixing
data-augmented CAD models. Next, we propose a novel feature regularization
operation, termed Deep Convex-hull Regularization (DCR), to project point
features into a unified convex hull space, reducing the domain gap. Ultimately,
we impose contrastive loss on language embedding and the point features of CAD
models to pre-train the 3D network. Extensive experiments verify the learned 3D
scene representation is beneficial for various downstream tasks, including
label-free 3D object salient detection, label-efficient 3D scene perception and
zero-shot 3D semantic segmentation. Notably, Model2Scene yields impressive
label-free 3D object salient detection with an average mAP of 46.08% and
55.49% on the ScanNet and S3DIS datasets, respectively. The code will be
publicly available.
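A minimal PyTorch sketch of the three steps described in the abstract: scene simulation by mixing augmented CAD models, Deep Convex-hull Regularization (DCR), and the language-point contrastive loss. This is only our reading of the abstract; the prototype-based convex-hull projection, the augmentation ranges, and the InfoNCE-style loss are assumptions, not the authors' released implementation.

```python
# Hedged sketch of the Model2Scene pre-training recipe (our reading of the
# abstract, not the authors' code). Assumed: DCR projects features onto the
# convex hull of learned prototypes; the loss is InfoNCE between point
# features and per-category text embeddings (e.g. from a CLIP text encoder).
import torch
import torch.nn as nn
import torch.nn.functional as F


def mix_cad_models(models):
    """Simulate a crowded scene: randomly scale and place CAD point clouds.

    models: list of (class_id, points) pairs, points of shape (N_i, 3).
    """
    placed, labels = [], []
    for cls_id, pts in models:
        pts = pts * (0.8 + 0.4 * torch.rand(1))      # random scale (assumed range)
        pts = pts + torch.rand(1, 3) * 4.0           # random placement in a 4 m cube
        placed.append(pts)
        labels.append(torch.full((pts.shape[0],), cls_id, dtype=torch.long))
    return torch.cat(placed), torch.cat(labels)      # (N, 3), (N,)


class DeepConvexHullRegularization(nn.Module):
    """Re-express each point feature as a convex combination of K shared
    prototypes, so synthetic and real features land in one convex hull."""

    def __init__(self, dim, num_prototypes=128):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, dim))

    def forward(self, feats):                        # feats: (N, D)
        weights = F.softmax(feats @ self.prototypes.t(), dim=-1)  # rows sum to 1
        return weights @ self.prototypes             # (N, D), inside the hull


def language_point_contrastive_loss(point_feats, text_embeds, labels, tau=0.07):
    """InfoNCE: each point should match the text embedding of its category."""
    p = F.normalize(point_feats, dim=-1)             # (N, D)
    t = F.normalize(text_embeds, dim=-1)             # (C, D)
    return F.cross_entropy(p @ t.t() / tau, labels)
```

In a full pipeline, point_feats would come from a 3D backbone (e.g. a sparse convolutional U-Net) run on the mixed scene, and text_embeds from encoding category-name prompts with a frozen text encoder.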
Related papers
- DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [68.94868475824575]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z)
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting (a toy sketch of such voting appears after this list).
arXiv Detail & Related papers (2023-11-03T15:41:15Z) - Visual Localization using Imperfect 3D Models from the Internet [54.731309449883284]
This paper studies how imperfections in 3D models affect localization accuracy.
We show that 3D models from the Internet show promise as an easy-to-obtain scene representation.
arXiv Detail & Related papers (2023-04-12T16:15:05Z)
- Learning 3D Scene Priors with 2D Supervision [37.79852635415233]
We propose a new method to learn 3D scene priors of layout and shape without requiring any 3D ground truth.
Our method represents a 3D scene as a latent vector, from which we can progressively decode to a sequence of objects characterized by their class categories.
Experiments on 3D-FRONT and ScanNet show that our method outperforms the state of the art in single-view reconstruction.
arXiv Detail & Related papers (2022-11-25T15:03:32Z)
- Prompt-guided Scene Generation for 3D Zero-Shot Learning [8.658191774247944]
We propose a prompt-guided 3D scene generation and supervision method that augments 3D data to help the network learn better.
First, we merge point clouds of two 3D models in certain ways described by a prompt. The prompt acts like the annotation describing each 3D scene.
We have achieved state-of-the-art ZSL and generalized ZSL performance on synthetic (ModelNet40, ModelNet10) and real-scanned (ScanObjectNN) 3D object datasets.
arXiv Detail & Related papers (2022-09-29T11:24:33Z)
- Towards 3D Scene Understanding by Referring Synthetic Models [65.74211112607315]
Existing methods typically rely on extensive annotations of real scene scans.
We explore how synthetic models can relieve this burden by aligning the features of synthetic models and real scenes' objects in a unified feature space.
Experiments show that our method achieves an average mAP of 46.08% and 55.49% on the ScanNet and S3DIS datasets, respectively.
arXiv Detail & Related papers (2022-03-20T13:06:15Z)
- RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of the synthetic dataset, which consists of CAD object models, to boost the learning on real datasets.
Recent work on 3D pre-training exhibits failure when transferring features learned on synthetic objects to other real-world applications.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z)
- PvDeConv: Point-Voxel Deconvolution for Autoencoding CAD Construction in 3D [23.87757211847093]
We learn to synthesize high-resolution point clouds of 10k points that densely describe the underlying geometry of Computer Aided Design (CAD) models.
We introduce a new dedicated dataset, the CC3D, containing 50k+ pairs of CAD models and their corresponding 3D meshes.
This dataset is used to learn a convolutional autoencoder for point clouds sampled from the pairs of 3D scans and CAD models.
arXiv Detail & Related papers (2021-01-12T14:14:13Z)
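As referenced above, here is a toy sketch of the voting idea from the "Leveraging Large-Scale Pretrained Vision Foundation Models" entry: per-point predictions from several models are fused by majority vote into pseudo labels. The array shapes and the ignore_index convention are our assumptions, not that paper's actual interface.

```python
# Toy majority-vote fusion of per-point semantic predictions from M models.
# Assumed interface (not from the cited paper): per_model_labels is (M, N)
# with class ids in [0, num_classes) and ignore_index for unlabeled points.
import numpy as np


def fuse_labels_by_voting(per_model_labels, num_classes, ignore_index=-1):
    M, N = per_model_labels.shape
    votes = np.zeros((N, num_classes), dtype=np.int64)
    for m in range(M):
        valid = np.nonzero(per_model_labels[m] != ignore_index)[0]
        np.add.at(votes, (valid, per_model_labels[m, valid]), 1)  # count votes
    fused = votes.argmax(axis=1)                  # most-voted class per point
    fused[votes.sum(axis=1) == 0] = ignore_index  # no model labeled this point
    return fused
```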
This list is automatically generated from the titles and abstracts of the papers on this site.