Towards 3D Scene Understanding by Referring Synthetic Models
- URL: http://arxiv.org/abs/2203.10546v1
- Date: Sun, 20 Mar 2022 13:06:15 GMT
- Title: Towards 3D Scene Understanding by Referring Synthetic Models
- Authors: Runnan Chen, Xinge Zhu, Nenglun Chen, Dawei Wang, Wei Li, Yuexin Ma,
Ruigang Yang, Wenping Wang
- Abstract summary: Current methods typically rely on labour-intensive annotations of real scene scans.
We explore how labelled synthetic 3D models can alleviate this burden: a convex-hull regularized feature alignment with learnable prototypes projects the point features of both synthetic models and real scenes into a unified feature space.
Experiments show that our method achieves an average mAP of 46.08% and 55.49% on the ScanNet and S3DIS datasets, respectively, by learning from the ModelNet dataset.
- Score: 65.74211112607315
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Promising performance has been achieved for visual perception on point
clouds. However, current methods typically rely on labour-intensive annotations
of real scene scans. In this paper, we explore how synthetic models can alleviate
the real-scene annotation burden: taking labelled 3D synthetic models as the
reference for supervision, a neural network learns to recognize specific
categories of objects in a real scene scan, without any scene annotation for
supervision. We name this problem, which studies how to transfer knowledge from
synthetic 3D models to real 3D scenes, Referring Transfer Learning (RTL). The
main challenge is bridging the model-to-scene gap (from a single model to a full
scene) and the synthetic-to-real gap (from a synthetic model to a real scene's
object). To this end, we propose a simple yet effective framework that performs
two alignment operations. First, physical data alignment makes the synthetic
models cover the diversity of the scene's objects through data processing
techniques. Second, a novel convex-hull regularized feature alignment introduces
learnable prototypes that project the point features of both synthetic models and
real scenes into a unified feature space, which alleviates the domain gap. These
operations ease the model-to-scene and synthetic-to-real difficulty for a network
recognizing the target objects in a real, unseen scene. Experiments show that our
method achieves an average mAP of 46.08% and 55.49% on the ScanNet and S3DIS
datasets, respectively, by learning from synthetic models in the ModelNet
dataset. Code will be publicly available.
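The abstract describes physical data alignment only as "data processing techniques" that make synthetic models cover the diversity of the scene's objects. The sketch below is a minimal, hypothetical illustration of what such processing could look like for a CAD point cloud (random scaling, rotation, partial cropping, jitter); the function name, parameter ranges, and choice of augmentations are assumptions for illustration, not the authors' actual pipeline.

```python
import numpy as np

def physically_align_model(points: np.ndarray,
                           scale_range=(0.8, 1.2),
                           crop_ratio=0.3,
                           jitter_sigma=0.01,
                           rng=None) -> np.ndarray:
    """Augment a synthetic CAD point cloud of shape (N, 3) so it better covers
    the size, pose, occlusion, and noise diversity of real scanned objects.
    All names and ranges here are illustrative defaults, not the paper's settings."""
    rng = rng or np.random.default_rng()
    pts = points.astype(np.float64, copy=True)

    # Random anisotropic scaling: real instances vary in size and aspect ratio.
    pts = pts * rng.uniform(*scale_range, size=(1, 3))

    # Random rotation about the up (z) axis: objects appear in arbitrary orientations.
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    pts = pts @ rot_z.T

    # Random crop along a random direction: mimics occlusion and partial scans.
    direction = rng.normal(size=3)
    direction /= np.linalg.norm(direction)
    proj = pts @ direction
    keep = proj > np.quantile(proj, crop_ratio * rng.uniform())
    pts = pts[keep]

    # Small Gaussian jitter: mimics sensor noise in real scans.
    pts = pts + rng.normal(scale=jitter_sigma, size=pts.shape)
    return pts
```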
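The convex-hull regularized feature alignment is described as using learnable prototypes to project point features from both domains into one feature space. The PyTorch sketch below shows one plausible reading of that idea, assuming softmax weights over prototype similarities act as convex-combination coefficients, so every projected feature lies inside the prototypes' convex hull. The class name, dimensions, and the reconstruction-style loss are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvexHullFeatureAlignment(nn.Module):
    """Minimal sketch of a convex-hull regularized feature alignment head.

    Point features from both domains are re-expressed as convex combinations
    (non-negative weights summing to 1) of a shared set of learnable prototypes,
    so synthetic and real features land in one prototype-spanned space.
    Names and shapes are illustrative, not the authors' API.
    """

    def __init__(self, feat_dim: int = 64, num_prototypes: int = 128):
        super().__init__()
        # Shared learnable prototypes spanning the unified feature space.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, feat_dim))

    def forward(self, point_feats: torch.Tensor):
        # point_feats: (N, feat_dim) per-point features from either domain.
        logits = point_feats @ self.prototypes.t()   # (N, K) prototype similarities
        weights = F.softmax(logits, dim=-1)          # rows are non-negative, sum to 1
        projected = weights @ self.prototypes        # convex combination of prototypes
        return projected, weights


if __name__ == "__main__":
    align = ConvexHullFeatureAlignment(feat_dim=64, num_prototypes=128)
    synthetic_feats = torch.randn(1024, 64)   # stand-in features from CAD models
    real_feats = torch.randn(2048, 64)        # stand-in features from a scene scan
    proj_syn, _ = align(synthetic_feats)
    proj_real, _ = align(real_feats)
    # One simple alignment objective: keep projections close to the inputs so
    # both domains stay representable inside the prototypes' convex hull.
    loss = F.mse_loss(proj_syn, synthetic_feats) + F.mse_loss(proj_real, real_feats)
    print(loss.item())
```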
Related papers
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments demonstrates our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model architecture.
We represent a scene layout by a 2D floor plan and a set of objects, each defined by its category, location, size, and orientation.
Our experimental results demonstrate that MiDiffusion substantially outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z)
- Model2Scene: Learning 3D Scene Representation via Contrastive Language-CAD Models Pre-training [105.3421541518582]
Current successful methods for 3D scene perception rely on large-scale annotated point clouds.
We propose Model2Scene, a novel paradigm that learns free 3D scene representation from Computer-Aided Design (CAD) models and languages.
Model2Scene yields impressive label-free 3D object salient detection with an average mAP of 46.08% and 55.49% on the ScanNet and S3DIS datasets, respectively.
arXiv Detail & Related papers (2023-09-29T03:51:26Z)
- Robust Category-Level 3D Pose Estimation from Synthetic Data [17.247607850702558]
We introduce SyntheticP3D, a new synthetic dataset for object pose estimation generated from CAD models.
We propose a novel approach (CC3D) for training neural mesh models that perform pose estimation via inverse rendering.
arXiv Detail & Related papers (2023-05-25T14:56:03Z)
- Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation [58.16911861917018]
We present a novel method for performing flexible, 3D-aware image content manipulation while enabling high-quality novel view synthesis.
Our model couples learnt scene-specific feature volumes with a scene agnostic neural rendering network.
We demonstrate various scene manipulations, including mixing scenes, deforming objects and inserting objects into scenes, while still producing photo-realistic results.
arXiv Detail & Related papers (2022-04-22T17:57:00Z)
- RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of the synthetic dataset, which consists of CAD object models, to boost the learning on real datasets.
Recent work on 3D pre-training fails when transferring features learned on synthetic objects to real-world applications.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z)