MINERVAS: Massive INterior EnviRonments VirtuAl Synthesis
- URL: http://arxiv.org/abs/2107.06149v2
- Date: Wed, 14 Jul 2021 14:21:45 GMT
- Title: MINERVAS: Massive INterior EnviRonments VirtuAl Synthesis
- Authors: Haocheng Ren and Hao Zhang and Jia Zheng and Jiaxiang Zheng and Rui
Tang and Rui Wang and Hujun Bao
- Abstract summary: This paper presents MINERVAS, a Massive INterior EnviRonments VirtuAl Synthesis system, to facilitate 3D scene modification and 2D image synthesis for various vision tasks.
We design a programmable pipeline with a Domain-Specific Language, allowing users to select scenes from a commercial indoor scene database.
We demonstrate the validity and flexibility of our system by using our synthesized data to improve performance on different kinds of computer vision tasks.
- Score: 27.816895835009994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid development of data-driven techniques, data has played an
essential role in various computer vision tasks. Many realistic and synthetic
datasets have been proposed to address different problems. However, there are
many unresolved challenges: (1) creating a dataset is usually a tedious
process with manual annotations, (2) most datasets are only designed for a
single specific task, (3) the modification or randomization of the 3D scene is
difficult, and (4) the release of commercial 3D data may encounter copyright
issues. This paper presents MINERVAS, a Massive INterior EnviRonments VirtuAl
Synthesis system, to facilitate 3D scene modification and 2D image synthesis
for various vision tasks. In particular, we design a programmable pipeline
with a Domain-Specific Language, allowing users to (1) select scenes from a
commercial indoor scene database, (2) synthesize scenes for different tasks
with customized rules, and (3) render various imagery data, such as visual
color, geometric structure, and semantic labels. Our system eases the
difficulty of customizing massive numbers of scenes for different tasks and
relieves users from manipulating fine-grained scene configurations by providing
user-controllable randomness using multi-level samplers. Most importantly, it
empowers users to access commercial scene databases with millions of indoor
scenes and protects the copyright of core data assets, e.g., 3D CAD models. We
demonstrate the validity and flexibility of our system by using our synthesized
data to improve the performance on different kinds of computer vision tasks.
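To make the programmable pipeline and multi-level samplers described above more concrete, the snippet below sketches the kind of customization script they imply. It is a minimal illustration assuming a Python-embedded DSL; every name in it (SceneSampler, EntitySampler, customize_scene) is hypothetical and not taken from the actual MINERVAS interface.

```python
import random

class SceneSampler:
    """Scene-level sampler: draws global parameters such as lighting."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def sample_lighting(self):
        # Randomize global illumination within user-chosen ranges.
        return {"intensity": self.rng.uniform(0.5, 1.5),
                "color_temperature": self.rng.uniform(3000, 6500)}

class EntitySampler:
    """Entity-level sampler: perturbs individual furniture items."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def sample_pose_jitter(self):
        # Small random translation (meters) and rotation (degrees).
        return {"dx": self.rng.uniform(-0.1, 0.1),
                "dy": self.rng.uniform(-0.1, 0.1),
                "yaw": self.rng.uniform(-15, 15)}

def customize_scene(scene, scene_sampler, entity_sampler):
    """Apply multi-level randomness without hand-editing fine-grained configs."""
    scene["lighting"] = scene_sampler.sample_lighting()
    for entity in scene["furniture"]:
        entity["jitter"] = entity_sampler.sample_pose_jitter()
    return scene

if __name__ == "__main__":
    # Toy scene record standing in for one selected from the commercial database.
    scene = {"id": "demo-001",
             "furniture": [{"name": "sofa"}, {"name": "table"}]}
    scene = customize_scene(scene, SceneSampler(seed=42), EntitySampler(seed=7))
    print(scene)  # A renderer would then emit color, depth, and semantic labels.
```

The two sampler levels illustrate the abstract's "user-controllable randomness": the user fixes ranges once (scene-level lighting, entity-level pose jitter) and the system handles per-scene randomization, so fine-grained scene configurations never need to be edited by hand.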
Related papers
- A transition towards virtual representations of visual scenes [1.4201040196058878]
Visual scene understanding is a fundamental task in computer vision that aims to extract meaningful information from visual data.
We propose an architecture that addresses the challenges of visual scene understanding and description, working towards a 3D virtual synthesis.
arXiv Detail & Related papers (2024-10-10T14:41:04Z) - 3D Vision and Language Pretraining with Large-Scale Synthetic Data [28.45763758308814]
3D Vision-Language Pre-training aims to provide a pre-trained model that can bridge 3D scenes and natural language.
We construct SynVL3D, a comprehensive synthetic scene-text corpus with 10K indoor scenes and 1M descriptions at object, view, and room levels.
We propose a synthetic-to-real domain adaptation in downstream task fine-tuning process to address the domain shift.
arXiv Detail & Related papers (2024-07-08T16:26:52Z) - Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes [65.22070581594426]
"Implicit-Zoo" is a large-scale dataset requiring thousands of GPU training days to facilitate research and development in this field.
We showcase two immediate benefits: it enables (1) learning token locations for transformer models and (2) directly regressing the 3D camera poses of 2D images with respect to NeRF models.
This in turn leads to improved performance in all three tasks of image classification, semantic segmentation, and 3D pose regression, thereby unlocking new avenues for research.
arXiv Detail & Related papers (2024-06-25T10:20:44Z) - Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments consistently demonstrates our method's superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a
Single Image [94.11473240505534]
We introduce HyperDreamer, a tool for creating 3D content from a single image.
It is hyper-realistic enough for post-generation usage, allowing users to view, render, and edit the resulting 3D content from a full range of viewpoints.
We demonstrate the effectiveness of HyperDreamer in modeling region-aware materials with high-resolution textures and enabling user-friendly editing.
arXiv Detail & Related papers (2023-12-07T18:58:09Z) - DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [68.94868475824575]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z) - DORSal: Diffusion for Object-centric Representations of Scenes et al [28.181157214966493]
Recent progress in 3D scene understanding enables scalable learning of representations across large datasets of diverse scenes.
We propose DORSal, which adapts a video diffusion architecture for 3D scene generation conditioned on frozen object-centric slot-based representations of scenes.
arXiv Detail & Related papers (2023-06-13T18:32:35Z) - CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph
Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z) - SGAligner : 3D Scene Alignment with Scene Graphs [84.01002998166145]
Building 3D scene graphs has emerged as a topic in scene representation for several embodied AI applications.
We focus on the fundamental problem of aligning pairs of 3D scene graphs whose overlap can range from zero to partial.
We propose SGAligner, the first method for aligning pairs of 3D scene graphs that is robust to in-the-wild scenarios.
arXiv Detail & Related papers (2023-04-28T14:39:22Z) - Equivariant Neural Rendering [22.95150913645939]
We propose a framework for learning neural scene representations directly from images, without 3D supervision.
Our key insight is that 3D structure can be imposed by ensuring that the learned representation transforms like a real 3D scene.
Our formulation allows us to infer and render scenes in real time while achieving comparable results to models requiring minutes for inference.
arXiv Detail & Related papers (2020-06-13T12:25:07Z)
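The key insight in the last entry above, that 3D structure can be imposed by making the learned representation transform like a real 3D scene, amounts to an equivariance-style consistency loss. The sketch below is only an illustration of that idea under assumed placeholder modules (encoder, renderer, apply_transform); it is not the paper's actual architecture or training code.

```python
import torch
import torch.nn as nn

# Placeholder networks: a real system maps images to a 3D-structured latent
# scene and renders it back; toy 32x32 RGB shapes are used here for brevity.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
renderer = nn.Linear(128, 3 * 32 * 32)

def apply_transform(z, theta):
    # Stand-in for acting on the latent scene with a camera/scene transformation;
    # a real model would rotate a volumetric or set-structured latent.
    return torch.roll(z, shifts=int(theta), dims=-1)

def equivariance_loss(img_a, img_b, theta):
    """Encode view A, transform the latent scene, and require the rendering
    to match view B (the same scene observed after the transformation)."""
    z = encoder(img_a)
    z_t = apply_transform(z, theta)
    pred_b = renderer(z_t).view_as(img_b)
    return nn.functional.mse_loss(pred_b, img_b)

# Toy usage: two views of one scene related by a known transformation theta.
img_a = torch.rand(1, 3, 32, 32)
img_b = torch.rand(1, 3, 32, 32)
loss = equivariance_loss(img_a, img_b, theta=4)
loss.backward()  # trains encoder and renderer jointly, with no 3D supervision
```

Because the loss only compares rendered images, the constraint supplies a 3D inductive bias without any explicit 3D supervision, which is the property the entry highlights.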
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.