Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding
- URL: http://arxiv.org/abs/2011.02523v5
- Date: Wed, 18 Aug 2021 03:16:16 GMT
- Title: Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding
- Authors: Mike Roberts, Jason Ramapuram, Anurag Ranjan, Atulit Kumar, Miguel
Angel Bautista, Nathan Paczan, Russ Webb, Joshua M. Susskind
- Abstract summary: Hypersim is a synthetic dataset for holistic indoor scene understanding.
We generate 77,400 images of 461 indoor scenes with detailed per-pixel labels and corresponding ground truth geometry.
- Score: 8.720130442653575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For many fundamental scene understanding tasks, it is difficult or impossible
to obtain per-pixel ground truth labels from real images. We address this
challenge by introducing Hypersim, a photorealistic synthetic dataset for
holistic indoor scene understanding. To create our dataset, we leverage a large
repository of synthetic scenes created by professional artists, and we generate
77,400 images of 461 indoor scenes with detailed per-pixel labels and
corresponding ground truth geometry. Our dataset: (1) relies exclusively on
publicly available 3D assets; (2) includes complete scene geometry, material
information, and lighting information for every scene; (3) includes dense
per-pixel semantic instance segmentations and complete camera information for
every image; and (4) factors every image into diffuse reflectance, diffuse
illumination, and a non-diffuse residual term that captures view-dependent
lighting effects.
We analyze our dataset at the level of scenes, objects, and pixels, and we
analyze costs in terms of money, computation time, and annotation effort.
Remarkably, we find that it is possible to generate our entire dataset from
scratch, for roughly half the cost of training a popular open-source natural
language processing model. We also evaluate sim-to-real transfer performance on
two real-world scene understanding tasks - semantic segmentation and 3D shape
prediction - where we find that pre-training on our dataset significantly
improves performance on both tasks, and achieves state-of-the-art performance
on the most challenging Pix3D test set. All of our rendered image data, as well
as all the code we used to generate our dataset and perform our experiments, is
available online.
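For concreteness, the factorization in item (4) corresponds to image = diffuse_reflectance * diffuse_illumination + non_diffuse_residual. Below is a minimal sketch, in Python with NumPy, of how these per-image components could be recombined; the function name, array shapes, and random placeholder data are illustrative assumptions and not the released Hypersim toolkit API.

```python
import numpy as np

def recombine_lighting(diffuse_reflectance, diffuse_illumination, non_diffuse_residual):
    """Recombine the per-image lighting factorization described in the abstract.

    image = diffuse_reflectance * diffuse_illumination + non_diffuse_residual,
    where the residual captures view-dependent (non-diffuse) lighting effects.
    All inputs are assumed to be float arrays of shape (H, W, 3).
    """
    return diffuse_reflectance * diffuse_illumination + non_diffuse_residual

# Random placeholder data standing in for the actual rendered components.
H, W = 768, 1024
reflectance  = np.random.rand(H, W, 3).astype(np.float32)
illumination = np.random.rand(H, W, 3).astype(np.float32)
residual     = np.random.rand(H, W, 3).astype(np.float32)

image = recombine_lighting(reflectance, illumination, residual)
print(image.shape)  # (768, 1024, 3)
```

Keeping the non-diffuse residual as a separate additive term means that view-dependent effects such as specular highlights are preserved exactly when the components are recombined.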
Related papers
- 360 in the Wild: Dataset for Depth Prediction and View Synthesis [66.58513725342125]
We introduce a large-scale 360° video dataset in the wild.
The dataset was carefully scraped from the Internet and captured at various locations worldwide.
Each of the 25K images constituting our dataset is provided with its respective camera pose and depth map.
arXiv Detail & Related papers (2024-06-27T05:26:38Z)
- Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation [70.82403156865057]
We investigate the impact of synthetic 3D scene dataset scale and realism on the task of training embodied agents to find and navigate to objects.
Our experiments show that agents trained on our smaller-scale dataset can match or outperform agents trained on much larger datasets.
arXiv Detail & Related papers (2023-06-20T05:07:23Z)
- PhotoScene: Photorealistic Material and Lighting Transfer for Indoor Scenes [84.66946637534089]
PhotoScene is a framework that takes input image(s) of a scene and builds a photorealistic digital twin with high-quality materials and similar lighting.
We model scene materials using procedural material graphs; such graphs represent photorealistic and resolution-independent materials.
We evaluate our technique on objects and layout reconstructions from ScanNet, SUN RGB-D and stock photographs, and demonstrate that our method reconstructs high-quality, fully relightable 3D scenes.
arXiv Detail & Related papers (2022-07-02T06:52:44Z)
- DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort [117.41383937100751]
Current deep networks are extremely data-hungry, benefiting from training on large-scale datasets.
We show how the GAN latent code can be decoded to produce a semantic segmentation of the image.
These generated datasets can then be used for training any computer vision architecture just as real datasets are.
arXiv Detail & Related papers (2021-04-13T20:08:29Z)
- PX-NET: Simple and Efficient Pixel-Wise Training of Photometric Stereo Networks [26.958763133729846]
Retrieving accurate 3D reconstructions of objects from the way they reflect light is a very challenging task in computer vision.
We propose a novel pixel-wise training procedure for normal prediction by replacing the training data (observation maps) of globally rendered images with independent per-pixel generated data.
Our network, PX-NET, achieves state-of-the-art performance among pixel-wise methods on synthetic datasets.
arXiv Detail & Related papers (2020-08-11T18:03:13Z)
- Detection and Segmentation of Custom Objects using High Distraction Photorealistic Synthetic Data [0.5076419064097732]
We show a straightforward and useful methodology for performing instance segmentation using synthetic data.
The goal is to achieve high performance on manually-gathered and annotated real-world data of custom objects.
This white-paper provides strong evidence that photorealistic simulated data can be used in practical real world applications.
arXiv Detail & Related papers (2020-07-28T16:33:42Z)
- OpenRooms: An End-to-End Open Framework for Photorealistic Indoor Scene Datasets [103.54691385842314]
We propose a novel framework for creating large-scale photorealistic datasets of indoor scenes.
Our goal is to make the dataset creation process widely accessible.
This enables important applications in inverse rendering, scene understanding and robotics.
arXiv Detail & Related papers (2020-07-25T06:48:47Z)
- Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z)