Sim2Real Docs: Domain Randomization for Documents in Natural Scenes
using Ray-traced Rendering
- URL: http://arxiv.org/abs/2112.09220v1
- Date: Thu, 16 Dec 2021 22:07:48 GMT
- Title: Sim2Real Docs: Domain Randomization for Documents in Natural Scenes
using Ray-traced Rendering
- Authors: Nikhil Maddikunta, Huijun Zhao, Sumit Keswani, Alfy Samuel, Fu-Ming
Guo, Nishan Srishankar, Vishwa Pardeshi, Austin Huang
- Abstract summary: Sim2Real Docs is a framework for synthesizing datasets and performing domain randomization of documents in natural scenes.
By using rendering that simulates physical interactions of light, geometry, camera, and background, we synthesize datasets of documents in a natural scene context.
The role of machine learning models is then to solve the inverse problem posed by the rendering pipeline.
- Score: 2.8034191857296933
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the past, computer vision systems for digitized documents could rely on
systematically captured, high-quality scans. Today, transactions involving
digital documents are more likely to start as mobile phone photo uploads taken
by non-professionals. As such, computer vision for document automation must now
account for documents captured in natural scene contexts. An additional
challenge is that task objectives for document processing can be highly
use-case specific, which makes publicly-available datasets limited in their
utility, while manual data labeling is also costly and poorly translates
between use cases.
To address these issues we created Sim2Real Docs - a framework for
synthesizing datasets and performing domain randomization of documents in
natural scenes. Sim2Real Docs enables programmatic 3D rendering of documents
using Blender, an open source tool for 3D modeling and ray-traced rendering. By
using rendering that simulates physical interactions of light, geometry,
camera, and background, we synthesize datasets of documents in a natural scene
context. Each render is paired with use-case specific ground truth data
specifying latent characteristics of interest, producing unlimited fit-for-task
training data. The role of machine learning models is then to solve the inverse
problem posed by the rendering pipeline. Such models can be further iterated
upon with real-world data by either fine tuning or making adjustments to domain
randomization parameters.
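The abstract describes a pipeline that randomizes light, geometry, camera, and background per render, pairing each render with ground truth. A minimal sketch of that idea as a domain-randomization parameter sampler is below; the parameter names, ranges, and background assets are illustrative assumptions, not the ones used by Sim2Real Docs (which drives Blender rather than plain Python).

```python
import random
from dataclasses import dataclass


@dataclass
class RenderParams:
    """One randomized configuration for a single document render."""
    light_energy: float      # key-light strength (hypothetical units)
    light_angle_deg: float   # elevation of the key light
    camera_distance: float   # metres from the document centre
    camera_tilt_deg: float   # off-axis tilt of the simulated phone camera
    paper_bend: float        # strength of a bend deformation on the page
    background: str          # name of a background asset


# Hypothetical background assets; a real pipeline would reference scene files.
BACKGROUNDS = ["wood_desk", "carpet", "tile_floor", "bed_sheet"]


def sample_params(rng: random.Random) -> RenderParams:
    """Draw one point from the (assumed) randomization distribution."""
    return RenderParams(
        light_energy=rng.uniform(50.0, 500.0),
        light_angle_deg=rng.uniform(20.0, 80.0),
        camera_distance=rng.uniform(0.25, 0.6),
        camera_tilt_deg=rng.uniform(-15.0, 15.0),
        paper_bend=rng.uniform(0.0, 0.3),
        background=rng.choice(BACKGROUNDS),
    )


def sample_dataset(n: int, seed: int = 0) -> list[RenderParams]:
    """Generate n render configurations; in the full pipeline each one
    would drive a ray-traced render paired with ground-truth labels."""
    rng = random.Random(seed)
    return [sample_params(rng) for _ in range(n)]
```

Seeding the sampler makes every synthesized dataset reproducible, and adjusting the ranges above is the "adjustments to domain randomization parameters" step the abstract mentions for iterating against real-world data.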
Related papers
- BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation [57.40024206484446]
We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models.
BVS supports a large number of adjustable parameters at the scene level.
We showcase three example application scenarios.
arXiv Detail & Related papers (2024-05-15T17:57:56Z) - DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity
Human-centric Rendering [126.00165445599764]
We present DNA-Rendering, a large-scale, high-fidelity repository of human performance data for neural actor rendering.
Our dataset contains over 1500 human subjects, 5000 motion sequences, and 67.5M frames of data.
We construct a professional multi-view capture system of 60 synchronized cameras with up to 4096 x 3000 resolution at 15 fps, together with strict camera calibration steps.
arXiv Detail & Related papers (2023-07-19T17:58:03Z) - UVDoc: Neural Grid-based Document Unwarping [20.51368640747448]
Restoring the original, flat appearance of a printed document from casual photographs is a common everyday problem.
We propose a novel method for grid-based single-image document unwarping.
Our method performs geometric distortion correction via a fully convolutional deep neural network.
arXiv Detail & Related papers (2023-02-06T15:53:34Z) - TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual
Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z) - Towards 3D Scene Understanding by Referring Synthetic Models [65.74211112607315]
Existing methods typically rely on extensive annotations of real scene scans.
We explore how synthetic models can reduce this reliance by mapping real scene features and synthetic features into a unified feature space.
Experiments show that our method achieves an average mAP of 46.08% on the ScanNet dataset and 55.49% on the S3DIS dataset.
arXiv Detail & Related papers (2022-03-20T13:06:15Z) - MINERVAS: Massive INterior EnviRonments VirtuAl Synthesis [27.816895835009994]
This paper presents a Massive INterior EnviRonments VirtuAl Synthesis system to facilitate the 3D scene modification and the 2D image synthesis for various vision tasks.
We design a programmable pipeline with a Domain-Specific Language, allowing users to select scenes from a commercial indoor scene database.
We demonstrate the validity and flexibility of our system by using our synthesized data to improve the performance on different kinds of computer vision tasks.
arXiv Detail & Related papers (2021-07-13T14:53:01Z) - NViSII: A Scriptable Tool for Photorealistic Image Generation [21.453677837017462]
We present a Python-based tool built on NVIDIA's OptiX ray tracing engine and the OptiX AI denoiser, designed to generate high-quality synthetic images.
Our tool enables the description and manipulation of complex dynamic 3D scenes.
arXiv Detail & Related papers (2021-05-28T16:35:32Z) - UnrealROX+: An Improved Tool for Acquiring Synthetic Data from Virtual
3D Environments [14.453602631430508]
We present an improved version of UnrealROX, a tool to generate synthetic data from virtual 3D environments.
UnrealROX+ includes new features such as albedo generation and a Python API for interacting with the virtual environment from Deep Learning frameworks.
arXiv Detail & Related papers (2021-04-23T18:45:42Z) - Generating Synthetic Handwritten Historical Documents With OCR
Constrained GANs [2.3808546906079178]
We present a framework to generate synthetic historical documents with precise ground truth using nothing more than a collection of unlabeled historical images.
We demonstrate a high-quality synthesis that makes it possible to generate large labeled historical document datasets with precise ground truth.
arXiv Detail & Related papers (2021-03-15T09:39:17Z) - OpenRooms: An End-to-End Open Framework for Photorealistic Indoor Scene
Datasets [103.54691385842314]
We propose a novel framework for creating large-scale photorealistic datasets of indoor scenes.
Our goal is to make the dataset creation process widely accessible.
This enables important applications in inverse rendering, scene understanding and robotics.
arXiv Detail & Related papers (2020-07-25T06:48:47Z) - Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image
Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.