Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision
Datasets from 3D Scans
- URL: http://arxiv.org/abs/2110.04994v1
- Date: Mon, 11 Oct 2021 04:21:46 GMT
- Authors: Ainaz Eftekhar, Alexander Sax, Roman Bachmann, Jitendra Malik, Amir
Zamir
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces a pipeline to parametrically sample and render
multi-task vision datasets from comprehensive 3D scans of the real world.
Changing the sampling parameters allows one to "steer" the generated datasets
to emphasize specific information. In addition to enabling interesting lines of
research, we show the tooling and generated data suffice to train robust vision
models.
Common architectures trained on a generated starter dataset reached
state-of-the-art performance on multiple common vision tasks and benchmarks,
despite having seen no benchmark or non-pipeline data. The depth estimation
network outperforms MiDaS and the surface normal estimation network is the
first to achieve human-level performance for in-the-wild surface normal
estimation -- at least according to one metric on the OASIS benchmark.
The Dockerized pipeline with CLI, the (mostly python) code, PyTorch
dataloaders for the generated data, the generated starter dataset, download
scripts and other utilities are available through our project website,
https://omnidata.vision.
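The abstract mentions PyTorch dataloaders for the generated multi-task data. As a rough illustration of what consuming such a dataset involves, the sketch below indexes frames that share per-task renderings (RGB, depth, surface normals). The directory layout, task names, and file-naming scheme here are hypothetical assumptions for illustration; the actual starter-dataset structure and the official dataloaders are documented on the project website.

```python
from pathlib import Path
import tempfile


class MultiTaskSampleIndex:
    """Pairs frames across per-task subdirectories.

    Assumes a hypothetical layout root/<task>/<frame>.png where the
    same frame stem appears under every task directory. Only frames
    present for all tasks become samples.
    """

    TASKS = ("rgb", "depth", "normal")

    def __init__(self, root):
        self.root = Path(root)
        # Frames are matched by shared stem, e.g. point_0_view_3.png
        stems = sorted(p.stem for p in (self.root / "rgb").glob("*.png"))
        self.samples = [
            {task: self.root / task / f"{stem}.png" for task in self.TASKS}
            for stem in stems
            if all((self.root / task / f"{stem}.png").exists()
                   for task in self.TASKS)
        ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, i):
        # A real dataloader would decode images here; we return paths.
        return self.samples[i]


# Usage: build a toy directory tree and index it.
root = Path(tempfile.mkdtemp())
for task in MultiTaskSampleIndex.TASKS:
    (root / task).mkdir()
    for stem in ("point_0_view_0", "point_0_view_1"):
        (root / task / f"{stem}.png").touch()

index = MultiTaskSampleIndex(root)
print(len(index))               # 2 complete multi-task samples
print(index[0]["depth"].name)   # point_0_view_0.png
```

Such an index plugs naturally into a `torch.utils.data.Dataset` wrapper, with image decoding and augmentation applied per task inside `__getitem__`.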
Related papers
- ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding [51.509115746992165]
We introduce ARKit LabelMaker, the first large-scale, real-world 3D dataset with dense semantic annotations.
We also push forward the state-of-the-art performance on ScanNet and ScanNet200 dataset with prevalent 3D semantic segmentation models.
arXiv Detail & Related papers (2024-10-17T14:44:35Z)
- MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations [55.022519020409405]
This paper builds the first largest ever multi-modal 3D scene dataset and benchmark with hierarchical grounded language annotations, MMScan.
The resulting multi-modal 3D dataset encompasses 1.4M meta-annotated captions on 109k objects and 7.7k regions as well as over 3.04M diverse samples for 3D visual grounding and question-answering benchmarks.
arXiv Detail & Related papers (2024-06-13T17:59:30Z)
- MASSTAR: A Multi-Modal and Large-Scale Scene Dataset with a Versatile Toolchain for Surface Prediction and Completion [25.44529512862336]
MASSTAR is a multi-modal lArge-scale scene dataset with a verSatile Toolchain for surfAce pRediction and completion.
We develop a versatile and efficient toolchain for processing the raw 3D data from the environments.
We generate an example dataset composed of over a thousand scene-level models with partial real-world data.
arXiv Detail & Related papers (2024-03-18T11:35:18Z)
- Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving [34.368848580725576]
We develop a label generation pipeline that produces dense, visibility-aware labels for any given scene.
This pipeline comprises three stages: voxel densification, occlusion reasoning, and image-guided voxel refinement.
We propose a new model, dubbed Coarse-to-Fine Occupancy (CTF-Occ) network, which demonstrates superior performance on the Occ3D benchmarks.
arXiv Detail & Related papers (2023-04-27T17:40:08Z)
- Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding [40.68012530554327]
We introduce a pretrained 3D backbone, called SST, for 3D indoor scene understanding.
We design a 3D Swin transformer as our backbone network, which enables efficient self-attention on sparse voxels with linear memory complexity.
A series of extensive ablation studies further validate the scalability, generality, and superior performance enabled by our approach.
arXiv Detail & Related papers (2023-04-14T02:49:08Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference [74.80730361332711]
Few-shot learning is an important and topical problem in computer vision.
We show that a simple transformer-based pipeline yields surprisingly good performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-15T02:55:58Z)
- THE Benchmark: Transferable Representation Learning for Monocular Height Estimation [25.872962101146115]
We propose a new benchmark dataset to study the transferability of height estimation models in a cross-dataset setting.
This benchmark dataset includes a newly proposed large-scale synthetic dataset, a newly collected real-world dataset, and four existing datasets from different cities.
In this paper, we propose a scale-deformable convolution module to enhance the window-based Transformer for handling the scale-variation problem in the height estimation task.
arXiv Detail & Related papers (2021-12-30T09:40:26Z)
- Open Graph Benchmark: Datasets for Machine Learning on Graphs [86.96887552203479]
We present the Open Graph Benchmark (OGB) to facilitate scalable, robust, and reproducible graph machine learning (ML) research.
OGB datasets are large-scale, encompass multiple important graph ML tasks, and cover a diverse range of domains.
For each dataset, we provide a unified evaluation protocol using meaningful application-specific data splits and evaluation metrics.
arXiv Detail & Related papers (2020-05-02T03:09:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.