SIDOD: A Synthetic Image Dataset for 3D Object Pose Recognition with
Distractors
- URL: http://arxiv.org/abs/2008.05955v1
- Date: Wed, 12 Aug 2020 00:14:19 GMT
- Title: SIDOD: A Synthetic Image Dataset for 3D Object Pose Recognition with
Distractors
- Authors: Mona Jalal, Josef Spjut, Ben Boudaoud, Margrit Betke
- Abstract summary: This dataset contains 144k stereo image pairs that synthetically combine 18 camera viewpoints of three photorealistic virtual environments with up to 10 objects.
We describe our approach for domain randomization and provide insight into the decisions that produced the dataset.
- Score: 10.546457120988494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a new, publicly-available image dataset generated by the NVIDIA
Deep Learning Data Synthesizer intended for use in object detection, pose
estimation, and tracking applications. This dataset contains 144k stereo image
pairs that synthetically combine 18 camera viewpoints of three photorealistic
virtual environments with up to 10 objects (chosen randomly from the 21 object
models of the YCB dataset [1]) and flying distractors. Object and camera pose,
scene lighting, and quantity of objects and distractors were randomized. Each
provided view includes RGB, depth, segmentation, and surface normal images, all
at the pixel level. We describe our approach for domain randomization and provide
insight into the decisions that produced the dataset.
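To make the per-view structure described above concrete, here is a minimal Python sketch of how such a stereo, multi-modality dataset might be enumerated on disk. The directory layout, file-naming scheme, and the `view_files` helper are illustrative assumptions, not the dataset's documented format.

```python
from pathlib import Path

# Each SIDOD-style view provides four modalities (RGB, depth, segmentation,
# surface normals) for both cameras of a stereo pair.
# NOTE: the folder structure and file names below are assumptions for
# illustration only, not the actual release layout.
MODALITIES = ("rgb", "depth", "segmentation", "normals")
CAMERAS = ("left", "right")

def view_files(root, scene, view_id):
    """Return the expected file paths for one stereo view (assumed naming)."""
    base = Path(root) / scene
    return {
        cam: {m: base / f"{view_id:06d}.{cam}.{m}.png" for m in MODALITIES}
        for cam in CAMERAS
    }

files = view_files("sidod", "kitchen_0", 42)
print(sorted(files["left"]))  # modality keys for the left camera
```

A loader built this way keeps the stereo pairing explicit, so downstream code can request, say, only left-camera RGB and depth without touching the other modalities.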
Related papers
- 360 in the Wild: Dataset for Depth Prediction and View Synthesis [66.58513725342125]
We introduce a large-scale dataset of 360° videos captured in the wild.
This dataset has been carefully scraped from the Internet and has been captured from various locations worldwide.
Each of the 25K images constituting our dataset is provided with its respective camera's pose and depth map.
arXiv Detail & Related papers (2024-06-27T05:26:38Z)
- Multi-Modal Dataset Acquisition for Photometrically Challenging Object [56.30027922063559]
This paper addresses the limitations of current datasets for 3D vision tasks in terms of accuracy, size, realism, and suitable imaging modalities for photometrically challenging objects.
We propose a novel annotation and acquisition pipeline that enhances existing 3D perception and 6D object pose datasets.
arXiv Detail & Related papers (2023-08-21T10:38:32Z)
- LaTeRF: Label and Text Driven Object Radiance Fields [8.191404990730236]
We introduce LaTeRF, a method for extracting an object of interest from a scene given 2D images of the entire scene and known camera poses.
To faithfully extract the object from the scene, LaTeRF extends the NeRF formulation with an additional 'objectness' probability at each 3D point.
We demonstrate high-fidelity object extraction on both synthetic and real datasets.
arXiv Detail & Related papers (2022-07-04T17:07:57Z)
- Neural Volumetric Object Selection [126.04480613166194]
We introduce an approach for selecting objects in neural volumetric 3D representations, such as multi-plane images (MPI) and neural radiance fields (NeRF).
Our approach takes a set of foreground and background 2D user scribbles in one view and automatically estimates a 3D segmentation of the desired object, which can be rendered into novel views.
arXiv Detail & Related papers (2022-05-30T08:55:20Z)
- A Real World Dataset for Multi-view 3D Reconstruction [28.298548207213468]
We present a dataset of 371 3D models of everyday tabletop objects along with their 320,000 real world RGB and depth images.
We primarily focus on learned multi-view 3D reconstruction due to the lack of an appropriate real-world benchmark for the task, and demonstrate that our dataset can fill that gap.
arXiv Detail & Related papers (2022-03-22T00:15:54Z)
- Multi-sensor large-scale dataset for multi-view 3D reconstruction [63.59401680137808]
We present a new multi-sensor dataset for multi-view 3D surface reconstruction.
It includes registered RGB and depth data from sensors of different resolutions and modalities: smartphones, Intel RealSense, Microsoft Kinect, industrial cameras, and a structured-light scanner.
We provide around 1.4 million images of 107 different scenes acquired from 100 viewing directions under 14 lighting conditions.
arXiv Detail & Related papers (2022-03-11T17:32:27Z)
- StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose
Estimation [43.839322860501596]
We present a large-scale stereo RGB image object pose estimation dataset named the StereOBJ-1M dataset.
The dataset is designed to address challenging cases such as object transparency, translucency, and specular reflection.
We propose a novel method for efficiently annotating pose data in a multi-view fashion that allows data capturing in complex and flexible environments.
arXiv Detail & Related papers (2021-08-19T00:49:01Z)
- D3D-HOI: Dynamic 3D Human-Object Interactions from Videos [49.38319295373466]
We introduce D3D-HOI: a dataset of monocular videos with ground truth annotations of 3D object pose, shape and part motion during human-object interactions.
Our dataset consists of several common articulated objects captured from diverse real-world scenes and camera viewpoints.
We leverage the estimated 3D human pose for more accurate inference of the object spatial layout and dynamics.
arXiv Detail & Related papers (2021-05-01T00:07:21Z)
- Sparse Pose Trajectory Completion [87.31270669154452]
We propose a method that can learn even from a dataset where objects appear only in sparsely sampled views.
This is achieved with a cross-modal pose trajectory transfer mechanism.
Our method is evaluated on the Pix3D and ShapeNet datasets.
arXiv Detail & Related papers (2021-05-01T00:07:21Z)
- Learning from THEODORE: A Synthetic Omnidirectional Top-View Indoor
Dataset for Deep Transfer Learning [4.297070083645049]
We introduce THEODORE: a novel, large-scale indoor dataset containing 100,000 high-resolution diversified fisheye images with 14 classes.
We create 3D virtual environments of living rooms, different human characters, and interior textures.
We show that our dataset is well suited for fine-tuning CNNs for object detection.
arXiv Detail & Related papers (2020-11-11T11:46:33Z)
- YCB-M: A Multi-Camera RGB-D Dataset for Object Recognition and 6DoF Pose
Estimation [2.9972063833424216]
We present a dataset of 32 scenes that have been captured by 7 different 3D cameras, totaling 49,294 frames.
This allows evaluating the sensitivity of pose estimation algorithms to the specifics of the camera used.
arXiv Detail & Related papers (2020-04-24T11:14:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed and is not responsible for any consequences of its use.