Bridging the Reality Gap for Pose Estimation Networks using Sensor-Based
Domain Randomization
- URL: http://arxiv.org/abs/2011.08517v3
- Date: Tue, 17 Aug 2021 09:50:01 GMT
- Title: Bridging the Reality Gap for Pose Estimation Networks using Sensor-Based
Domain Randomization
- Authors: Frederik Hagelskjaer and Anders Glent Buch
- Abstract summary: Methods trained on synthetic data use 2D images, as domain randomization in 2D is more developed.
Our method integrates the 3D data into the network to increase the accuracy of the pose estimation.
Experiments on three large pose estimation benchmarks show that the presented method outperforms previous methods trained on synthetic data.
- Score: 1.4290119665435117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since the introduction of modern deep learning methods for object pose
estimation, test accuracy and efficiency have increased significantly. For
training, however, large amounts of annotated training data are required for
good performance. While the use of synthetic training data prevents the need
for manual annotation, there is currently a large performance gap between
methods trained on real and synthetic data. This paper introduces a new method,
which bridges this gap.
Most methods trained on synthetic data use 2D images, as domain randomization
in 2D is more developed. To obtain precise poses, many of these methods perform
a final refinement using 3D data. Our method integrates the 3D data into the
network to increase the accuracy of the pose estimation. To allow for domain
randomization in 3D, a sensor-based data augmentation has been developed.
Additionally, we introduce the SparseEdge feature, which uses a wider search
space during point cloud propagation to avoid relying on specific features
without increasing run-time.
Experiments on three large pose estimation benchmarks show that the presented
method outperforms previous methods trained on synthetic data and achieves
comparable results to existing methods trained on real data.
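The abstract's core idea, sensor-based data augmentation for domain randomization in 3D, can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it is a hypothetical example of the general technique: perturbing a synthetic point cloud with measurement noise, missing returns, and spurious points to mimic real depth-sensor artifacts. The function name and all parameter values are illustrative assumptions.

```python
import numpy as np

def sensor_augment(points, rng=None,
                   noise_std=0.002, dropout_p=0.1, outlier_frac=0.02):
    """Illustrative sensor-style 3D augmentation (not the paper's exact method).

    points: (N, 3) array of XYZ coordinates from a synthetic render.
    Applies per-point Gaussian noise, random point dropout, and uniformly
    sampled outlier points within the cloud's bounding box.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gaussian jitter approximating depth-measurement noise
    pts = points + rng.normal(0.0, noise_std, size=points.shape)
    # Random dropout approximating missing sensor returns
    keep = rng.random(len(pts)) > dropout_p
    pts = pts[keep]
    # Spurious points approximating reflections / quantization artifacts
    n_out = int(outlier_frac * len(pts))
    if n_out:
        lo, hi = pts.min(axis=0), pts.max(axis=0)
        outliers = rng.uniform(lo, hi, size=(n_out, 3))
        pts = np.vstack([pts, outliers])
    return pts
```

Applied on the fly during training, such randomization exposes the network to sensor-like corruption so that features learned on synthetic clouds transfer better to real scans.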
Related papers
- Volumetric Semantically Consistent 3D Panoptic Mapping [77.13446499924977]
We introduce an online 2D-to-3D semantic instance mapping algorithm aimed at generating semantic 3D maps suitable for autonomous agents in unstructured environments.
It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantic and instance-consistent 3D regions.
The proposed method achieves accuracy superior to the state of the art on public large-scale datasets, improving on a number of widely used metrics.
arXiv Detail & Related papers (2023-09-26T08:03:10Z) - 3D Adversarial Augmentations for Robust Out-of-Domain Predictions [115.74319739738571]
We focus on improving the generalization to out-of-domain data.
We learn a set of vectors that deform the objects in an adversarial fashion.
We perform adversarial augmentation by applying the learned sample-independent vectors to the available objects when training a model.
arXiv Detail & Related papers (2023-08-29T17:58:55Z) - A New Benchmark: On the Utility of Synthetic Data with Blender for Bare
Supervised Learning and Downstream Domain Adaptation [42.2398858786125]
Deep learning in computer vision has achieved great success with the price of large-scale labeled training data.
The uncontrollable data collection process produces non-IID training and test data, where undesired duplication may exist.
To circumvent them, an alternative is to generate synthetic data via 3D rendering with domain randomization.
arXiv Detail & Related papers (2023-03-16T09:03:52Z) - Towards Deep Learning-based 6D Bin Pose Estimation in 3D Scans [0.0]
This paper focuses on a specific task of 6D pose estimation of a bin in 3D scans.
We present a high-quality dataset composed of synthetic data and real scans captured by a structured-light scanner with precise annotations.
arXiv Detail & Related papers (2021-12-17T16:19:06Z) - What Stops Learning-based 3D Registration from Working in the Real
World? [53.68326201131434]
This work identifies the sources of 3D point cloud registration failures, analyzes the reasons behind them, and proposes solutions.
Ultimately, this translates to a best-practice 3D registration network (BPNet), constituting the first learning-based method able to handle previously-unseen objects in real-world data.
Our model generalizes to real data without any fine-tuning, reaching an accuracy of up to 67% on point clouds of unseen objects obtained with a commercial sensor.
arXiv Detail & Related papers (2021-11-19T19:24:27Z) - 3D Annotation Of Arbitrary Objects In The Wild [0.0]
We propose a data annotation pipeline based on SLAM, 3D reconstruction, and 3D-to-2D geometry.
The pipeline allows creating 3D and 2D bounding boxes, along with per-pixel annotations of arbitrary objects.
Our results showcase almost 90% Intersection-over-Union (IoU) agreement on both semantic segmentation and 2D bounding box detection.
arXiv Detail & Related papers (2021-09-15T09:00:56Z) - RandomRooms: Unsupervised Pre-training from Synthetic Shapes and
Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of the synthetic dataset, which consists of CAD object models, to boost the learning on real datasets.
Recent work on 3D pre-training exhibits failure when transferring features learned on synthetic objects to other real-world applications.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z) - Cascaded deep monocular 3D human pose estimation with evolutionary
training data [76.3478675752847]
Deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation.
This paper proposes a novel data augmentation method that is scalable for massive amount of training data.
Our method synthesizes unseen 3D human skeletons based on a hierarchical human representation and heuristics inspired by prior knowledge.
arXiv Detail & Related papers (2020-06-14T03:09:52Z) - Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D
Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train from scratch 3D pose regressor networks that outperform the current state-of-the-art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information provided and is not responsible for any consequences of its use.