Related papers: Object Detection Using Sim2Real Domain Randomization for Robotic Applications

Object Detection Using Sim2Real Domain Randomization for Robotic Applications

URL: http://arxiv.org/abs/2208.04171v1
Date: Mon, 8 Aug 2022 14:16:45 GMT
Title: Object Detection Using Sim2Real Domain Randomization for Robotic Applications
Authors: D\'aniel Horv\'ath, G\'abor Erd\H{o}s, Zolt\'an Istenes, Tom\'a\v{s} Horv\'ath, and S\'andor F\"oldi
Abstract summary: We propose a sim2real transfer learning method based on domain randomization for object detection. A state-of-the-art convolutional neural network, YOLOv4, is trained to detect the different types of industrial objects. Our solution matches industrial needs as it can reliably differentiate similar classes of objects by using only 1 real image for training.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Robots working in unstructured environments must be capable of sensing and interpreting their surroundings. One of the main obstacles of deep learning based models in the field of robotics is the lack of domain-specific labeled data for different industrial applications. In this paper, we propose a sim2real transfer learning method based on domain randomization for object detection with which labeled synthetic datasets of arbitrary size and object types can be automatically generated. Subsequently, a state-of-the-art convolutional neural network, YOLOv4, is trained to detect the different types of industrial objects. With the proposed domain randomization method, we could shrink the reality gap to a satisfactory level, achieving 86.32% and 97.38% mAP50 scores respectively in the case of zero-shot and one-shot transfers, on our manually annotated dataset containing 190 real images. On a GeForce RTX 2080 Ti GPU, the data generation process takes less than 0.5s per image and the training lasts around 12h which makes it convenient for industrial use. Our solution matches industrial needs as it can reliably differentiate similar classes of objects by using only 1 real image for training. To our best knowledge, this is the only work thus far satisfying these constraints.

Related papers

Closing the Visual Sim-to-Real Gap with Object-Composable NeRFs [59.12526668734703]
We introduce Composable Object Volume NeRF (COV-NeRF), an object-composable NeRF model that is the centerpiece of a real-to-sim pipeline. COV-NeRF extracts objects from real images and composes them into new scenes, generating photorealistic renderings and many types of 2D and 3D supervision.
arXiv Detail & Related papers (2024-03-07T00:00:02Z)
Robust Visual Sim-to-Real Transfer for Robotic Manipulation [79.66851068682779]
Learning visuomotor policies in simulation is much safer and cheaper than in the real world. However, due to discrepancies between the simulated and real data, simulator-trained policies often fail when transferred to real robots. One common approach to bridge the visual sim-to-real domain gap is domain randomization (DR)
arXiv Detail & Related papers (2023-07-28T05:47:24Z)
Learning Sim-to-Real Dense Object Descriptors for Robotic Manipulation [4.7246285569677315]
We present Sim-to-Real Dense Object Nets (SRDONs), a dense object descriptor that not only understands the object via appropriate representation but also maps simulated and real data to a unified feature space with pixel consistency. We demonstrate in experiments that pre-trained SRDONs significantly improve performances on unseen objects and unseen visual environments for various robotic tasks with zero real-world training.
arXiv Detail & Related papers (2023-04-18T02:28:55Z)
Towards Precise Model-free Robotic Grasping with Sim-to-Real Transfer Learning [11.470950882435927]
We present an end-to-end robotic grasping network with a grasp. In physical robotic experiments, our grasping framework grasped single known objects and novel complex-shaped household objects with a success rate of 90.91%. The proposed grasping framework outperformed two state-of-the-art methods in both known and unknown object robotic grasping.
arXiv Detail & Related papers (2023-01-28T16:57:19Z)
Sim2real Transfer Learning for Point Cloud Segmentation: An Industrial Application Case on Autonomous Disassembly [55.41644538483948]
We present an industrial application case that uses sim2real transfer learning for point cloud data. We provide insights on how to generate and process synthetic point cloud data. A novel patch-based attention network is proposed additionally to tackle this problem.
arXiv Detail & Related papers (2023-01-12T14:00:37Z)
Neural-Sim: Learning to Generate Training Data with NeRF [31.81496344354997]
We present the first fully differentiable synthetic data pipeline that uses Neural Radiance Fields (NeRFs) in a closed-loop with a target application's loss function. Our approach generates data on-demand, with no human labor, to maximize accuracy for a target task.
arXiv Detail & Related papers (2022-07-22T22:48:33Z)
Sim-to-Real 6D Object Pose Estimation via Iterative Self-training for Robotic Bin-picking [98.5984733963713]
We propose an iterative self-training framework for sim-to-real 6D object pose estimation to facilitate cost-effective robotic grasping. We establish a photo-realistic simulator to synthesize abundant virtual data, and use this to train an initial pose estimation network. This network then takes the role of a teacher model, which generates pose predictions for unlabeled real data.
arXiv Detail & Related papers (2022-04-14T15:54:01Z)
Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation [98.51313127382937]
We focus on the use of labels in the synthetic domain alone. Our approach introduces both a way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator. We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data.
arXiv Detail & Related papers (2021-11-15T18:37:43Z)
Where is my hand? Deep hand segmentation for visual self-recognition in humanoid robots [129.46920552019247]
We propose the use of a Convolution Neural Network (CNN) to segment the robot hand from an image in an egocentric view. We fine-tuned the Mask-RCNN network for the specific task of segmenting the hand of the humanoid robot Vizzy.
arXiv Detail & Related papers (2021-02-09T10:34:32Z)
PennSyn2Real: Training Object Recognition Models without Human Labeling [12.923677573437699]
We propose PennSyn2Real - a synthetic dataset consisting of more than 100,000 4K images of more than 20 types of micro aerial vehicles (MAVs) The dataset can be used to generate arbitrary numbers of training images for high-level computer vision tasks such as MAV detection and classification. We show that synthetic data generated using this framework can be directly used to train CNN models for common object recognition tasks such as detection and segmentation.
arXiv Detail & Related papers (2020-09-22T02:53:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.