Automatically Prepare Training Data for YOLO Using Robotic In-Hand
Observation and Synthesis
- URL: http://arxiv.org/abs/2301.01441v1
- Date: Wed, 4 Jan 2023 04:20:08 GMT
- Title: Automatically Prepare Training Data for YOLO Using Robotic In-Hand
Observation and Synthesis
- Authors: Hao Chen, Weiwei Wan, Masaki Matsushita, Takeyuki Kotaka, Kensuke
Harada
- Abstract summary: We propose combining robotic in-hand observation and data synthesis to enlarge the limited data set collected by the robot.
The collected and synthetic images are combined to train a deep detection neural network.
- The results showed that combining observed and synthetic images led to performance comparable to manual data preparation.
- Score: 14.034128227585143
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning methods have recently exhibited impressive performance in
object detection. However, such methods need a large amount of training data to
achieve high recognition accuracy, and preparing that data is time-consuming and
requires considerable manual work like labeling images. In this paper, we
automatically prepare training data using robots. Considering the low efficiency
and high energy consumption of robot motion, we propose combining robotic in-hand
observation and data synthesis to enlarge the limited data set collected by the
robot. We first use a robot with a depth sensor to collect images of objects held
in the robot's hands and segment the object pictures. Then, we use a copy-paste
method to synthesize the segmented objects with rack backgrounds. The collected
and synthetic images are combined to train a deep detection neural network. We
conducted experiments to compare YOLOv5x detectors trained with images collected
using the proposed method and several other methods. The results showed that the
combined observation and synthetic images led to performance comparable to manual
data preparation, and they provide a good guide for optimizing data configurations
and parameter settings when training detectors. The proposed method requires only
a single process and is a low-cost way to produce the combined data. Interested
readers may find the data sets and trained models in the following GitHub
repository: github.com/wrslab/tubedet
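The copy-paste synthesis step is concrete enough to sketch. The Python below is a minimal illustration under assumptions of my own (file names, a single object class, OpenCV-based compositing); the authors' actual code and data are in the linked repository. It pastes a segmented object crop onto a rack background at a random position and scale and emits the matching YOLO-format label:

```python
"""Minimal sketch of copy-paste data synthesis as described in the
abstract. File names and the single object class are illustrative
assumptions, not taken from the paper's code (github.com/wrslab/tubedet)."""
import random

import cv2


def paste_object(background, obj_rgb, obj_mask, scale_range=(0.5, 1.5)):
    """Paste one segmented object onto the background at a random
    position and scale; return the composite and a YOLO-format label."""
    bh, bw = background.shape[:2]
    s = random.uniform(*scale_range)
    obj = cv2.resize(obj_rgb, None, fx=s, fy=s, interpolation=cv2.INTER_LINEAR)
    mask = cv2.resize(obj_mask, None, fx=s, fy=s, interpolation=cv2.INTER_NEAREST)
    oh, ow = obj.shape[:2]
    assert ow <= bw and oh <= bh, "scaled object must fit inside the background"
    x0 = random.randint(0, bw - ow)
    y0 = random.randint(0, bh - oh)
    roi = background[y0:y0 + oh, x0:x0 + ow]
    roi[mask.astype(bool)] = obj[mask.astype(bool)]  # overwrite masked pixels only
    # YOLO label: class x_center y_center width height, normalized to [0, 1]
    return background, (0, (x0 + ow / 2) / bw, (y0 + oh / 2) / bh, ow / bw, oh / bh)


if __name__ == "__main__":
    bg = cv2.imread("rack_background.jpg")            # assumed file name
    obj = cv2.imread("tube_crop.png")                 # segmented object crop
    mask = cv2.imread("tube_mask.png", cv2.IMREAD_GRAYSCALE)
    image, label = paste_object(bg, obj, mask)
    cv2.imwrite("synthetic_000.jpg", image)
    with open("synthetic_000.txt", "w") as f:         # YOLO label file
        f.write("%d %.6f %.6f %.6f %.6f\n" % label)
```

With the combined real and synthetic images arranged in the usual YOLOv5 directory layout, training a YOLOv5x detector then uses the standard ultralytics/yolov5 entry point, e.g. `python train.py --img 640 --data tubes.yaml --weights yolov5x.pt`, where `tubes.yaml` is a hypothetical dataset config naming the images, labels, and classes.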
Related papers
- Redefining Data Pairing for Motion Retargeting Leveraging a Human Body Prior [4.5409191511532505]
MR HuBo (Motion Retargeting leveraging a HUman BOdy prior) is a cost-effective and convenient method to collect high-quality upper-body paired ⟨robot, human⟩ pose data.
We also present a two-stage motion neural network that can be trained via supervised learning on a large amount of paired data.
arXiv Detail & Related papers (2024-09-20T04:32:54Z)
- Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition [48.65867987106428]
We introduce a novel system for joint learning between human operators and robots.
It enables human operators to share control of a robot end-effector with a learned assistive agent.
It reduces the need for human adaptation while ensuring the collected data is of sufficient quality for downstream tasks.
arXiv Detail & Related papers (2024-06-29T03:37:29Z)
- Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods [14.780597545674157]
We investigate the effects of visual pre-training strategies on robot manipulation tasks from three fundamental perspectives.
We propose a visual pre-training scheme for robot manipulation termed Vi-PRoM, which combines self-supervised learning and supervised learning.
arXiv Detail & Related papers (2023-08-07T14:24:52Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Scaling Robot Learning with Semantically Imagined Experience [21.361979238427722]
Recent advances in robot learning have shown promise in enabling robots to perform manipulation tasks.
One of the key contributing factors to this progress is the scale of robot data used to train the models.
We propose an alternative route and leverage text-to-image foundation models widely used in computer vision and natural language processing.
arXiv Detail & Related papers (2023-02-22T18:47:51Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
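The distance-based reward in the entry above can be sketched generically. The PyTorch snippet below is an assumption-heavy illustration, not the authors' implementation (network size, margin, and names are mine): a small network is trained with a triplet-style time-contrastive loss so temporally nearby frames embed close together, and the reward is the negative embedding distance to a goal frame.

```python
"""Generic sketch of a time-contrastive embedding and the distance-based
reward the summary above describes; architecture and margin are assumptions."""
import torch
import torch.nn as nn
import torch.nn.functional as F


class Embedder(nn.Module):
    """Tiny CNN mapping an RGB frame to a unit-norm embedding vector."""
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)


def time_contrastive_loss(phi, anchor, positive, negative, margin=0.2):
    """Triplet loss: frames close in time (anchor/positive) should embed
    closer together than temporally distant frames (negative)."""
    a, p, n = phi(anchor), phi(positive), phi(negative)
    d_pos = (a - p).pow(2).sum(-1)
    d_neg = (a - n).pow(2).sum(-1)
    return F.relu(d_pos - d_neg + margin).mean()


def reward(phi, obs, goal):
    """Reward for a policy: negative embedding distance to the goal frame."""
    with torch.no_grad():
        return -(phi(obs) - phi(goal)).pow(2).sum(-1)
```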
- Where is my hand? Deep hand segmentation for visual self-recognition in humanoid robots [129.46920552019247]
We propose the use of a Convolutional Neural Network (CNN) to segment the robot hand from an image in an egocentric view.
We fine-tuned the Mask-RCNN network for the specific task of segmenting the hand of the humanoid robot Vizzy.
arXiv Detail & Related papers (2021-02-09T10:34:32Z)
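Fine-tuning Mask-RCNN for a new segmentation target, as the entry above describes, follows a standard torchvision pattern. The sketch below assumes two classes (background plus the robot hand) and default pre-trained weights; it is not the paper's Vizzy-specific configuration.

```python
"""Generic torchvision Mask-RCNN fine-tuning sketch for binary hand
segmentation (background + hand); class count and weights are assumptions."""
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 2  # background + robot hand (assumed)
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box classification head for the new class count.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Replace the mask prediction head likewise.
in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, num_classes)
```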
- A Framework for Efficient Robotic Manipulation [79.10407063260473]
We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels.
arXiv Detail & Related papers (2020-12-14T22:18:39Z)
- PennSyn2Real: Training Object Recognition Models without Human Labeling [12.923677573437699]
We propose PennSyn2Real, a synthetic dataset consisting of more than 100,000 4K images of more than 20 types of micro aerial vehicles (MAVs).
The dataset can be used to generate arbitrary numbers of training images for high-level computer vision tasks such as MAV detection and classification.
We show that synthetic data generated using this framework can be directly used to train CNN models for common object recognition tasks such as detection and segmentation.
arXiv Detail & Related papers (2020-09-22T02:53:40Z)