Where is my hand? Deep hand segmentation for visual self-recognition in
humanoid robots
- URL: http://arxiv.org/abs/2102.04750v1
- Date: Tue, 9 Feb 2021 10:34:32 GMT
- Authors: Alexandre Almeida, Pedro Vicente, Alexandre Bernardino
- Abstract summary: We propose the use of a Convolutional Neural Network (CNN) to segment the robot hand from an image in an egocentric view.
We fine-tuned the Mask-RCNN network for the specific task of segmenting the hand of the humanoid robot Vizzy.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to distinguish between the self and the background is of
paramount importance for robotic tasks. Hands, as the end effectors of a
robotic system that most often come into contact with other elements of the
environment, must be perceived and tracked precisely to execute the intended
tasks with dexterity and without colliding with obstacles. They are fundamental
for several applications, from Human-Robot Interaction tasks to object
manipulation. Modern humanoid robots are characterized by a high number of
degrees of freedom, which makes their forward kinematics models very sensitive
to uncertainty. Thus, vision sensing can be the only solution to endow these
robots with a good perception of the self, able to localize their body parts
with precision. In this paper, we propose the use of a Convolutional Neural
Network (CNN) to segment the robot hand from an image in an egocentric view.
CNNs are known to require a huge amount of training data. To overcome the
challenge of labeling real-world images, we propose the use of simulated
datasets exploiting domain randomization techniques. We fine-tuned the
Mask-RCNN network for the specific task of segmenting the hand of the humanoid
robot Vizzy. We focus on developing a methodology that requires little data to
achieve reasonable performance, while giving detailed insight into how to
properly generate variability in the training dataset. Moreover, we analyze the
fine-tuning process within the complex Mask-RCNN model, determining which
weights should be transferred to the new task of segmenting robot hands. Our
final model was trained solely on synthetic images and achieves an average IoU
of 82% on synthetic validation data and 56.3% on real test data. These results
were achieved with only 1000 training images and 3 hours of training time on a
single GPU.
Related papers
- Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets [24.77850617214567]
We propose a foundation representation learning framework capturing both visual features and the dynamics information such as actions and proprioceptions of manipulation tasks.
Specifically, we pre-train a visual encoder on the DROID robotic dataset and leverage motion-relevant data such as robot proprioceptive states and actions.
We introduce a novel contrastive loss that aligns visual observations with the robot's proprioceptive state-action dynamics, combined with a behavior cloning (BC)-like actor loss to predict actions during pre-training, along with a time contrastive loss.
arXiv Detail & Related papers (2024-10-29T17:58:13Z) - HRP: Human Affordances for Robotic Pre-Training [15.92416819748365]
We present a framework for pre-training representations on hand, object, and contact.
We experimentally demonstrate (using 3000+ robot trials) that this affordance pre-training scheme boosts performance by a minimum of 15% on 5 real-world tasks.
arXiv Detail & Related papers (2024-07-26T17:59:52Z) - Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z) - Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement
Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z) - Scaling Robot Learning with Semantically Imagined Experience [21.361979238427722]
Recent advances in robot learning have shown promise in enabling robots to perform manipulation tasks.
One of the key contributing factors to this progress is the scale of robot data used to train the models.
We propose an alternative route and leverage text-to-image foundation models widely used in computer vision and natural language processing.
arXiv Detail & Related papers (2023-02-22T18:47:51Z) - Automatically Prepare Training Data for YOLO Using Robotic In-Hand
Observation and Synthesis [14.034128227585143]
We propose combining robotic in-hand observation and data synthesis to enlarge the limited data set collected by the robot.
The collected and synthetic images are combined to train a deep detection neural network.
The results showed that combined observation and synthetic images led to comparable performance to manual data preparation.
arXiv Detail & Related papers (2023-01-04T04:20:08Z) - Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z) - Can Foundation Models Perform Zero-Shot Task Specification For Robot
Manipulation? [54.442692221567796]
Task specification is critical for engagement of non-expert end-users and adoption of personalized robots.
A widely studied approach to task specification is through goals, using either compact state vectors or goal images from the same robot scene.
In this work, we explore alternate and more general forms of goal specification that are expected to be easier for humans to specify and use.
arXiv Detail & Related papers (2022-04-23T19:39:49Z) - Few-Shot Visual Grounding for Natural Human-Robot Interaction [0.0]
We propose a software architecture that segments a target object from a crowded scene, indicated verbally by a human user.
At the core of our system, we employ a multi-modal deep neural network for visual grounding.
We evaluate the performance of the proposed model on real RGB-D data collected from public scene datasets.
arXiv Detail & Related papers (2021-03-17T15:24:02Z) - Task-relevant Representation Learning for Networked Robotic Perception [74.0215744125845]
This paper presents an algorithm to learn task-relevant representations of sensory data that are co-designed with a pre-trained robotic perception model's ultimate objective.
Our algorithm aggressively compresses robotic sensory data by up to 11x more than competing methods.
arXiv Detail & Related papers (2020-11-06T07:39:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.