HANDS: A Multimodal Dataset for Modeling Towards Human Grasp Intent
Inference in Prosthetic Hands
- URL: http://arxiv.org/abs/2103.04845v1
- Date: Mon, 8 Mar 2021 15:51:03 GMT
- Title: HANDS: A Multimodal Dataset for Modeling Towards Human Grasp Intent
Inference in Prosthetic Hands
- Authors: Mo Han, Sezen Yağmur Günay, Gunar Schirner, Taşkın Padır, Deniz Erdoğmuş
- Abstract summary: Advanced prosthetic hands of the future are anticipated to benefit from improved shared control between a robotic hand and its human user.
Multimodal sensor data may come from various environment sensors, including vision, as well as human physiology and behavior sensors.
A fusion methodology for environmental state and human intent estimation can combine these sources of evidence in order to help prosthetic hand motion planning and control.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Upper limb and hand functionality is critical to many activities of daily
living, and the amputation of one can lead to significant functionality loss.
From this perspective, advanced prosthetic hands of the future are anticipated
to benefit from improved shared control between a robotic hand and its human
user, but more importantly from an improved capability to infer human intent
from multimodal sensor data, giving the robotic hand perception abilities
regarding the operational context. Such multimodal sensor data may come from
various environment sensors, including vision, as well as human physiology and
behavior sensors such as electromyography and inertial measurement units. A
fusion methodology for environmental state and human intent estimation can
combine these sources of evidence to support prosthetic hand motion planning
and control.
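As one illustration (not the paper's specific algorithm), a late-fusion scheme could let each modality emit a probability distribution over candidate grasp types and combine them log-linearly. The grasp names, modality weights, and numbers below are illustrative assumptions.

```python
import numpy as np

# Hypothetical late fusion of per-modality evidence over five candidate grasp
# types. Each modality (hand-view vision, EMG, IMU) is assumed to emit a
# probability distribution over the same grasp set; the weights are made up.
GRASPS = ["power", "precision", "tripod", "lateral", "open-palm"]

def fuse(p_vision, p_emg, p_imu, w=(0.6, 0.3, 0.1)):
    """Weighted product-of-experts (log-linear) fusion of modality posteriors."""
    logp = w[0] * np.log(p_vision) + w[1] * np.log(p_emg) + w[2] * np.log(p_imu)
    p = np.exp(logp - logp.max())  # subtract max for numerical stability
    return p / p.sum()

# Example: vision strongly favours a power grasp, EMG is ambiguous, IMU is flat.
fused = fuse(np.array([0.70, 0.10, 0.10, 0.05, 0.05]),
             np.array([0.30, 0.25, 0.20, 0.15, 0.10]),
             np.array([0.20, 0.20, 0.20, 0.20, 0.20]))
print(dict(zip(GRASPS, fused.round(3))))
```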
In this paper, we present a dataset of this type, gathered in anticipation that
cameras will be built into prosthetic hands and that computer vision methods
will need to assess this hand-view visual evidence in order to estimate human
intent. Specifically, paired images from the human eye-view and hand-view of
various objects placed at different orientations were captured at the initial
state of grasping trials, followed by paired video, EMG, and IMU recordings
from the human's arm during a grasp, lift, put-down, and retract style trial
structure. For each trial, based on eye-view images of the scene showing the
hand and object on a table, multiple humans were asked to sort, in decreasing
order of preference, five grasp types appropriate for the object in its given
configuration relative to the hand. The potential utility of paired eye-view
and hand-view images was illustrated by training a convolutional neural
network to process hand-view images in order to predict the eye-view labels
assigned by humans.
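The hand-view experiment above can be pictured with a minimal sketch: a small CNN maps a hand-view RGB image to scores over the five grasp types and is trained against the top-ranked eye-view label. The layer sizes, input resolution, and training details below are assumptions, not the paper's exact network.

```python
import torch
import torch.nn as nn

NUM_GRASPS = 5  # five grasp types ranked by the human annotators

class HandViewGraspNet(nn.Module):
    """Toy CNN mapping a hand-view image to grasp-type scores."""
    def __init__(self, num_grasps: int = NUM_GRASPS):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, num_grasps)

    def forward(self, x):  # x: (batch, 3, H, W) hand-view images
        return self.head(self.features(x).flatten(1))  # (batch, num_grasps)

model = HandViewGraspNet()
scores = model(torch.randn(2, 3, 224, 224))    # dummy batch of hand-view frames
targets = torch.tensor([0, 3])                 # top-ranked eye-view grasp labels
loss = nn.CrossEntropyLoss()(scores, targets)
```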
Related papers
- Neural feels with neural fields: Visuo-tactile perception for in-hand
manipulation [57.60490773016364]
We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation.
Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem.
Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation.
arXiv Detail & Related papers (2023-12-20T22:36:37Z)
- Human-oriented Representation Learning for Robotic Manipulation [64.59499047836637]
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with the environments in manipulation tasks.
We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders.
Our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders for downstream manipulation policy-learning.
arXiv Detail & Related papers (2023-10-04T17:59:38Z)
- Human keypoint detection for close proximity human-robot interaction [29.99153271571971]
We study the performance of state-of-the-art human keypoint detectors in the context of close proximity human-robot interaction.
The best performing whole-body keypoint detectors in close proximity were MMPose and AlphaPose, but both had difficulty with finger detection.
We propose a combination of MMPose or AlphaPose for the body and MediaPipe for the hands in a single framework providing the most accurate and robust detection.
arXiv Detail & Related papers (2022-07-15T20:33:29Z)
- GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further boosts the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
arXiv Detail & Related papers (2022-04-20T13:17:39Z)
- Towards Predicting Fine Finger Motions from Ultrasound Images via Kinematic Representation [12.49914980193329]
We study the inference problem of identifying the activation of specific fingers from a sequence of US images.
We consider this task as an important step towards higher adoption rates of robotic prostheses among arm amputees.
arXiv Detail & Related papers (2022-02-10T18:05:09Z)
- Physion: Evaluating Physical Prediction from Vision in Humans and Machines [46.19008633309041]
We present a visual and physical prediction benchmark that precisely measures this capability.
We compare an array of algorithms on their ability to make diverse physical predictions.
We find that graph neural networks with access to the physical state best capture human behavior.
arXiv Detail & Related papers (2021-06-15T16:13:39Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- From Hand-Perspective Visual Information to Grasp Type Probabilities: Deep Learning via Ranking Labels [6.772076545800592]
We build a novel probabilistic classifier according to the Plackett-Luce model to predict the probability distribution over grasps (a minimal sketch of the Plackett-Luce ranking likelihood follows this list).
We indicate that the proposed model is applicable to the most popular and productive convolutional neural network frameworks.
arXiv Detail & Related papers (2021-03-08T16:12:38Z)
- Careful with That! Observation of Human Movements to Estimate Objects Properties [106.925705883949]
We focus on the features of human motor actions that communicate insights on the weight of an object.
Our final goal is to enable a robot to autonomously infer the degree of care required in object handling.
arXiv Detail & Related papers (2021-03-02T08:14:56Z)
- Human Grasp Classification for Reactive Human-to-Robot Handovers [50.91803283297065]
We propose an approach for human-to-robot handovers in which the robot meets the human halfway.
We collect a human grasp dataset which covers typical ways of holding objects with various hand shapes and poses.
We present a planning and execution approach that takes the object from the human hand according to the detected grasp and hand position.
arXiv Detail & Related papers (2020-03-12T19:58:03Z)
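The "From Hand-Perspective Visual Information to Grasp Type Probabilities" entry above relies on the Plackett-Luce model; its ranking likelihood can be sketched as follows. The scores and ranking here are made-up placeholders for what a CNN would produce and what a human annotator would provide.

```python
import torch

def plackett_luce_log_likelihood(scores: torch.Tensor, ranking: torch.Tensor) -> torch.Tensor:
    """log P(ranking | scores) under Plackett-Luce: at each step the next
    preferred grasp is drawn with softmax probability over the grasps not yet ranked."""
    log_lik = scores.new_zeros(())
    remaining = list(range(scores.numel()))
    for idx in ranking.tolist():
        sub = scores[remaining]                              # scores of unranked grasps
        log_lik = log_lik + sub.log_softmax(0)[remaining.index(idx)]
        remaining.remove(idx)
    return log_lik

scores = torch.tensor([2.0, 0.5, 1.0, -0.3, 0.1])  # unnormalised scores for 5 grasp types
ranking = torch.tensor([0, 2, 1, 4, 3])            # human preference order, best first
print(plackett_luce_log_likelihood(scores, ranking))
```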