Human keypoint detection for close proximity human-robot interaction
- URL: http://arxiv.org/abs/2207.07742v1
- Date: Fri, 15 Jul 2022 20:33:29 GMT
- Title: Human keypoint detection for close proximity human-robot interaction
- Authors: Jan Docekal, Jakub Rozlivek, Jiri Matas, and Matej Hoffmann
- Abstract summary: We study the performance of state-of-the-art human keypoint detectors in the context of close proximity human-robot interaction.
The best performing whole-body keypoint detectors in close proximity were MMPose and AlphaPose, but both had difficulty with finger detection.
We propose a combination of MMPose or AlphaPose for the body and MediaPipe for the hands in a single framework providing the most accurate and robust detection.
- Score: 29.99153271571971
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We study the performance of state-of-the-art human keypoint detectors in the
context of close proximity human-robot interaction. The detection in this
scenario is specific in that only a subset of body parts such as hands and
torso are in the field of view. In particular, (i) we survey existing datasets
with human pose annotation from the perspective of close proximity images and
prepare and make publicly available a new Human in Close Proximity (HiCP)
dataset; (ii) we quantitatively and qualitatively compare state-of-the-art
human whole-body 2D keypoint detection methods (OpenPose, MMPose, AlphaPose,
Detectron2) on this dataset; (iii) since accurate detection of hands and
fingers is critical in applications with handovers, we evaluate the performance
of the MediaPipe hand detector; (iv) we deploy the algorithms on a humanoid
robot with an RGB-D camera on its head and evaluate the performance in 3D human
keypoint detection. A motion capture system is used as reference.
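The abstract does not detail the 2D-to-3D lifting step, but a minimal sketch of the standard approach, back-projecting a detected 2D keypoint through the RGB-D depth reading with the pinhole camera model, could look like the following (the function name and intrinsics are illustrative assumptions, not taken from the paper):

```python
def deproject_keypoint(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a 2D pixel keypoint (u, v) with a depth reading
    (in metres) into a 3D point in the camera frame, using the
    standard pinhole model with intrinsics fx, fy, cx, cy."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

# A keypoint at the principal point lies on the optical axis:
# deproject_keypoint(320, 240, 1.0, 600.0, 600.0, 320.0, 240.0) -> (0.0, 0.0, 1.0)
```

In practice the depth image is noisy around body contours, so the depth value at (u, v) is often replaced by a median over a small pixel neighbourhood before back-projection.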
The best performing whole-body keypoint detectors in close proximity were
MMPose and AlphaPose, but both had difficulty with finger detection. Thus, we
propose a combination of MMPose or AlphaPose for the body and MediaPipe for the
hands in a single framework providing the most accurate and robust detection.
We also analyse the failure modes of individual detectors -- for example, to
what extent the absence of the head of the person in the image degrades
performance. Finally, we demonstrate the framework in a scenario where a
humanoid robot interacting with a person uses the detected 3D keypoints for
whole-body avoidance maneuvers.
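The proposed fusion, body keypoints from MMPose or AlphaPose combined with hand keypoints from MediaPipe, can be sketched as a confidence-gated merge in which the dedicated hand detector overrides low-confidence finger estimates from the whole-body detector. The data layout and threshold below are illustrative assumptions, not the paper's actual interface:

```python
def merge_keypoints(body_kpts, hand_kpts, conf_thresh=0.3):
    """Fuse whole-body and hand keypoints.

    body_kpts, hand_kpts: dicts mapping keypoint name -> (x, y, confidence).
    A hand-detector keypoint replaces the body-detector estimate whenever
    the body detector's confidence falls below conf_thresh, or the body
    detector missed the keypoint entirely.
    """
    merged = dict(body_kpts)
    for name, (x, y, c) in hand_kpts.items():
        prev = merged.get(name)
        if prev is None or prev[2] < conf_thresh:
            merged[name] = (x, y, c)
    return merged
```

This keeps the body detector's high-confidence joints intact while filling in the finger keypoints it struggles with, which matches the abstract's motivation for the combined framework.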
Related papers
- AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation [55.179287851188036]
We introduce a novel all-in-one-stage framework, AiOS, for expressive human pose and shape recovery without an additional human detection step.
We first employ a human token to probe a human location in the image and encode global features for each instance.
Then, we introduce a joint-related token to probe the human joint in the image and encode a fine-grained local feature.
arXiv Detail & Related papers (2024-03-26T17:59:23Z)
- Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset [52.22758311559]
We introduce HARPER, a novel dataset for 3D body pose estimation and forecasting in dyadic interactions between users and Spot.
The key-novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors.
The scenario underlying HARPER includes 15 actions, of which 10 involve physical contact between the robot and users.
arXiv Detail & Related papers (2024-03-21T14:53:50Z)
- DECO: Dense Estimation of 3D Human-Scene Contact In The Wild [54.44345845842109]
We train a novel 3D contact detector that uses both body-part-driven and scene-context-driven attention to estimate contact on the SMPL body.
We significantly outperform existing SOTA methods across all benchmarks.
We also show qualitatively that DECO generalizes well to diverse and challenging real-world human interactions in natural images.
arXiv Detail & Related papers (2023-09-26T21:21:07Z)
- HODN: Disentangling Human-Object Feature for HOI Detection [51.48164941412871]
We propose a Human and Object Disentangling Network (HODN) to model the Human-Object Interaction (HOI) relationships explicitly.
Considering that human features are more contributive to interaction, we propose a Human-Guide Linking method to make sure the interaction decoder focuses on the human-centric regions.
Our proposed method achieves competitive performance on both the V-COCO and HICO-DET datasets.
arXiv Detail & Related papers (2023-08-20T04:12:50Z)
- Detecting Human-Object Contact in Images [75.35017308643471]
Humans constantly contact objects to move and perform tasks.
However, there exists no robust method to detect contact between the body and the scene from an image.
We build a new dataset of human-object contacts for images.
arXiv Detail & Related papers (2023-03-06T18:56:26Z)
- Robot to Human Object Handover using Vision and Joint Torque Sensor Modalities [3.580924916641143]
The system performs a fully autonomous and robust object handover to a human receiver in real-time.
Our algorithm relies on two complementary sensor modalities: joint torque sensors on the arm and an eye-in-hand RGB-D camera for sensor feedback.
Despite substantive challenges in sensor feedback synchronization and in object and human hand detection, our system achieves robust robot-to-human handover with 98% accuracy.
arXiv Detail & Related papers (2022-10-27T00:11:34Z)
- Occlusion-Robust Multi-Sensory Posture Estimation in Physical Human-Robot Interaction [10.063075560468798]
We use 2D postures from OpenPose over a single camera, and the trajectory of the interacting robot while the human performs a task.
We show that our multi-sensory system resolves human kinematic redundancy better than posture estimation solely using OpenPose or posture estimation solely using the robot's trajectory.
arXiv Detail & Related papers (2022-08-12T20:41:09Z)
- Gesture Recognition for Initiating Human-to-Robot Handovers [2.1614262520734595]
It is important to recognize when a human intends to initiate handovers, so that the robot does not try to take objects from humans when a handover is not intended.
We pose the handover gesture recognition as a binary classification problem in a single RGB image.
Our results show that the handover gestures are correctly identified with an accuracy of over 90%.
arXiv Detail & Related papers (2020-07-20T08:49:34Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.