Dynamic Gesture Recognition in Ultra-Range Distance for Effective Human-Robot Interaction
- URL: http://arxiv.org/abs/2407.21374v1
- Date: Wed, 31 Jul 2024 06:56:46 GMT
- Title: Dynamic Gesture Recognition in Ultra-Range Distance for Effective Human-Robot Interaction
- Authors: Eran Bamani Beeri, Eden Nissinman, Avishai Sintov,
- Abstract summary: This paper presents a novel approach for ultra-range gesture recognition, addressing Human-Robot Interaction (HRI) challenges over extended distances.
By leveraging human gestures in video data, we propose the Temporal-Spatiotemporal Fusion Network (TSFN) model that surpasses the limitations of current methods.
With applications in service robots, search and rescue operations, and drone-based interactions, our approach enhances HRI in expansive environments.
- Score: 2.625826951636656
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a novel approach for ultra-range gesture recognition, addressing Human-Robot Interaction (HRI) challenges over extended distances. By leveraging human gestures in video data, we propose the Temporal-Spatiotemporal Fusion Network (TSFN) model that surpasses the limitations of current methods, enabling robots to understand gestures from long distances. With applications in service robots, search and rescue operations, and drone-based interactions, our approach enhances HRI in expansive environments. Experimental validation demonstrates significant advancements in gesture recognition accuracy, particularly in prolonged gesture sequences.
Related papers
- Recognition of Dynamic Hand Gestures in Long Distance using a Web-Camera for Robot Guidance [2.625826951636656]
We propose a model for recognizing dynamic gestures from a long distance of up to 20 meters.
The model integrates the SlowFast and Transformer architectures (SFT) to effectively process and classify complex gesture sequences captured in video frames.
arXiv Detail & Related papers (2024-06-18T09:17:28Z) - NatSGD: A Dataset with Speech, Gestures, and Demonstrations for Robot
Learning in Natural Human-Robot Interaction [19.65778558341053]
Speech-gesture HRI datasets often focus on elementary tasks, like object pointing and pushing.
We introduce NatSGD, a multimodal HRI dataset encompassing human commands through speech and gestures.
We demonstrate its effectiveness in training robots to understand tasks through multimodal human commands.
arXiv Detail & Related papers (2024-03-04T18:02:41Z) - Ultra-Range Gesture Recognition using a Web-Camera in Human-Robot Interaction [2.240453048130742]
Vision-based methods for gesture recognition have been shown to be effective only up to a user-camera distance of seven meters.
We propose a novel URGR termed Graph Vision Transformer (GViT) which takes the enhanced image as input.
Evaluation of the proposed framework over diverse test data yields a high recognition rate of 98.1%.
arXiv Detail & Related papers (2023-11-26T17:27:26Z) - Spatio-Temporal Branching for Motion Prediction using Motion Increments [55.68088298632865]
Human motion prediction (HMP) has emerged as a popular research topic due to its diverse applications.
Traditional methods rely on hand-crafted features and machine learning techniques.
We propose a noveltemporal-temporal branching network using incremental information for HMP.
arXiv Detail & Related papers (2023-08-02T12:04:28Z) - MILD: Multimodal Interactive Latent Dynamics for Learning Human-Robot
Interaction [34.978017200500005]
We propose Multimodal Interactive Latent Dynamics (MILD) to address the problem of two-party physical Human-Robot Interactions (HRIs)
We learn the interaction dynamics from demonstrations, using Hidden Semi-Markov Models (HSMMs) to model the joint distribution of the interacting agents in the latent space of a Variational Autoencoder (VAE)
MILD generates more accurate trajectories for the controlled agent (robot) when conditioned on the observed agent's (human) trajectory.
arXiv Detail & Related papers (2022-10-22T11:25:11Z) - Next Steps: Learning a Disentangled Gait Representation for Versatile
Quadruped Locomotion [69.87112582900363]
Current planners are unable to vary key gait parameters continuously while the robot is in motion.
In this work we address this limitation by learning a latent space capturing the key stance phases constituting a particular gait.
We demonstrate that specific properties of the drive signal map directly to gait parameters such as cadence, foot step height and full stance duration.
arXiv Detail & Related papers (2021-12-09T10:02:02Z) - Show Me What You Can Do: Capability Calibration on Reachable Workspace
for Human-Robot Collaboration [83.4081612443128]
We show that a short calibration using REMP can effectively bridge the gap between what a non-expert user thinks a robot can reach and the ground-truth.
We show that this calibration procedure not only results in better user perception, but also promotes more efficient human-robot collaborations.
arXiv Detail & Related papers (2021-03-06T09:14:30Z) - Domain Adaptive Robotic Gesture Recognition with Unsupervised
Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos, and inherent correlations in multi-modal towards recognizing gesture.
Results show that our approach recovers the performance with great improvement gains, up to 12.91% in ACC and 20.16% in F1score without using any annotations in real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z) - Relational Graph Learning on Visual and Kinematics Embeddings for
Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z) - Attention-Oriented Action Recognition for Real-Time Human-Robot
Interaction [11.285529781751984]
We propose an attention-oriented multi-level network framework to meet the need for real-time interaction.
Specifically, a Pre-Attention network is employed to roughly focus on the interactor in the scene at low resolution.
The other compact CNN receives the extracted skeleton sequence as input for action recognition.
arXiv Detail & Related papers (2020-07-02T12:41:28Z) - Continuous Emotion Recognition via Deep Convolutional Autoencoder and
Support Vector Regressor [70.2226417364135]
It is crucial that the machine should be able to recognize the emotional state of the user with high accuracy.
Deep neural networks have been used with great success in recognizing emotions.
We present a new model for continuous emotion recognition based on facial expression recognition.
arXiv Detail & Related papers (2020-01-31T17:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.