Recognition of Dynamic Hand Gestures in Long Distance using a Web-Camera for Robot Guidance
- URL: http://arxiv.org/abs/2406.12424v1
- Date: Tue, 18 Jun 2024 09:17:28 GMT
- Title: Recognition of Dynamic Hand Gestures in Long Distance using a Web-Camera for Robot Guidance
- Authors: Eran Bamani Beeri, Eden Nissinman, Avishai Sintov
- Abstract summary: We propose a model for recognizing dynamic gestures from a long distance of up to 20 meters.
The model integrates the SlowFast and Transformer architectures (SFT) to effectively process and classify complex gesture sequences captured in video frames.
- Score: 2.625826951636656
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Dynamic gestures enable the transfer of directive information to a robot. Moreover, the ability of a robot to recognize them from a long distance makes communication more effective and practical. However, current state-of-the-art models for dynamic gestures exhibit limitations in recognition distance, typically achieving effective performance only within a few meters. In this work, we propose a model for recognizing dynamic gestures from a long distance of up to 20 meters. The model integrates the SlowFast and Transformer architectures (SFT) to effectively process and classify complex gesture sequences captured in video frames. SFT demonstrates superior performance over existing models.
Related papers
- Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches [12.221087476416056]
We introduce "motion patches", a new representation of motion sequences, and propose using Vision Transformers (ViT) as motion encoders via transfer learning.
These motion patches, created by dividing and sorting skeleton joints based on motion sequences, are robust to varying skeleton structures.
We find that transfer learning with pre-trained weights of ViT obtained through training with 2D image data can boost the performance of motion analysis.
arXiv Detail & Related papers (2024-05-08T02:42:27Z)
- Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms.
SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics.
Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
arXiv Detail & Related papers (2024-03-22T14:47:18Z)
- Ultra-Range Gesture Recognition using a Web-Camera in Human-Robot Interaction [2.240453048130742]
Vision-based methods for gesture recognition have been shown to be effective only up to a user-camera distance of seven meters.
We propose a novel ultra-range gesture recognition (URGR) model termed Graph Vision Transformer (GViT), which takes an enhanced image as input.
Evaluation of the proposed framework over diverse test data yields a high recognition rate of 98.1%.
arXiv Detail & Related papers (2023-11-26T17:27:26Z)
- Dynamic Hand Gesture-Featured Human Motor Adaptation in Tool Delivery using Voice Recognition [5.13619372598999]
This paper introduces an innovative human-robot collaborative framework.
It seamlessly integrates hand gesture and dynamic movement recognition, voice recognition, and a switchable control adaptation strategy.
Experimental results demonstrate superior performance in hand gesture recognition.
arXiv Detail & Related papers (2023-09-20T14:51:09Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as Dancetrack and SportsMOT.
arXiv Detail & Related papers (2023-06-05T04:24:11Z)
- Snapture -- A Novel Neural Architecture for Combined Static and Dynamic Hand Gesture Recognition [19.320551882950706]
We propose a novel hybrid hand gesture recognition system.
Our architecture enables learning both static and dynamic gestures.
Our work contributes both to gesture recognition research and machine learning applications for non-verbal communication with robots.
arXiv Detail & Related papers (2022-05-28T11:12:38Z)
- Motion-aware Dynamic Graph Neural Network for Video Compressive Sensing [14.67994875448175]
Video snapshot compressive imaging (SCI) utilizes a 2D detector to capture sequential video frames and compress them into a single measurement.
Most existing reconstruction methods are incapable of efficiently capturing long-range spatial and temporal dependencies.
We propose a flexible and robust approach based on the graph neural network (GNN) to efficiently model non-local interactions between pixels in space and time regardless of the distance.
arXiv Detail & Related papers (2022-03-01T12:13:46Z)
- Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos and the inherent correlations between modalities for gesture recognition.
Results show that our approach recovers performance with large gains, up to 12.91% in accuracy (ACC) and 20.16% in F1 score, without using any annotations from the real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z)
- UniCon: Universal Neural Controller For Physics-based Character Motion [70.45421551688332]
We propose a physics-based universal neural controller (UniCon) that learns to master thousands of motions with different styles by learning on large-scale motion datasets.
UniCon can support keyboard-driven control, compose motion sequences drawn from a large pool of locomotion and acrobatics skills, and teleport a person captured on video to a physics-based virtual avatar.
arXiv Detail & Related papers (2020-11-30T18:51:16Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.