Recognition of Dynamic Hand Gestures in Long Distance using a Web-Camera for Robot Guidance
- URL: http://arxiv.org/abs/2406.12424v1
- Date: Tue, 18 Jun 2024 09:17:28 GMT
- Title: Recognition of Dynamic Hand Gestures in Long Distance using a Web-Camera for Robot Guidance
- Authors: Eran Bamani Beeri, Eden Nissinman, Avishai Sintov
- Abstract summary: We propose a model for recognizing dynamic gestures from a long distance of up to 20 meters.
The model integrates the SlowFast and Transformer architectures (SFT) to effectively process and classify complex gesture sequences captured in video frames.
- Score: 2.625826951636656
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Dynamic gestures enable the transfer of directive information to a robot. Moreover, the ability of a robot to recognize them from a long distance makes communication more effective and practical. However, current state-of-the-art models for dynamic gestures exhibit limitations in recognition distance, typically achieving effective performance only within a few meters. In this work, we propose a model for recognizing dynamic gestures from a long distance of up to 20 meters. The model integrates the SlowFast and Transformer architectures (SFT) to effectively process and classify complex gesture sequences captured in video frames. SFT demonstrates superior performance over existing models.
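The abstract describes SFT as an integration of the SlowFast and Transformer architectures for classifying gesture sequences in video. The following is a minimal, hypothetical sketch of how such a dual-rate (slow/fast) 3D-CNN front end could feed pooled clip tokens into a Transformer encoder for classification; the class name, layer widths, pooling scheme, and six-class head are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a SlowFast-style front end + Transformer classifier.
# All sizes are placeholders; this is NOT the authors' SFT implementation.
import torch
import torch.nn as nn

class SlowFastTransformer(nn.Module):
    def __init__(self, num_classes: int = 6, d_model: int = 256):
        super().__init__()
        # Slow pathway: sparsely sampled frames, wider channels (spatial semantics).
        self.slow = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=(1, 7, 7), stride=(1, 2, 2), padding=(0, 3, 3)),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d((8, 1, 1)),          # -> (B, 64, 8, 1, 1)
        )
        # Fast pathway: densely sampled frames, fewer channels (motion cues).
        self.fast = nn.Sequential(
            nn.Conv3d(3, 8, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d((8, 1, 1)),          # -> (B, 8, 8, 1, 1)
        )
        self.proj = nn.Linear(64 + 8, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, slow_clip, fast_clip):
        # slow_clip: (B, 3, T_slow, H, W); fast_clip: (B, 3, T_fast, H, W)
        s = self.slow(slow_clip).squeeze(-1).squeeze(-1)               # (B, 64, 8)
        f = self.fast(fast_clip).squeeze(-1).squeeze(-1)               # (B, 8, 8)
        tokens = self.proj(torch.cat([s, f], dim=1).transpose(1, 2))   # (B, 8, d_model)
        return self.head(self.encoder(tokens).mean(dim=1))             # (B, num_classes)

model = SlowFastTransformer()
slow = torch.randn(2, 3, 8, 112, 112)     # 8 sparsely sampled frames
fast = torch.randn(2, 3, 32, 112, 112)    # 32 densely sampled frames of the same clip
print(model(slow, fast).shape)            # torch.Size([2, 6])
```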
Related papers
- VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation [79.00294932026266]
VidMan is a novel framework that employs a two-stage training mechanism to enhance stability and improve data utilization efficiency.
Our framework outperforms the state-of-the-art baseline model GR-1 on the CALVIN benchmark, achieving an 11.7% relative improvement, and demonstrates over 9% precision gains on the OXE small-scale dataset.
arXiv Detail & Related papers (2024-11-14T03:13:26Z)
- Dynamic Gesture Recognition in Ultra-Range Distance for Effective Human-Robot Interaction [2.625826951636656]
This paper presents a novel approach for ultra-range gesture recognition, addressing Human-Robot Interaction (HRI) challenges over extended distances.
By leveraging human gestures in video data, we propose the Temporal-Spatiotemporal Fusion Network (TSFN) model that surpasses the limitations of current methods.
With applications in service robots, search and rescue operations, and drone-based interactions, our approach enhances HRI in expansive environments.
arXiv Detail & Related papers (2024-07-31T06:56:46Z)
- Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms.
SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics.
Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
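The SMA summary above mentions frequency-domain regularization of motion vectors. Below is a toy, hedged illustration of that idea: comparing the Fourier spectra of reference and generated per-frame motion so that global motion dynamics match. The function name and loss form are assumptions for illustration, not the paper's actual objective.

```python
# Illustrative sketch (assumption): a frequency-domain motion regularizer in
# the spirit of SMA -- penalize the spectral gap between reference and
# generated motion vectors over time.
import torch

def spectral_motion_loss(ref_motion: torch.Tensor, gen_motion: torch.Tensor) -> torch.Tensor:
    """ref_motion, gen_motion: (T, D) per-frame motion vectors."""
    ref_spec = torch.fft.rfft(ref_motion, dim=0)   # spectrum over the time axis
    gen_spec = torch.fft.rfft(gen_motion, dim=0)
    return (ref_spec - gen_spec).abs().mean()      # L1 distance between spectra

t = torch.linspace(0, 1, 24).unsqueeze(1)          # 24 frames, 1-D motion signal
ref = torch.sin(2 * torch.pi * 2 * t)              # 2 Hz reference motion
gen = torch.sin(2 * torch.pi * 3 * t)              # 3 Hz generated motion
print(spectral_motion_loss(ref, ref).item(), spectral_motion_loss(ref, gen).item())
```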
arXiv Detail & Related papers (2024-03-22T14:47:18Z)
- Ultra-Range Gesture Recognition using a Web-Camera in Human-Robot Interaction [2.240453048130742]
Vision-based methods for gesture recognition have been shown to be effective only up to a user-camera distance of seven meters.
We propose a novel ultra-range gesture recognition (URGR) framework whose classification model, termed Graph Vision Transformer (GViT), takes the enhanced image as input.
Evaluation of the proposed framework over diverse test data yields a high recognition rate of 98.1%.
arXiv Detail & Related papers (2023-11-26T17:27:26Z)
- Dynamic Hand Gesture-Featured Human Motor Adaptation in Tool Delivery using Voice Recognition [5.13619372598999]
This paper introduces an innovative human-robot collaborative framework.
It seamlessly integrates hand gesture and dynamic movement recognition, voice recognition, and a switchable control adaptation strategy.
Experiment results have demonstrated superior performance in hand gesture recognition.
arXiv Detail & Related papers (2023-09-20T14:51:09Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
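The RPT summary above describes self-supervised pre-training of a Transformer over sequences of sensorimotor tokens. As a hedged sketch only, one common self-supervised objective for such token sequences is masked reconstruction; the module below is an illustrative assumption of that pattern, not the released RPT code.

```python
# Hypothetical sketch: Transformer over sensorimotor tokens, pre-trained by
# masking random tokens and regressing them back. Sizes are placeholders.
import torch
import torch.nn as nn

class SensorimotorTransformer(nn.Module):
    def __init__(self, token_dim: int = 32, d_model: int = 128):
        super().__init__()
        self.embed = nn.Linear(token_dim, d_model)
        self.mask_token = nn.Parameter(torch.zeros(d_model))   # learned "hidden" embedding
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decode = nn.Linear(d_model, token_dim)

    def forward(self, tokens: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # tokens: (B, T, token_dim); mask: (B, T) bool, True where a token is hidden.
        x = self.embed(tokens)
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        return self.decode(self.encoder(x))        # reconstruct every token slot

model = SensorimotorTransformer()
tokens = torch.randn(4, 16, 32)                    # batch of sensorimotor sequences
mask = torch.rand(4, 16) < 0.5                     # hide roughly half the tokens
pred = model(tokens, mask)
loss = ((pred - tokens)[mask] ** 2).mean()         # reconstruction loss on masked slots
loss.backward()
```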
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as Dancetrack and SportsMOT.
arXiv Detail & Related papers (2023-06-05T04:24:11Z)
- Snapture -- A Novel Neural Architecture for Combined Static and Dynamic Hand Gesture Recognition [19.320551882950706]
We propose a novel hybrid hand gesture recognition system.
Our architecture enables learning both static and dynamic gestures.
Our work contributes both to gesture recognition research and machine learning applications for non-verbal communication with robots.
arXiv Detail & Related papers (2022-05-28T11:12:38Z)
- Motion-aware Dynamic Graph Neural Network for Video Compressive Sensing [14.67994875448175]
Video snapshot compressive imaging (SCI) utilizes a 2D detector to capture sequential video frames and compress them into a single measurement.
Most existing reconstruction methods are incapable of efficiently capturing long-range spatial and temporal dependencies.
We propose a flexible and robust approach based on the graph neural network (GNN) to efficiently model non-local interactions between pixels in space and time regardless of the distance.
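The summary above refers to the standard SCI forward model, in which several modulated frames collapse into one 2-D measurement; the reconstruction network then inverts this process. A minimal sketch of that measurement model follows, with mask generation and sizes chosen only for illustration.

```python
# Illustrative sketch (assumption) of the snapshot-compressive-imaging forward
# model: T frames are modulated by per-frame binary masks and summed into a
# single 2-D measurement recorded by the detector.
import torch

def sci_measurement(frames: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
    """frames, masks: (T, H, W); returns the single compressed measurement (H, W)."""
    return (frames * masks).sum(dim=0)

T, H, W = 8, 64, 64
frames = torch.rand(T, H, W)                 # the video clip to be compressed
masks = (torch.rand(T, H, W) > 0.5).float()  # random binary coding masks
y = sci_measurement(frames, masks)           # what the 2-D detector records
print(y.shape)                               # torch.Size([64, 64])
```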
arXiv Detail & Related papers (2022-03-01T12:13:46Z)
- Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features, exploiting temporal cues in videos and the inherent correlations across modalities for gesture recognition.
Results show that our approach recovers performance with substantial gains of up to 12.91% in accuracy (ACC) and 20.16% in F1 score, without using any annotations from the real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z)
- UniCon: Universal Neural Controller For Physics-based Character Motion [70.45421551688332]
We propose a physics-based universal neural controller (UniCon) that learns to master thousands of motions with different styles by learning on large-scale motion datasets.
UniCon can support keyboard-driven control, compose motion sequences drawn from a large pool of locomotion and acrobatics skills and teleport a person captured on video to a physics-based virtual avatar.
arXiv Detail & Related papers (2020-11-30T18:51:16Z)