DeepSORT-Driven Visual Tracking Approach for Gesture Recognition in Interactive Systems
- URL: http://arxiv.org/abs/2505.07110v1
- Date: Sun, 11 May 2025 20:35:11 GMT
- Title: DeepSORT-Driven Visual Tracking Approach for Gesture Recognition in Interactive Systems
- Authors: Tong Zhang, Fenghua Shao, Runsheng Zhang, Yifan Zhuang, Liuqingqing Yang,
- Abstract summary: The DeepSORT algorithm can achieve accurate target tracking in dynamic environments by combining Kalman filters and deep learning feature extraction methods.<n>This study experimentally verifies the superior performance of DeepSORT in gesture recognition and tracking.
- Score: 8.977727627113103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Based on the DeepSORT algorithm, this study explores the application of visual tracking technology in intelligent human-computer interaction, especially in the field of gesture recognition and tracking. With the rapid development of artificial intelligence and deep learning technology, visual-based interaction has gradually replaced traditional input devices and become an important way for intelligent systems to interact with users. The DeepSORT algorithm can achieve accurate target tracking in dynamic environments by combining Kalman filters and deep learning feature extraction methods. It is especially suitable for complex scenes with multi-target tracking and fast movements. This study experimentally verifies the superior performance of DeepSORT in gesture recognition and tracking. It can accurately capture and track the user's gesture trajectory and is superior to traditional tracking methods in terms of real-time and accuracy. In addition, this study also combines gesture recognition experiments to evaluate the recognition ability and feedback response of the DeepSORT algorithm under different gestures (such as sliding, clicking, and zooming). The experimental results show that DeepSORT can not only effectively deal with target occlusion and motion blur but also can stably track in a multi-target environment, achieving a smooth user interaction experience. Finally, this paper looks forward to the future development direction of intelligent human-computer interaction systems based on visual tracking and proposes future research focuses such as algorithm optimization, data fusion, and multimodal interaction in order to promote a more intelligent and personalized interactive experience. Keywords-DeepSORT, visual tracking, gesture recognition, human-computer interaction
Related papers
- Human Scanpath Prediction in Target-Present Visual Search with Semantic-Foveal Bayesian Attention [49.99728312519117]
SemBA-FAST is a top-down framework designed for predicting human visual attention in target-present visual search.<n>We evaluate SemBA-FAST on the COCO-Search18 benchmark dataset, comparing its performance against other scanpath prediction models.<n>These findings provide valuable insights into the capabilities of semantic-foveal probabilistic frameworks for human-like attention modelling.
arXiv Detail & Related papers (2025-07-24T15:19:23Z) - Dynamic Scoring with Enhanced Semantics for Training-Free Human-Object Interaction Detection [51.52749744031413]
Human-Object Interaction (HOI) detection aims to identify humans and objects within images and interpret their interactions.<n>Existing HOI methods rely heavily on large datasets with manual annotations to learn interactions from visual cues.<n>We propose a novel training-free HOI detection framework for Dynamic Scoring with enhanced semantics.
arXiv Detail & Related papers (2025-07-23T12:30:19Z) - Computer Vision-Driven Gesture Recognition: Toward Natural and Intuitive Human-Computer [21.70275919660522]
This study explores the application of natural gesture recognition based on computer vision in human-computer interaction.<n>By connecting the palm and each finger joint, a dynamic and static gesture model of the hand is formed.<n> Experimental results show that this method can effectively recognize various gestures and maintain high recognition accuracy and real-time response capabilities.
arXiv Detail & Related papers (2024-12-24T10:13:20Z) - I-MPN: Inductive Message Passing Network for Efficient Human-in-the-Loop Annotation of Mobile Eye Tracking Data [4.487146086221174]
We present a novel human-centered learning algorithm designed for automated object recognition within mobile eye-tracking settings.
Our approach seamlessly integrates an object detector with a spatial relation-aware inductive message-passing network (I-MPN), harnessing node profile information and capturing object correlations.
arXiv Detail & Related papers (2024-06-10T13:08:31Z) - Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction.
The experimental results demonstrate that MPI exhibits remarkable improvement by 10% to 64% compared with previous state-of-the-art in real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z) - Enhancing HOI Detection with Contextual Cues from Large Vision-Language Models [56.257840490146]
ConCue is a novel approach for improving visual feature extraction in HOI detection.
We develop a transformer-based feature extraction module with a multi-tower architecture that integrates contextual cues into both instance and interaction detectors.
arXiv Detail & Related papers (2023-11-26T09:11:32Z) - Investigating Deep Neural Network Architecture and Feature Extraction
Designs for Sensor-based Human Activity Recognition [0.0]
In light of deep learning's proven effectiveness across various domains, numerous deep methods have been explored to tackle the challenges in activity recognition.
We investigate the performance of common deep learning and machine learning approaches as well as different training mechanisms.
Various feature representations extracted from the sensor time-series data and measure their effectiveness for the human activity recognition task.
arXiv Detail & Related papers (2023-09-26T14:55:32Z) - A Hybrid Approach To Real-Time Multi-Object Tracking [0.0]
Multi-Object Tracking, also known as Multi-Target Tracking, is a significant area of computer vision.
This paper proposes a hybrid strategy for real-time multi-target tracking that combines effectively a classical optical flow algorithm with a deep learning architecture.
arXiv Detail & Related papers (2023-08-02T16:02:42Z) - A Real-time Human Pose Estimation Approach for Optimal Sensor Placement
in Sensor-based Human Activity Recognition [63.26015736148707]
This paper introduces a novel methodology to resolve the issue of optimal sensor placement for Human Activity Recognition.
The derived skeleton data provides a unique strategy for identifying the optimal sensor location.
Our findings indicate that the vision-based method for sensor placement offers comparable results to the conventional deep learning approach.
arXiv Detail & Related papers (2023-07-06T10:38:14Z) - Relational Graph Learning on Visual and Kinematics Embeddings for
Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z) - Attention-Oriented Action Recognition for Real-Time Human-Robot
Interaction [11.285529781751984]
We propose an attention-oriented multi-level network framework to meet the need for real-time interaction.
Specifically, a Pre-Attention network is employed to roughly focus on the interactor in the scene at low resolution.
The other compact CNN receives the extracted skeleton sequence as input for action recognition.
arXiv Detail & Related papers (2020-07-02T12:41:28Z) - Continuous Emotion Recognition via Deep Convolutional Autoencoder and
Support Vector Regressor [70.2226417364135]
It is crucial that the machine should be able to recognize the emotional state of the user with high accuracy.
Deep neural networks have been used with great success in recognizing emotions.
We present a new model for continuous emotion recognition based on facial expression recognition.
arXiv Detail & Related papers (2020-01-31T17:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.