Dynamic Gesture Recognition
- URL: http://arxiv.org/abs/2109.09396v2
- Date: Wed, 22 Sep 2021 21:06:25 GMT
- Title: Dynamic Gesture Recognition
- Authors: Jonas Bokstaller and Costanza Maria Improta
- Abstract summary: It is possible to use machine learning to classify images and/or videos instead of the traditional computer vision algorithms.
The aim of this project is to builda symbiosis between a convolutional neural network (CNN) and a recurrent neural network (RNN)
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Human-Machine Interaction (HMI) research field is an important topic in
machine learning that has been deeply investigated thanks to the rise of
computing power in the last years. The first time, it is possible to use
machine learning to classify images and/or videos instead of the traditional
computer vision algorithms. The aim of this project is to builda symbiosis
between a convolutional neural network (CNN)[1] and a recurrent neural network
(RNN) [2] to recognize cultural/anthropological Italian sign language gestures
from videos. The CNN extracts important features that later areused by the RNN.
With RNNs we are able to store temporal information inside the model to provide
contextual information from previous frames to enhance the prediction accuracy.
Our novel approach uses different data augmentation techniquesand
regularization methods from only RGB frames to avoid overfitting and provide a
small generalization error.
Related papers
- Deep Neural Networks in Video Human Action Recognition: A Review [21.00217656391331]
Video behavior recognition is one of the most foundational tasks of computer vision.
Deep neural networks are built for recognizing pixel-level information such as images with RGB, RGB-D, or optical flow formats.
In our article, the performance of deep neural networks surpassed most of the techniques in the feature learning and extraction tasks.
arXiv Detail & Related papers (2023-05-25T03:54:41Z) - Comparison Analysis of Traditional Machine Learning and Deep Learning
Techniques for Data and Image Classification [62.997667081978825]
The purpose of the study is to analyse and compare the most common machine learning and deep learning techniques used for computer vision 2D object classification tasks.
Firstly, we will present the theoretical background of the Bag of Visual words model and Deep Convolutional Neural Networks (DCNN)
Secondly, we will implement a Bag of Visual Words model, the VGG16 CNN Architecture.
arXiv Detail & Related papers (2022-04-11T11:34:43Z) - Predictive Coding: Towards a Future of Deep Learning beyond
Backpropagation? [41.58529335439799]
The backpropagation of error algorithm used to train deep neural networks has been fundamental to the successes of deep learning.
Recent work has developed the idea into a general-purpose algorithm able to train neural networks using only local computations.
We show the substantially greater flexibility of predictive coding networks against equivalent deep neural networks.
arXiv Detail & Related papers (2022-02-18T22:57:03Z) - Visualising and Explaining Deep Learning Models for Speech Quality
Prediction [0.0]
The non-intrusive speech quality prediction model NISQA is analyzed in this paper.
It is composed of a convolutional neural network (CNN) and a recurrent neural network (RNN)
arXiv Detail & Related papers (2021-12-12T12:50:03Z) - CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded
Systems [0.0]
A Convolutional Neural Network (CNN) is a class of Deep Neural Network (DNN) widely used in the analysis of visual images captured by an image sensor.
In this paper, we propose a neoteric variant of deep convolutional neural network architecture to ameliorate the performance of existing CNN architectures for real-time inference on embedded systems.
arXiv Detail & Related papers (2021-12-01T18:20:52Z) - Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z) - Overcoming Catastrophic Forgetting in Graph Neural Networks [50.900153089330175]
Catastrophic forgetting refers to the tendency that a neural network "forgets" the previous learned knowledge upon learning new tasks.
We propose a novel scheme dedicated to overcoming this problem and hence strengthen continual learning in graph neural networks (GNNs)
At the heart of our approach is a generic module, termed as topology-aware weight preserving(TWP)
arXiv Detail & Related papers (2020-12-10T22:30:25Z) - A Practical Tutorial on Graph Neural Networks [49.919443059032226]
Graph neural networks (GNNs) have recently grown in popularity in the field of artificial intelligence (AI)
This tutorial exposes the power and novelty of GNNs to AI practitioners.
arXiv Detail & Related papers (2020-10-11T12:36:17Z) - Progressive Tandem Learning for Pattern Recognition with Deep Spiking
Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z) - On the Effectiveness of Neural Text Generation based Data Augmentation
for Recognition of Morphologically Rich Speech [0.0]
We have significantly improved the online performance of a conversational speech transcription system by transferring knowledge from a RNNLM to the single pass BNLM with text generation based data augmentation.
We show that using the RNN-BNLM in the first pass followed by a neural second pass, offline ASR results can be even significantly improved.
arXiv Detail & Related papers (2020-06-09T09:01:04Z) - Visual Commonsense R-CNN [102.5061122013483]
We present a novel unsupervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN)
VC R-CNN serves as an improved visual region encoder for high-level tasks such as captioning and VQA.
We extensively apply VC R-CNN features in prevailing models of three popular tasks: Image Captioning, VQA, and VCR, and observe consistent performance boosts across them.
arXiv Detail & Related papers (2020-02-27T15:51:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.