Convolutional Neural Network Array for Sign Language Recognition using
Wearable IMUs
- URL: http://arxiv.org/abs/2004.11836v1
- Date: Tue, 21 Apr 2020 23:11:04 GMT
- Title: Convolutional Neural Network Array for Sign Language Recognition using
Wearable IMUs
- Authors: Karush Suri, Rinki Gupta
- Abstract summary: The proposed work presents a novel one-dimensional Convolutional Neural Network (CNN) array architecture for recognition of signs from the Indian sign language.
The signals recorded using the IMU device are segregated on the basis of their context, such as whether they correspond to signing for a general sentence or an interrogative sentence.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advancements in gesture recognition algorithms have led to a significant
growth in sign language translation. By making use of efficient intelligent
models, signs can be recognized with precision. The proposed work presents a
novel one-dimensional Convolutional Neural Network (CNN) array architecture for
recognition of signs from the Indian sign language using signals recorded from
a custom designed wearable IMU device. The IMU device makes use of tri-axial
accelerometer and gyroscope. The signals recorded using the IMU device are
segregated on the basis of their context, such as whether they correspond to
signing for a general sentence or an interrogative sentence. The array
comprises two individual CNNs, one classifying the general sentences and the
other classifying the interrogative sentences. Performances of the individual CNNs
in the array architecture are compared to that of a conventional CNN
classifying the unsegregated dataset. Peak classification accuracies of 94.20%
for general sentences and 95.00% for interrogative sentences, achieved with the
proposed CNN array, compared with 93.50% for the conventional CNN, assert the
suitability of the proposed approach.
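A minimal sketch of the described architecture is given below, assuming PyTorch; the layer widths, window length, and class counts are illustrative assumptions, since the exact hyper-parameters are not reported in this summary. Each branch is a one-dimensional CNN over 6-channel IMU windows (tri-axial accelerometer plus tri-axial gyroscope), and a sample is routed to the general-sentence or interrogative-sentence branch according to its context.
```python
# Hedged sketch of the 1-D CNN array idea; class names, filter counts,
# window length, and class counts are assumptions, not the paper's values.
import torch
import torch.nn as nn

class SignCNN(nn.Module):
    """One branch of the array: a 1-D CNN over 6-channel IMU windows."""
    def __init__(self, n_classes: int, n_channels: int = 6, window: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * (window // 4), 128),
            nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):  # x: (batch, 6, window)
        return self.classifier(self.features(x))

class CNNArray(nn.Module):
    """Array of two independent CNNs: one for general sentences,
    one for interrogative sentences; samples are routed by context."""
    def __init__(self, n_general: int, n_interrogative: int):
        super().__init__()
        self.general = SignCNN(n_general)
        self.interrogative = SignCNN(n_interrogative)

    def forward(self, x, is_interrogative: bool):
        return self.interrogative(x) if is_interrogative else self.general(x)

# Example: route one 6-channel window of 128 samples to the general-sentence CNN.
model = CNNArray(n_general=5, n_interrogative=5)
window = torch.randn(1, 6, 128)
logits = model(window, is_interrogative=False)
```
Routing by context in this way mirrors the segregated training described in the abstract: each branch only ever sees signals from its own sentence type, while a conventional CNN would be trained on the unsegregated dataset.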
Related papers
- Developing Lightweight DNN Models With Limited Data For Real-Time Sign Language Recognition [0.0]
We present a novel framework for real-time sign language recognition using lightweight DNNs trained on limited data. Our system addresses key challenges in sign language recognition, including data scarcity, high computational costs, and discrepancies in frame rates between training and inference environments.
arXiv Detail & Related papers (2025-06-30T20:34:54Z)
- Learning Sign Language Representation using CNN LSTM, 3DCNN, CNN RNN LSTM and CCN TD [1.2494184403263338]
The aim of this paper is to evaluate and identify the best neural network algorithm that can facilitate a sign language tuition system.
The 3DCNN algorithm was found to be the best-performing neural network algorithm among these systems, with 91% accuracy on the TTSL dataset and 83% accuracy on the ASL dataset.
arXiv Detail & Related papers (2024-12-24T05:47:08Z)
- SECNN: Squeeze-and-Excitation Convolutional Neural Network for Sentence
Classification [0.0]
A convolutional neural network (CNN) can extract n-gram features through convolutional filters.
We propose a Squeeze-and-Excitation Convolutional neural Network (SECNN) for sentence classification.
arXiv Detail & Related papers (2023-12-11T03:26:36Z)
- Attention-based Feature Compression for CNN Inference Offloading in Edge
Computing [93.67044879636093]
This paper studies the computational offloading of CNN inference in device-edge co-inference systems.
We propose a novel autoencoder-based CNN architecture (AECNN) for effective feature extraction at the end device.
Experiments show that AECNN can compress the intermediate data by more than 256x with only about 4% accuracy loss.
arXiv Detail & Related papers (2022-11-24T18:10:01Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for
Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- A Novel Hand Gesture Detection and Recognition system based on
ensemble-based Convolutional Neural Network [3.5665681694253903]
Detection of the hand portion has become a challenging task in the computer vision and pattern recognition communities.
Deep learning algorithms such as the convolutional neural network (CNN) architecture have become a very popular choice for classification tasks.
In this paper, an ensemble of CNN-based approaches is presented to overcome problems such as high variance during prediction, overfitting, and prediction errors.
arXiv Detail & Related papers (2022-02-25T06:46:58Z)
- Real-time Speaker counting in a cocktail party scenario using
Attention-guided Convolutional Neural Network [60.99112031408449]
We propose a real-time, single-channel attention-guided Convolutional Neural Network (CNN) to estimate the number of active speakers in overlapping speech.
The proposed system extracts higher-level information from the speech spectral content using a CNN model.
Experiments on simulated overlapping speech using the WSJ corpus show that the attention solution improves performance by almost 3% absolute over conventional temporal average pooling.
arXiv Detail & Related papers (2021-10-30T19:24:57Z)
- EfficientTDNN: Efficient Architecture Search for Speaker Recognition in
the Wild [29.59228560095565]
We propose a neural architecture search-based efficient time-delay neural network (EfficientTDNN) to improve inference efficiency while maintaining recognition accuracy.
Experiments on the VoxCeleb dataset show that EfficientTDNN provides a huge search space including approximately $10^{13}$ architectures and achieves 1.66% EER and 0.156 DCF$_{0.01}$ with 565M MACs.
arXiv Detail & Related papers (2021-03-25T03:28:07Z)
- Fast End-to-End Speech Recognition via a Non-Autoregressive Model and
Cross-Modal Knowledge Transferring from BERT [72.93855288283059]
We propose a non-autoregressive speech recognition model called LASO (Listen Attentively, and Spell Once).
The model consists of an encoder, a decoder, and a position-dependent summarizer (PDS).
arXiv Detail & Related papers (2021-02-15T15:18:59Z)
- A Two-Stage Approach to Device-Robust Acoustic Scene Classification [63.98724740606457]
A two-stage system based on fully convolutional neural networks (CNNs) is proposed to improve device robustness.
Our results show that the proposed ASC system attains a state-of-the-art accuracy on the development set.
Neural saliency analysis with class activation mapping gives new insights on the patterns learnt by our models.
arXiv Detail & Related papers (2020-11-03T03:27:18Z)
- A temporal-to-spatial deep convolutional neural network for
classification of hand movements from multichannel electromyography data [0.14502611532302037]
We make the novel contribution of proposing and evaluating a design for the early processing layers in the deep CNN for multichannel sEMG.
We propose a novel temporal-to-spatial (TtS) CNN architecture, where the first layer performs convolution separately on each sEMG channel to extract temporal features (see the sketch after this list).
We find that our novel TtS CNN design achieves 66.6% per-class accuracy on database 1, and 67.8% on database 2.
arXiv Detail & Related papers (2020-07-16T09:11:26Z)
- AutoSpeech: Neural Architecture Search for Speaker Recognition [108.69505815793028]
We propose the first neural architecture search approach for speaker recognition tasks, named AutoSpeech.
Our algorithm first identifies the optimal operation combination in a neural cell and then derives a CNN model by stacking the neural cell multiple times.
Results demonstrate that the derived CNN architectures significantly outperform current speaker recognition systems based on VGG-M, ResNet-18, and ResNet-34 backbones, while enjoying lower model complexity.
arXiv Detail & Related papers (2020-05-07T02:53:47Z)
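For the temporal-to-spatial CNN referenced above, a minimal PyTorch sketch of the per-channel first layer is given below. The channel count, window length, and filter sizes are assumptions for illustration, and grouped convolution is used here as one plausible way to realise per-channel temporal filtering followed by cross-channel (spatial) mixing; it is not the cited paper's exact implementation.
```python
# Hedged sketch of a temporal-to-spatial front end; all sizes are assumptions.
import torch
import torch.nn as nn

n_channels, window = 8, 200          # e.g. 8 sEMG channels, 200-sample windows

tts_front_end = nn.Sequential(
    # Temporal stage: groups=n_channels convolves each channel separately,
    # so this layer extracts per-channel temporal features.
    nn.Conv1d(n_channels, n_channels * 4, kernel_size=9, padding=4,
              groups=n_channels),
    nn.ReLU(),
    # Spatial stage: a kernel_size-1 convolution mixes information
    # across the per-channel feature maps.
    nn.Conv1d(n_channels * 4, 64, kernel_size=1),
    nn.ReLU(),
)

x = torch.randn(1, n_channels, window)   # one multichannel window
features = tts_front_end(x)              # shape: (1, 64, 200)
```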
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.