Modelling Lips-State Detection Using CNN for Non-Verbal Communications
- URL: http://arxiv.org/abs/2112.04752v2
- Date: Sat, 11 Dec 2021 15:14:03 GMT
- Title: Modelling Lips-State Detection Using CNN for Non-Verbal Communications
- Authors: Abtahi Ishmam, Mahmudul Hasan, Md. Saif Hassan Onim, Koushik Roy, Md.
Akiful Haque Akif and Hossain Nyeem
- Abstract summary: This paper reports two new Convolutional Neural Network (CNN) models for lips-state detection.
We simplify the lips-state model with a set of six key landmarks and use their distances for lips-state classification.
Varying frame rates, lip movements, and face angles are investigated to determine the effectiveness of the models.
- Score: 2.0715161308249916
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-based deep learning models are promising for
speech- and hearing-impaired and secret communications. While such non-verbal
communications have primarily been investigated with hand gestures and facial
expressions, no research effort has so far been tracked for a lips-state (i.e.,
open/close)-based interpretation/translation system. In support of this
development, this paper reports two new Convolutional Neural Network (CNN)
models for lips-state detection. Building upon two prominent lip-landmark
detectors, DLIB and MediaPipe, we simplify the lips-state model with a set of
six key landmarks and use their distances for lips-state classification. Both
models are thereby developed to count the opening and closing of the lips, so
a symbol can be classified by the total count. Varying frame rates, lip
movements, and face angles are investigated to determine the effectiveness of
the models. Our early experimental results demonstrate that the DLIB-based
model is slower, averaging 6 frames per second (FPS), but achieves a higher
average detection accuracy of 95.25%. In contrast, the MediaPipe-based model
offers faster landmark detection, with an average of 20 FPS, at a detection
accuracy of 94.4%. Both models could thus effectively interpret the lips state
for translating non-verbal semantics into a natural language.
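The distance-based classification described in the abstract can be sketched as follows. The six-landmark layout, the upper/lower pairing, the 0.35 threshold, and the helper names are illustrative assumptions rather than the authors' implementation; in practice the points would come from DLIB's 68-point predictor or MediaPipe Face Mesh.

```python
from math import dist

# Minimal sketch of a distance-based lips-state classifier.
# Landmark indexing assumption: points 0-2 are upper-lip landmarks and
# points 3-5 the lower-lip landmarks directly beneath them.

def lips_state(landmarks, pairs=((0, 3), (1, 4), (2, 5)), open_ratio=0.35):
    """Classify lips as 'open' or 'closed' from six (x, y) landmarks.

    The mean gap between paired upper/lower lip points is normalized by
    the mouth width so the decision is scale-invariant.
    """
    gaps = [dist(landmarks[u], landmarks[l]) for u, l in pairs]
    xs = [x for x, _ in landmarks]
    width = (max(xs) - min(xs)) or 1e-6  # mouth width from landmark spread
    ratio = (sum(gaps) / len(gaps)) / width
    return "open" if ratio > open_ratio else "closed"

def count_openings(states):
    """Count closed -> open transitions; the paper maps this count to a symbol."""
    return sum(a == "closed" and b == "open" for a, b in zip(states, states[1:]))
```

A per-frame stream of states can then be reduced to a symbol count with `count_openings`, matching the paper's counting-based interpretation scheme.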
Related papers
- Memory-based Language Models: An Efficient, Explainable, and Eco-friendly Approach to Large Language Modeling [0.4411777886421431]
We present memory-based language modeling as an efficient, eco-friendly alternative to deep neural network-based language modeling.
It offers log-linearly scalable next-token prediction performance and strong memorization capabilities.
arXiv Detail & Related papers (2025-10-25T14:34:18Z)
- LASER: Lip Landmark Assisted Speaker Detection for Robustness [30.82311863795508]
We propose Lip landmark Assisted Speaker dEtection for Robustness (LASER).
LASER aims to identify speaking individuals in complex visual scenes by matching lip movements to audio.
Experiments show that LASER outperforms state-of-the-art models, especially in scenarios with desynchronized audio and visuals.
arXiv Detail & Related papers (2025-01-21T05:29:34Z)
- Enhancing Sign Language Detection through Mediapipe and Convolutional Neural Networks (CNN) [3.192629447369627]
This research combines MediaPipe and CNNs for efficient and accurate interpretation of an ASL dataset.
The accuracy achieved by the model on ASL datasets is 99.12%.
The system will have applications in the communication, education, and accessibility domains.
arXiv Detail & Related papers (2024-06-06T04:05:12Z)
- Fine-tuning Language Models for Factuality [96.5203774943198]
Large pre-trained language models (LLMs) have seen widespread use, sometimes even as a replacement for traditional search engines.
Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'.
In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z)
- Learning to Decompose Visual Features with Latent Textual Prompts [140.2117637223449]
We propose Decomposed Feature Prompting (DeFo) to improve vision-language models.
Our empirical study shows DeFo's significance in improving vision-language models.
arXiv Detail & Related papers (2022-10-09T15:40:13Z)
- Prediction of speech intelligibility with DNN-based performance measures [9.883633991083789]
This paper presents a speech intelligibility model based on automatic speech recognition (ASR).
It combines phoneme probabilities from deep neural networks (DNN) and a performance measure that estimates the word error rate from these probabilities.
The proposed model performs almost as well as the label-based model and produces more accurate predictions than the baseline models.
arXiv Detail & Related papers (2022-03-17T08:05:38Z)
- Visualising and Explaining Deep Learning Models for Speech Quality Prediction [0.0]
The non-intrusive speech quality prediction model NISQA is analyzed in this paper.
It is composed of a convolutional neural network (CNN) and a recurrent neural network (RNN).
arXiv Detail & Related papers (2021-12-12T12:50:03Z)
- Exploring Deep Learning for Joint Audio-Visual Lip Biometrics [54.32039064193566]
Audio-visual (AV) lip biometrics is a promising authentication technique that leverages the benefits of both the audio and visual modalities in speech communication.
The lack of a sizeable AV database hinders the exploration of deep-learning-based audio-visual lip biometrics.
We establish the DeepLip AV lip biometrics system realized with a convolutional neural network (CNN) based video module, a time-delay neural network (TDNN) based audio module, and a multimodal fusion module.
arXiv Detail & Related papers (2021-04-17T10:51:55Z)
- Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition [80.446770909975]
Linguistic knowledge is of great benefit to scene text recognition.
How to effectively model linguistic rules in end-to-end deep networks remains a research challenge.
We propose ABINet, an autonomous, bidirectional and iterative network for scene text recognition.
arXiv Detail & Related papers (2021-03-11T06:47:45Z)
- Effects of Number of Filters of Convolutional Layers on Speech Recognition Model Accuracy [6.2698513174194215]
This paper studies the effect of the number of filters in convolutional layers on the prediction accuracy of CNN+RNN (convolutional networks added to recurrent networks) models for Automatic Speech Recognition (ASR).
Experimental results show that adding a CNN to an RNN improves the performance of the CNN+RNN speech recognition model only when the number of CNN filters exceeds a certain threshold.
arXiv Detail & Related papers (2021-02-03T23:04:38Z)
- Transformer-based Language Model Fine-tuning Methods for COVID-19 Fake News Detection [7.29381091750894]
We propose a novel transformer-based language model fine-tuning approach for COVID-19 fake news detection.
First, the token vocabulary of each individual model is expanded to cover the actual semantics of professional phrases.
Last, the predicted features extracted by the universal language model RoBERTa and the domain-specific model CT-BERT are fused by a multilayer perceptron to integrate fine-grained and high-level specific representations.
arXiv Detail & Related papers (2021-01-14T09:05:42Z)
- Discriminative Nearest Neighbor Few-Shot Intent Detection by Transferring Natural Language Inference [150.07326223077405]
Few-shot learning is attracting much attention to mitigate data scarcity.
We present a discriminative nearest neighbor classification with deep self-attention.
We propose to boost the discriminative ability by transferring a natural language inference (NLI) model.
arXiv Detail & Related papers (2020-10-25T00:39:32Z)
- Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
arXiv Detail & Related papers (2020-09-27T02:13:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.