Deep Learning-based Spatio Temporal Facial Feature Visual Speech
Recognition
- URL: http://arxiv.org/abs/2305.00552v1
- Date: Sun, 30 Apr 2023 18:52:29 GMT
- Title: Deep Learning-based Spatio Temporal Facial Feature Visual Speech
Recognition
- Authors: Pangoth Santhosh Kumar, Garika Akshay
- Abstract summary: We present an alternate authentication process that makes use of both facial recognition and the individual's distinctive temporal facial feature motions while they speak a password.
The suggested model attained an accuracy of 96.1% when tested on the industry-standard MIRACL-VC1 dataset.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In low-resource computing contexts, such as smartphones and other
tiny devices, both deep learning and machine learning are widely used in
identification systems as authentication techniques. The transparent,
contactless, and non-invasive nature of these AI-driven face recognition
technologies has led to their meteoric rise in popularity in recent years.
While they are largely successful, there are still ways to gain unauthorized
access using spoofing artefacts such as photographs, masks, and glasses. In this
research, we present an alternate authentication process that makes use of both
facial recognition and the individual's distinctive temporal facial feature
motions while they speak a password. Because the suggested methodology allows
a password to be specified in any language, it is not constrained by language
barriers.
The suggested model attained an accuracy of 96.1% when tested on the
industry-standard MIRACL-VC1 dataset, demonstrating its efficacy as a reliable
and powerful solution. In addition to being data-efficient, the suggested
technique shows promising outcomes with as few as 10 positive video examples
for training the model. The effectiveness of the network's training is further
proved via comparisons with other combined facial recognition and lip reading
models.
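As a rough illustration of the kind of spatio-temporal pipeline the abstract describes, the sketch below pairs a per-frame CNN (spatial facial features) with an LSTM over the frame sequence (temporal lip and facial motion) and a binary accept/reject head. This is a minimal PyTorch sketch under assumed settings; the class name SpatioTemporalAuthNet, the layer sizes, frame count, and input resolution are illustrative assumptions and are not taken from the paper.

```python
# Hypothetical spatio-temporal authentication sketch (not the authors' exact model).
import torch
import torch.nn as nn


class SpatioTemporalAuthNet(nn.Module):
    def __init__(self, embed_dim: int = 128, hidden_dim: int = 64):
        super().__init__()
        # Spatial stream: a small CNN applied independently to every frame.
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # -> (B*T, 32, 1, 1)
            nn.Flatten(),              # -> (B*T, 32)
            nn.Linear(32, embed_dim), nn.ReLU(),
        )
        # Temporal stream: LSTM over the sequence of per-frame embeddings.
        self.temporal = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Binary decision: genuine user speaking the password vs. everything else.
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, frames, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.frame_encoder(clips.reshape(b * t, c, h, w))
        feats = feats.reshape(b, t, -1)
        _, (h_n, _) = self.temporal(feats)
        return self.classifier(h_n[-1])   # raw logit; apply sigmoid for a score


if __name__ == "__main__":
    model = SpatioTemporalAuthNet()
    dummy_clips = torch.randn(2, 16, 3, 64, 64)   # 2 clips of 16 RGB frames
    print(torch.sigmoid(model(dummy_clips)).shape)  # torch.Size([2, 1])
```

A real system in this spirit would train such a network on cropped face or mouth regions of the enrolled user speaking their password (the positive clips) against impostor or wrong-password clips, and threshold the sigmoid score at authentication time.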
Related papers
- Detecting Generated Images by Real Images Only [64.12501227493765]
Existing generated image detection methods detect visual artifacts in generated images or learn discriminative features from both real and generated images by massive training.
This paper approaches the generated image detection problem from a new perspective: Start from real images.
By finding the commonality of real images and mapping them to a dense subspace in feature space, the goal is that generated images, regardless of their generative model, are then projected outside the subspace.
arXiv Detail & Related papers (2023-11-02T03:09:37Z)
- Self-Evolution Learning for Discriminative Language Model Pretraining [103.57103957631067]
Self-Evolution learning (SE) is a simple and effective token masking and learning method.
SE focuses on learning the informative yet under-explored tokens and adaptively regularizes the training by introducing a novel Token-specific Label Smoothing approach.
arXiv Detail & Related papers (2023-05-24T16:00:54Z)
- How to Boost Face Recognition with StyleGAN? [13.067766076889995]
State-of-the-art face recognition systems require vast amounts of labeled training data.
The self-supervised revolution in industry motivates research on adapting related techniques to facial recognition.
We show that a simple approach based on fine-tuning pSp encoder for StyleGAN allows us to improve upon the state-of-the-art facial recognition.
arXiv Detail & Related papers (2022-10-18T18:41:56Z)
- Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition [101.60244147302197]
We introduce contrastive learning and masked image modeling to learn discrimination and generation of text images.
Our method outperforms previous self-supervised text recognition methods by 10.2%-20.2% on irregular scene text recognition datasets.
Our proposed text recognizer exceeds previous state-of-the-art text recognition methods by an average of 5.3% on 11 benchmarks, with similar model size.
arXiv Detail & Related papers (2022-07-01T03:50:26Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- Audio-Visual Person-of-Interest DeepFake Detection [77.04789677645682]
The aim of this work is to propose a deepfake detector that can cope with the wide variety of manipulation methods and scenarios encountered in the real world.
We leverage a contrastive learning paradigm to learn the moving-face and audio segment embeddings that are most discriminative for each identity.
Our method can detect both single-modality (audio-only, video-only) and multi-modality (audio-video) attacks, and is robust to low-quality or corrupted videos.
arXiv Detail & Related papers (2022-04-06T20:51:40Z)
- AuthNet: A Deep Learning based Authentication Mechanism using Temporal Facial Feature Movements [0.0]
We propose an alternative authentication mechanism that uses both facial recognition and the unique movements of that particular face while uttering a password.
The proposed model is not inhibited by language barriers because a user can set a password in any language.
arXiv Detail & Related papers (2020-12-04T10:46:12Z)
- Real Time Face Recognition Using Convoluted Neural Networks [0.0]
Convolutional Neural Networks have proven to be well suited for facial recognition.
The dataset is created by converting face videos of the persons to be recognized into hundreds of images per person.
arXiv Detail & Related papers (2020-10-09T12:04:49Z)
- An adversarial learning framework for preserving users' anonymity in face-based emotion recognition [6.9581841997309475]
This paper proposes an adversarial learning framework which relies on a convolutional neural network (CNN) architecture trained through an iterative procedure.
Results indicate that the proposed approach can learn a convolutional transformation for preserving emotion recognition accuracy and degrading face identity recognition.
arXiv Detail & Related papers (2020-01-16T22:45:52Z)
- Investigating the Impact of Inclusion in Face Recognition Training Data on Individual Face Identification [93.5538147928669]
We audit ArcFace, a state-of-the-art, open source face recognition system, in a large-scale face identification experiment with more than one million distractor images.
We find a Rank-1 face identification accuracy of 79.71% for individuals present in the model's training data and an accuracy of 75.73% for those not present.
arXiv Detail & Related papers (2020-01-09T15:50:28Z)