Related papers: Two-step Authentication: Multi-biometric System Using Voice and Facial Recognition

URL: http://arxiv.org/abs/2601.06218v1
Date: Fri, 09 Jan 2026 02:11:50 GMT
Title: Two-step Authentication: Multi-biometric System Using Voice and Facial Recognition
Authors: Kuan Wei Chen, Ting Yi Lin, Wen Ren Yang, Aryan Kesarwani, Riya Singh,
Abstract summary: We present a cost-effective two-step authentication system that integrates face identification and speaker verification using only a camera and microphone available on common devices. For face recognition, a pruned VGG-16 based classifier is trained on an augmented dataset of 924 images from five subjects, with faces localized by MTCNN. For voice recognition, a CNN speaker-verification model trained on LibriSpeech attains 98.9% accuracy and 3.456% EER on test-clean.
Score: 0.4077787659104315
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a cost-effective two-step authentication system that integrates face identification and speaker verification using only a camera and microphone available on common devices. The pipeline first performs face recognition to identify a candidate user from a small enrolled group, then performs voice recognition only against the matched identity to reduce computation and improve robustness. For face recognition, a pruned VGG-16 based classifier is trained on an augmented dataset of 924 images from five subjects, with faces localized by MTCNN; it achieves 95.1% accuracy. For voice recognition, a CNN speaker-verification model trained on LibriSpeech (train-other-360) attains 98.9% accuracy and 3.456% EER on test-clean. Source code and trained models are available at https://github.com/NCUE-EE-AIAL/Two-step-Authentication-Multi-biometric-System.
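The two-step flow described in the abstract (identify the user by face, then verify the voice only against that identity) can be illustrated with a minimal Python sketch. It is not the authors' code: the face step is assumed to yield per-user softmax probabilities, the voice step is assumed to compare fixed-length speaker embeddings by cosine similarity, and the function names and thresholds (face_threshold, voice_threshold) are hypothetical. The equal_error_rate helper shows how an EER figure such as the reported 3.456% is defined: the operating point where the false acceptance rate equals the false rejection rate.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping the threshold over every observed score.

    A trial is accepted when score >= threshold. FRR is the fraction of genuine
    trials rejected, FAR the fraction of impostor trials accepted. Returns
    (eer, threshold) at the threshold where FAR and FRR are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)
        far = sum(s >= t for s in impostor) / len(impostor)
        if best is None or abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2, t)
    return best[1], best[2]


def two_step_authenticate(face_probs, voice_embedding, enrolled_voices,
                          face_threshold=0.8, voice_threshold=0.7):
    """Hypothetical cascade: face identification, then 1:1 voice verification.

    face_probs: {user: probability} from a face classifier (assumed softmax).
    enrolled_voices: {user: reference speaker embedding} (assumed format).
    Thresholds are illustrative placeholders, not values from the paper.
    Returns the authenticated user, or None if either step rejects.
    """
    # Step 1: pick the most probable enrolled identity from the face.
    user, prob = max(face_probs.items(), key=lambda kv: kv[1])
    if prob < face_threshold:
        return None  # no confident face match, so the voice step is skipped
    # Step 2: compare the voice only against the matched user's enrollment.
    score = cosine_similarity(voice_embedding, enrolled_voices[user])
    return user if score >= voice_threshold else None
```

Restricting the second step to the single identity chosen by the face classifier means one voice comparison per attempt instead of one per enrolled user, which is the computational saving the abstract describes. Setting voice_threshold at the EER operating point is one common choice, not necessarily the one the authors used.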

Related papers

Robust Persian Digit Recognition in Noisy Environments Using Hybrid CNN-BiGRU Model [1.5566524830295307]
This study addresses isolated spoken Persian digit recognition (zero to nine) under noisy conditions. A hybrid model combining residual convolutional neural networks and bidirectional gated recurrent units (BiGRU) is proposed. Experimental results demonstrate the model's effectiveness, achieving 98.53%, 96.10%, and 95.92% accuracy on the training, validation, and test sets, respectively.
Date: 2024-12-14T15:11:42Z
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer [59.57249127943914]
We present a multilingual Audio-Visual Speech Recognition model incorporating several enhancements to improve performance and audio noise robustness. We increase the amount of audio-visual training data for six distinct languages, generating automatic transcriptions of unlabelled multilingual datasets. Our proposed model achieves new state-of-the-art performance on the LRS3 dataset, reaching WER of 0.8%.
Date: 2024-03-14T01:16:32Z
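WER, the metric behind the 0.8% figure above, is the word-level edit distance between a reference transcript and the recognizer's output divided by the number of reference words, i.e. (S + D + I) / N for substitutions, deletions, and insertions. A minimal sketch follows; it splits on whitespace and applies no text normalization, which real evaluation pipelines usually perform and which this sketch does not attempt to reproduce.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance. No text normalization."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, word_error_rate("the cat sat on the mat", "the cat sit on mat") returns 2/6, about 0.333: one substitution (sat to sit) and one deletion (the second "the") over six reference words.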
IdentiFace : A VGG Based Multimodal Facial Biometric System [0.0]
"IdentiFace" is a multimodal facial biometric system that combines the core of facial recognition with some of the most important soft biometric traits such as gender, face shape, and emotion. For the recognition problem, we achieved 99.2% test accuracy on five classes with high intra-class variation, using data collected from the FERET database. We also achieved 88.03% test accuracy on the face-shape problem using the celebrity face-shape dataset.
Date: 2024-01-02T14:36:28Z
Deep Learning-based Spatio Temporal Facial Feature Visual Speech Recognition [0.0]
We present an alternative authentication process that uses both facial recognition and the distinctive temporal motions of an individual's facial features while they speak a password. The proposed model achieved 96.1% accuracy on the industry-standard MIRACL-VC1 dataset.
Date: 2023-04-30T18:52:29Z
From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition [50.93943755401025]
We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition. We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement. Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
Date: 2023-01-19T02:37:56Z
Exploring Deep Learning for Joint Audio-Visual Lip Biometrics [54.32039064193566]
Audio-visual (AV) lip biometrics is a promising authentication technique that leverages the benefits of both the audio and visual modalities in speech communication. The lack of a sizeable AV database hinders the exploration of deep-learning-based audio-visual lip biometrics. We establish the DeepLip AV lip biometrics system realized with a convolutional neural network (CNN) based video module, a time-delay neural network (TDNN) based audio module, and a multimodal fusion module.
Date: 2021-04-17T10:51:55Z
An Improved Real-Time Face Recognition System at Low Resolution Based on Local Binary Pattern Histogram Algorithm and CLAHE [0.0]
This research presents an improved real-time face recognition system that works at a low resolution of 15 pixels under variations in pose, emotion, and resolution. We designed two datasets, LRD200 and LRD100, which are used for training and classification. The system can be employed for law enforcement, where surveillance cameras capture low-resolution images because the person is far from the camera.
Date: 2021-04-15T04:54:29Z
Real Time Face Recognition Using Convoluted Neural Networks [0.0]
Convolutional Neural Networks have proven to be the best approach for facial recognition. The dataset is created by converting face videos of the people to be recognized into hundreds of images of each person.
Date: 2020-10-09T12:04:49Z
Few Shot Text-Independent speaker verification using 3D-CNN [0.0]
We propose a novel method to verify the identity of a claimed speaker using very little training data. Experiments on the VoxCeleb1 dataset show that, even when trained on very little data, the proposed model's accuracy is close to that of state-of-the-art text-independent speaker verification models.
Date: 2020-08-25T15:03:29Z
Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR [91.87500543591945]
We develop an end-to-end multi-talker automatic speech recognition system for an unknown number of active speakers. Our experiments show very promising performance in counting accuracy, source separation and speech recognition. Our system generalizes well to a larger number of speakers than it ever saw during training.
Date: 2020-06-04T11:25:50Z
AutoSpeech: Neural Architecture Search for Speaker Recognition [108.69505815793028]
We propose AutoSpeech, the first neural architecture search approach for speaker recognition tasks. Our algorithm first identifies the optimal combination of operations in a neural cell and then derives a CNN model by stacking the neural cell multiple times. Results demonstrate that the derived CNN architectures significantly outperform current speaker recognition systems based on VGG-M, ResNet-18, and ResNet-34 backbones, while having lower model complexity.
Date: 2020-05-07T02:53:47Z
Investigating the Impact of Inclusion in Face Recognition Training Data on Individual Face Identification [93.5538147928669]
We audit ArcFace, a state-of-the-art, open source face recognition system, in a large-scale face identification experiment with more than one million distractor images. We find a Rank-1 face identification accuracy of 79.71% for individuals present in the model's training data and an accuracy of 75.73% for those not present.
Date: 2020-01-09T15:50:28Z
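Rank-1 identification accuracy, as reported above, counts a probe as correct when the single most similar gallery entry belongs to the same person. A minimal sketch, assuming embeddings are compared by cosine similarity and that the gallery mixes enrolled identities with distractors; the function names are hypothetical and the code does not reproduce the audit's actual pipeline.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank1_accuracy(probes, gallery):
    """Fraction of probes whose nearest gallery entry has the same identity.

    probes and gallery are lists of (identity, embedding) pairs. The gallery
    may include distractor identities that never appear among the probes;
    a probe whose nearest neighbour is a distractor counts as a miss.
    """
    correct = 0
    for identity, embedding in probes:
        # Rank-1: only the single highest-scoring gallery entry matters.
        best_identity, _ = max(gallery, key=lambda entry: cosine(embedding, entry[1]))
        if best_identity == identity:
            correct += 1
    return correct / len(probes)
```

Adding more distractors to the gallery can only keep Rank-1 accuracy the same or lower it, since each new entry is another candidate that may outscore the true match.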

This list is automatically generated from the titles and abstracts of the papers on this site.