Text Independent Speaker Identification System for Access Control
- URL: http://arxiv.org/abs/2209.14335v1
- Date: Mon, 26 Sep 2022 14:42:18 GMT
- Title: Text Independent Speaker Identification System for Access Control
- Authors: Oluyemi E. Adetoyi
- Abstract summary: Even the human intelligence system fails to offer 100% accuracy in identifying speech from a specific individual.
This paper presents a text-independent speaker identification system that employs Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and k-Nearest Neighbor (kNN) for classification.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Even the human intelligence system fails to offer 100% accuracy in
identifying speech from a specific individual. Machine intelligence is trying to mimic
humans in speaker identification problems through various approaches to speech
feature extraction and speech modeling techniques. This paper presents a
text-independent speaker identification system that employs Mel Frequency
Cepstral Coefficients (MFCC) for feature extraction and k-Nearest Neighbor
(kNN) for classification. The maximum cross-validation accuracy obtained was
60%. This will be improved upon in subsequent research.
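The MFCC-plus-kNN pipeline described in the abstract can be sketched as below. This is a minimal illustration, not the paper's implementation: the frame size, hop, filter count, and `n_neighbors` are common defaults assumed here (the paper does not state them), and the "speakers" are synthetic sinusoids rather than real speech.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # triangular filters spaced evenly on the mel scale
    pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, cen, hi = bins[i], bins[i + 1], bins[i + 2]
        fb[i, lo:cen] = (np.arange(lo, cen) - lo) / max(cen - lo, 1)
        fb[i, cen:hi] = (hi - np.arange(cen, hi)) / max(hi - cen, 1)
    return fb

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    # frame the signal and apply a Hamming window
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(n_fft)[None, :]
    frames = signal[idx] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # log mel-filterbank energies, then DCT-II to get cepstral coefficients
    logmel = np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None]
                   * (np.arange(n_mels) + 0.5) / n_mels)
    return logmel @ basis.T  # shape: (n_frames, n_ceps)

# Toy data: each "speaker" is a sinusoid at a different pitch plus noise.
rng = np.random.default_rng(0)
X, y = [], []
t = np.arange(16000) / 16000.0
for speaker in range(4):
    for _ in range(10):
        sig = np.sin(2 * np.pi * 100.0 * (speaker + 1) * t)
        sig = sig + 0.05 * rng.standard_normal(t.size)
        X.append(mfcc(sig).mean(axis=0))  # utterance-level mean MFCC vector
        y.append(speaker)
Xa = np.array(X)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), Xa, y, cv=5)
```

Averaging frame-level MFCCs into one vector per utterance is one simple way to feed variable-length speech to a kNN classifier; `cross_val_score` then mirrors the paper's cross-validation accuracy measure.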
Related papers
- Investigating Confidence Estimation Measures for Speaker Diarization [4.679826697518427]
Speaker diarization systems segment a conversation recording based on the speakers' identity.
Speaker diarization errors propagate to, and can adversely affect, downstream systems that rely on the speaker's identity.
One way to mitigate these errors is to provide segment-level diarization confidence scores to downstream systems.
arXiv Detail & Related papers (2024-06-24T20:21:38Z)
- What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection [53.063161380423715]
Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types.
We propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection.
arXiv Detail & Related papers (2023-12-15T09:52:17Z)
- EmoDiarize: Speaker Diarization and Emotion Identification from Speech Signals using Convolutional Neural Networks [0.0]
This research explores the integration of deep learning techniques in speech emotion recognition.
It introduces a framework that combines a pre-existing speaker diarization pipeline and an emotion identification model built on a Convolutional Neural Network (CNN).
The proposed model yields an unweighted accuracy of 63%, demonstrating remarkable efficiency in accurately identifying emotional states within speech signals.
arXiv Detail & Related papers (2023-10-19T16:02:53Z)
- Gammatonegram Representation for End-to-End Dysarthric Speech Processing Tasks: Speech Recognition, Speaker Identification, and Intelligibility Assessment [1.0359008237358598]
Dysarthria is a disability that causes a disturbance in the human speech system.
We introduce gammatonegram as an effective method to represent audio files with discriminative details.
We convert each speech file into an image and propose an image recognition system to classify speech in different scenarios.
arXiv Detail & Related papers (2023-07-06T21:10:50Z)
- Improving speaker de-identification with functional data analysis of f0 trajectories [10.809893662563926]
Formant modification is a simpler, yet effective method for speaker de-identification which requires no training data.
This study introduces a novel speaker de-identification method, which, in addition to simple formant shifts, manipulates f0 trajectories based on functional data analysis.
The proposed speaker de-identification method conceals potentially identifying pitch characteristics in a phonetically controllable manner and improves formant-based speaker de-identification by up to 25%.
arXiv Detail & Related papers (2022-03-31T01:34:15Z)
- Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring [60.55025339250815]
We propose a novel deep learning technique for non-native automatic speech scoring (ASS), called speaker-conditioned hierarchical modeling.
We take advantage of the fact that oral proficiency tests rate multiple responses for a candidate. We extract context from these responses and feed it as additional speaker-specific context to our network to score a particular response.
arXiv Detail & Related papers (2021-08-30T07:00:28Z)
- FoolHD: Fooling speaker identification by Highly imperceptible adversarial Disturbances [63.80959552818541]
We propose a white-box steganography-inspired adversarial attack that generates imperceptible perturbations against a speaker identification model.
Our approach, FoolHD, uses a Gated Convolutional Autoencoder that operates in the DCT domain and is trained with a multi-objective loss function.
We validate FoolHD with a 250-speaker identification x-vector network, trained using VoxCeleb, in terms of accuracy, success rate, and imperceptibility.
arXiv Detail & Related papers (2020-11-17T07:38:26Z)
- Speaker De-identification System using Autoencoders and Adversarial Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
arXiv Detail & Related papers (2020-11-09T19:22:05Z)
- A Lightweight Speaker Recognition System Using Timbre Properties [0.5708902722746041]
We propose a lightweight text-independent speaker recognition model based on a random forest classifier.
It also introduces new features that are used for both speaker verification and identification tasks.
The prototype uses the seven most actively searched properties: boominess, brightness, depth, hardness, timbre, sharpness, and warmth.
arXiv Detail & Related papers (2020-10-12T07:56:03Z)
- Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR [91.87500543591945]
We develop an end-to-end multi-talker automatic speech recognition system for an unknown number of active speakers.
Our experiments show very promising performance in counting accuracy, source separation and speech recognition.
Our system generalizes well to a larger number of speakers than it ever saw during training.
arXiv Detail & Related papers (2020-06-04T11:25:50Z)
- Speaker Diarization with Lexical Information [59.983797884955]
This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition.
We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy.
arXiv Detail & Related papers (2020-04-13T17:16:56Z)
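The last entry's idea of folding word-level speaker-turn probabilities into speaker clustering can be illustrated with a toy sketch. This is not that paper's system: the fusion weight `alpha`, the cosine-distance affinity, and the average-linkage clustering are all illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def diarize(embeddings, turn_prob, n_speakers, alpha=0.5):
    """embeddings: (n_segments, d) speaker embeddings (e.g. x-vectors).
    turn_prob[i]: ASR-derived probability of a speaker change between
    segments i and i+1. alpha is an assumed fusion weight."""
    # cosine distance between all segment embeddings
    norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    dist = np.clip(1.0 - norm @ norm.T, 0.0, None)
    # pull adjacent segments together when the lexical model says
    # "no speaker turn" (low probability), push them apart otherwise
    for i, p in enumerate(turn_prob):
        fused = (1 - alpha) * dist[i, i + 1] + alpha * p
        dist[i, i + 1] = dist[i + 1, i] = fused
    np.fill_diagonal(dist, 0.0)
    # agglomerative clustering on the fused distances
    z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(z, n_speakers, criterion="maxclust")

# Toy example: two speakers, three segments each, with a lexical turn
# signal firing between segments 2 and 3.
emb = np.array([
    [1.00, 0.00, 0.0, 0.0], [0.99, 0.01, 0.0, 0.0], [1.00, 0.02, 0.0, 0.0],
    [0.00, 1.00, 0.0, 0.0], [0.01, 0.99, 0.0, 0.0], [0.00, 1.00, 0.02, 0.0],
])
turn_prob = [0.0, 0.0, 1.0, 0.0, 0.0]
labels = diarize(emb, turn_prob, n_speakers=2)
```

The lexical cue only adjusts distances between temporally adjacent segments, which is where ASR-derived turn probabilities are defined; all other pairs are compared purely by embedding similarity.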