Meta-Learning Framework for End-to-End Imposter Identification in Unseen
Speaker Recognition
- URL: http://arxiv.org/abs/2306.00952v2
- Date: Sat, 30 Sep 2023 19:35:49 GMT
- Title: Meta-Learning Framework for End-to-End Imposter Identification in Unseen
Speaker Recognition
- Authors: Ashutosh Chaubey, Sparsh Sinha, Susmita Ghose
- Abstract summary: We demonstrate the generalization problem of using fixed thresholds (computed via the EER metric) for imposter identification in unseen speaker recognition.
We then introduce a robust speaker-specific thresholding technique for better performance.
We show the efficacy of the proposed techniques on VoxCeleb1, VCTK and the FFSVC 2022 datasets, beating the baselines by up to 10%.
- Score: 4.143603294943441
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Speaker identification systems are deployed in diverse environments, often
different from the lab conditions on which they are trained and tested. In this
paper, first, we show the generalization problem of using fixed thresholds
(computed using the EER metric) for imposter identification in unseen speaker
recognition, and then introduce a robust speaker-specific thresholding technique
for better performance. Secondly, inspired by the recent use of meta-learning
techniques in speaker verification, we propose an end-to-end meta-learning
framework for imposter detection which decouples the problem of imposter
detection from unseen speaker identification. Thus, unlike most prior works
that use some heuristics to detect imposters, the proposed network learns to
detect imposters by leveraging the utterances of the enrolled speakers.
Furthermore, we show the efficacy of the proposed techniques on VoxCeleb1, VCTK
and the FFSVC 2022 datasets, beating the baselines by up to 10%.
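The contrast between a single fixed threshold and speaker-specific thresholds can be illustrated with a minimal sketch. The score distributions below are synthetic, and the per-speaker rule (mean minus k standard deviations of a speaker's own genuine similarities) is an illustrative heuristic, not the paper's learned method.

```python
# Sketch: fixed EER threshold vs. a simple per-speaker threshold
# for imposter rejection over cosine-similarity-like scores.
import numpy as np

rng = np.random.default_rng(0)

def eer_threshold(genuine, imposter):
    """Return the threshold where false-accept rate ~= false-reject rate."""
    candidates = np.sort(np.concatenate([genuine, imposter]))
    best_t, best_gap = candidates[0], np.inf
    for t in candidates:
        far = np.mean(imposter >= t)   # imposters wrongly accepted
        frr = np.mean(genuine < t)     # genuine trials wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, best_t = abs(far - frr), t
    return best_t

# Synthetic scores for two enrolled speakers whose genuine-score
# distributions differ -- the situation where a single global
# threshold generalizes poorly.
gen_a = rng.normal(0.70, 0.05, 500)   # speaker A: tight genuine scores
gen_b = rng.normal(0.55, 0.10, 500)   # speaker B: broader genuine scores
imp = rng.normal(0.30, 0.10, 1000)    # imposter scores against both

# One global threshold from pooled scores.
t_fixed = eer_threshold(np.concatenate([gen_a, gen_b]), imp)

# Speaker-specific thresholds from each speaker's own enrollment scores.
k = 2.0
t_a = gen_a.mean() - k * gen_a.std()
t_b = gen_b.mean() - k * gen_b.std()

frr_fixed = 0.5 * (np.mean(gen_a < t_fixed) + np.mean(gen_b < t_fixed))
frr_spk = 0.5 * (np.mean(gen_a < t_a) + np.mean(gen_b < t_b))
print(f"fixed threshold {t_fixed:.3f}, per-speaker {t_a:.3f} / {t_b:.3f}")
print(f"mean FRR: fixed={frr_fixed:.3f}, per-speaker={frr_spk:.3f}")
```

Because speaker B's genuine scores are broader and lower, a pooled threshold over-rejects B while being looser than necessary for A; adapting the threshold per enrolled speaker avoids that mismatch.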
Related papers
- Certification of Speaker Recognition Models to Additive Perturbations [4.332441337407564]
We pioneer the application of robustness certification techniques, originally developed for the image domain, to speaker recognition.
We demonstrate the effectiveness of these methods on VoxCeleb 1 and 2 datasets for several models.
We expect this work to improve voice-biometry robustness, establish a new certification benchmark, and accelerate research of certification methods in the audio domain.
arXiv Detail & Related papers (2024-04-29T15:23:26Z) - What to Remember: Self-Adaptive Continual Learning for Audio Deepfake
Detection [53.063161380423715]
Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types.
We propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection.
arXiv Detail & Related papers (2023-12-15T09:52:17Z) - In search of strong embedding extractors for speaker diarisation [49.7017388682077]
We tackle two key problems when adopting embedding extractors (EEs) for speaker diarisation.
First, the evaluation is not straightforward because the features required for better performance differ between speaker verification and diarisation.
We show that better performance on widely adopted speaker verification evaluation protocols does not lead to better diarisation performance.
We propose two data augmentation techniques to alleviate the second problem, making embedding extractors aware of overlapped speech or speaker change input.
arXiv Detail & Related papers (2022-10-26T13:00:29Z) - Improved Relation Networks for End-to-End Speaker Verification and
Identification [0.0]
Speaker identification systems are tasked to identify a speaker amongst a set of enrolled speakers given just a few samples.
We propose improved relation networks for speaker verification and few-shot (unseen) speaker identification.
Inspired by the use of prototypical networks in speaker verification, we train the model to classify samples in the current episode amongst all speakers present in the training set.
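The episodic classification described in this entry can be sketched in a prototypical-network style: class prototypes are the means of support embeddings, and a query is assigned to the nearest prototype. The random embeddings below are placeholders standing in for speaker-encoder outputs, not the paper's relation-network architecture.

```python
# Sketch of episodic nearest-prototype speaker classification.
import numpy as np

rng = np.random.default_rng(1)

def classify_by_prototype(support, support_labels, queries):
    """support: (n, d) embeddings; returns the predicted label per query."""
    labels = np.unique(support_labels)
    # Prototype = mean embedding of each speaker's support utterances.
    protos = np.stack(
        [support[support_labels == c].mean(axis=0) for c in labels]
    )
    # Negative squared Euclidean distance acts as the logit.
    d2 = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return labels[np.argmin(d2, axis=1)]

# Toy episode: 3 speakers, 5 support and 2 query embeddings each,
# drawn around well-separated per-speaker means.
means = rng.normal(0, 5, size=(3, 16))
support = np.concatenate([m + rng.normal(0, 0.3, (5, 16)) for m in means])
support_y = np.repeat([0, 1, 2], 5)
queries = np.concatenate([m + rng.normal(0, 0.3, (2, 16)) for m in means])
query_y = np.repeat([0, 1, 2], 2)

pred = classify_by_prototype(support, support_y, queries)
print("episode accuracy:", (pred == query_y).mean())
```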
arXiv Detail & Related papers (2022-03-31T17:44:04Z) - Speaker Embedding-aware Neural Diarization: a Novel Framework for
Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z) - End-to-End Diarization for Variable Number of Speakers with Local-Global
Networks and Discriminative Speaker Embeddings [66.50782702086575]
We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings.
The proposed system is designed to handle meetings with unknown numbers of speakers, using variable-number permutation-invariant cross-entropy based loss functions.
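A permutation-invariant loss of the kind mentioned here can be sketched compactly: the frame-level speaker-activity targets are matched to model outputs under whichever speaker permutation gives the lowest cross-entropy. The shapes and toy data below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a permutation-invariant binary cross-entropy for diarization.
from itertools import permutations

import numpy as np

def pit_bce(probs, targets):
    """probs, targets: (frames, speakers). Return min BCE over permutations."""
    eps = 1e-9
    n_spk = probs.shape[1]
    best = np.inf
    for perm in permutations(range(n_spk)):
        p = probs[:, perm]  # relabel the model's speaker channels
        bce = -(
            targets * np.log(p + eps) + (1 - targets) * np.log(1 - p + eps)
        ).mean()
        best = min(best, bce)
    return best

# Toy case: the model predicts the two speakers in swapped channel order;
# the permutation-invariant loss still matches them correctly.
targets = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
swapped = np.array([[0.1, 0.9], [0.1, 0.9], [0.9, 0.1], [0.9, 0.1]])
print(f"PIT loss: {pit_bce(swapped, targets):.4f}")
```

Exhaustive permutation search is exponential in the number of speakers, which is why variable-speaker-count systems pair it with mechanisms for handling unknown speaker counts.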
arXiv Detail & Related papers (2021-05-05T14:55:29Z) - U-vectors: Generating clusterable speaker embedding from unlabeled data [0.0]
This paper introduces a speaker recognition strategy dealing with unlabeled data.
It generates clusterable embedding vectors from small fixed-size speech frames.
We conclude that the proposed approach achieves remarkable performance using pairwise architectures.
arXiv Detail & Related papers (2021-02-07T18:00:09Z) - Integrated Replay Spoofing-aware Text-independent Speaker Verification [47.41124427552161]
We propose two approaches for building an integrated system of speaker verification and presentation attack detection.
The first approach simultaneously trains speaker identification, presentation attack detection, and the integrated system using multi-task learning.
We propose a back-end modular approach using a separate deep neural network (DNN) for speaker verification and presentation attack detection.
arXiv Detail & Related papers (2020-06-10T01:24:55Z) - Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention [70.82604384963679]
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
arXiv Detail & Related papers (2020-02-14T05:05:36Z) - Robust Speaker Recognition Using Speech Enhancement And Attention Model [37.33388614967888]
Instead of processing speech enhancement and speaker recognition separately, the two modules are integrated into one framework and jointly optimised using deep neural networks.
To increase robustness against noise, a multi-stage attention mechanism is employed to highlight the speaker related features learned from context information in time and frequency domain.
The results show that the proposed approach, using speech enhancement and multi-stage attention, outperforms two strong baselines lacking these components in most acoustic conditions in our experiments.
arXiv Detail & Related papers (2020-01-14T20:03:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.