Integrated Replay Spoofing-aware Text-independent Speaker Verification
- URL: http://arxiv.org/abs/2006.05599v2
- Date: Sun, 27 Sep 2020 10:28:08 GMT
- Title: Integrated Replay Spoofing-aware Text-independent Speaker Verification
- Authors: Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Seung-bin Kim, Ha-Jin Yu
- Abstract summary: We propose two approaches for building an integrated system of speaker verification and presentation attack detection.
The first approach simultaneously trains speaker identification, presentation attack detection, and the integrated system using multi-task learning.
We propose a back-end modular approach using a separate deep neural network (DNN) for speaker verification and presentation attack detection.
- Score: 47.41124427552161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A number of studies have successfully developed speaker verification or
presentation attack detection systems. However, studies integrating the two
tasks remain in the preliminary stages. In this paper, we propose two
approaches for building an integrated system of speaker verification and
presentation attack detection: an end-to-end monolithic approach and a back-end
modular approach. The first approach simultaneously trains speaker
identification, presentation attack detection, and the integrated system using
multi-task learning with a common feature. However, based on our experiments, we
hypothesize that the information required for performing speaker verification
and presentation attack detection might differ because speaker verification
systems try to remove device-specific information from speaker embeddings,
while presentation attack detection systems exploit such information.
Therefore, we propose a back-end modular approach using a separate deep neural
network (DNN) for speaker verification and presentation attack detection. This
approach has three input components: two speaker embeddings (one for enrollment
and one for test) and a presentation attack prediction. Experiments are conducted
using the ASVspoof 2017-v2 dataset, which includes official trials on the
integration of speaker verification and presentation attack detection. The
proposed back-end approach demonstrates a relative improvement of 21.77% in
terms of the equal error rate for integrated trials compared to a conventional
speaker verification system.
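
The 21.77% figure above is a relative reduction in equal error rate (EER). As a minimal sketch of how such a number is typically computed, and not the authors' evaluation code, the following pure-Python snippet estimates the EER by sweeping thresholds over trial scores and then computes the relative improvement between two EERs. The function names, the simple threshold sweep, and the toy scores are assumptions made for illustration; pooling zero-effort impostor trials and replayed trials together as non-target trials is likewise an assumption about how integrated trials are scored.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by trying every observed score as a threshold.

    A trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where the false acceptance
    rate and the false rejection rate are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None  # (gap, eer, threshold)
    for t in thresholds:
        # Genuine (target) trials wrongly rejected.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # Non-target trials wrongly accepted (assumed here to pool
        # zero-effort impostors and replayed recordings).
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


def relative_improvement(baseline_eer, proposed_eer):
    """Relative EER reduction in percent, with respect to the baseline."""
    return 100.0 * (baseline_eer - proposed_eer) / baseline_eer


# Toy scores, invented purely for illustration (not from the paper).
toy_target = [0.9, 0.8, 0.7, 0.4]
toy_nontarget = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(toy_target, toy_nontarget)
print(eer, threshold)                   # 0.25 0.6
print(relative_improvement(10.0, 8.0))  # 20.0
```

With real systems, the same two functions would be applied to the score lists of the baseline speaker verification system and of the integrated system on the same trial list, and the two resulting EERs passed to `relative_improvement`.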
Related papers
- Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
- Meta-Learning Framework for End-to-End Imposter Identification in Unseen Speaker Recognition [4.143603294943441]
We show the problem of generalization using fixed thresholds (computed using EER metric) for imposter identification in unseen speaker recognition.
We then introduce a robust speaker-specific thresholding technique for better performance.
We show the efficacy of the proposed techniques on VoxCeleb1, VCTK and the FFSVC 2022 datasets, beating the baselines by up to 10%.
arXiv Detail & Related papers (2023-06-01T17:49:58Z)
- Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization [7.673971221635779]
We propose methods to extract speaker-related information from semantic content in multi-party meetings.
Experiments on both AISHELL-4 and AliMeeting datasets show that our method achieves consistent improvements over acoustic-only speaker diarization systems.
arXiv Detail & Related papers (2023-05-22T11:14:19Z)
- Symmetric Saliency-based Adversarial Attack To Speaker Identification [17.087523686496958]
We propose a novel generation-network-based approach, called symmetric saliency-based encoder-decoder (SSED).
First, it uses a novel saliency map decoder to learn the importance of speech samples to the decision of a targeted speaker identification system.
Second, it proposes an angular loss function to push the speaker embedding far away from the source speaker.
arXiv Detail & Related papers (2022-10-30T08:54:02Z)
- In search of strong embedding extractors for speaker diarisation [49.7017388682077]
We tackle two key problems when adopting embedding extractors (EEs) for speaker diarisation.
First, the evaluation is not straightforward because the features required for better performance differ between speaker verification and diarisation.
We show that better performance on widely adopted speaker verification evaluation protocols does not lead to better diarisation performance.
We propose two data augmentation techniques to alleviate the second problem, making embedding extractors aware of overlapped speech or speaker change input.
arXiv Detail & Related papers (2022-10-26T13:00:29Z)
- Improved Relation Networks for End-to-End Speaker Verification and Identification [0.0]
Speaker identification systems are tasked to identify a speaker amongst a set of enrolled speakers given just a few samples.
We propose improved relation networks for speaker verification and few-shot (unseen) speaker identification.
Inspired by the use of prototypical networks in speaker verification, we train the model to classify samples in the current episode amongst all speakers present in the training set.
arXiv Detail & Related papers (2022-03-31T17:44:04Z)
- Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing [72.4445825335561]
We propose a simple method to derive 2D representation from detection scores produced by an arbitrary set of binary classifiers.
Based upon rank correlations, our method facilitates a visual comparison of classifiers with arbitrary scores.
While the approach is fully versatile and can be applied to any detection task, we demonstrate the method using scores produced by automatic speaker verification and voice anti-spoofing systems.
arXiv Detail & Related papers (2021-06-11T13:03:33Z)
- Personalized Keyphrase Detection using Speaker and Environment Information [24.766475943042202]
We introduce a streaming keyphrase detection system that can be easily customized to accurately detect any phrase composed of words from a large vocabulary.
The system is implemented with an end-to-end trained automatic speech recognition (ASR) model and a text-independent speaker verification model.
arXiv Detail & Related papers (2021-04-28T18:50:19Z)
- Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning [58.14807331265752]
We show that better speaker embeddings can be learned by momentum contrastive learning.
We generalize the self-supervised framework to a semi-supervised scenario where only a small portion of the data is labeled.
arXiv Detail & Related papers (2020-12-13T23:23:39Z)
- Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation [125.59372403631006]
We propose a semi-supervised learning approach for multi-speaker text-to-speech (TTS).
A multi-speaker TTS model can learn from the untranscribed audio via the proposed encoder-decoder framework with discrete speech representation.
We found the model can benefit from the proposed semi-supervised learning approach even when part of the unpaired speech data is noisy.
arXiv Detail & Related papers (2020-05-16T15:47:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.