Spoofing-Aware Speaker Verification with Unsupervised Domain Adaptation
- URL: http://arxiv.org/abs/2203.10992v1
- Date: Mon, 21 Mar 2022 14:02:06 GMT
- Title: Spoofing-Aware Speaker Verification with Unsupervised Domain Adaptation
- Authors: Xuechen Liu, Md Sahidullah, Tomi Kinnunen
- Abstract summary: We enhance the spoofing robustness of the automatic speaker verification system without relying on a separate countermeasure module.
We employ three unsupervised domain adaptation techniques to optimize the back-end using the audio data.
We demonstrate notable improvements on both logical and physical access scenarios.
- Score: 18.684888457998284
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we address the problem of enhancing the spoofing robustness
of the automatic speaker verification (ASV) system without relying on a separate
countermeasure module. We start from the standard ASV
framework of the ASVspoof 2019 baseline and approach the problem from the
back-end classifier based on probabilistic linear discriminant analysis. We
employ three unsupervised domain adaptation techniques to optimize the back-end
using the audio data in the training partition of the ASVspoof 2019 dataset. We
demonstrate notable improvements on both logical and physical access scenarios,
especially on the latter, where the system is attacked by replayed audio, with
maximum relative improvements of 36.1% and 5.3% on bonafide and spoofed cases,
respectively. We perform additional studies such as per-attack breakdown
analysis, data composition, and integration with a countermeasure system at
score-level with Gaussian back-end.
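The paper's three specific adaptation techniques are not reproduced here, but the general idea of unsupervised back-end adaptation can be illustrated with a CORAL-style second-order statistics alignment: adaptation-domain embeddings are whitened with their own covariance and re-coloured with the covariance of the other domain before PLDA scoring. A minimal sketch with illustrative names:

```python
import numpy as np

def coral_adapt(source_emb, target_emb, eps=1e-6):
    """Align source-domain embeddings to target-domain statistics
    (CORAL-style whitening and re-colouring). Both inputs are
    (n_samples, dim) arrays of speaker embeddings."""
    dim = source_emb.shape[1]
    cs = np.cov(source_emb, rowvar=False) + eps * np.eye(dim)
    ct = np.cov(target_emb, rowvar=False) + eps * np.eye(dim)

    def mat_pow(m, p):
        # fractional power of a symmetric PSD matrix via eigendecomposition
        vals, vecs = np.linalg.eigh(m)
        vals = np.clip(vals, eps, None)
        return (vecs * vals**p) @ vecs.T

    # whiten with the source covariance, re-colour with the target one
    return source_emb @ mat_pow(cs, -0.5) @ mat_pow(ct, 0.5)
```

After this transform the source embeddings carry the target domain's covariance, so a PLDA model estimated on one domain scores the other more consistently.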
Related papers
- Generalizing Speaker Verification for Spoof Awareness in the Embedding Space [30.094557217931563]
ASV systems can be spoofed using various types of adversaries.
We propose a novel yet simple backend classifier based on deep neural networks.
Experiments are conducted on the ASVspoof 2019 logical access dataset.
arXiv Detail & Related papers (2024-01-20T07:30:22Z)
- MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition [62.89464258519723]
We propose a multi-layer cross-attention fusion based AVSR approach that promotes representation of each modality by fusing them at different levels of audio/visual encoders.
Our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
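The cross-modal fusion step can be sketched as a single cross-attention pass in which audio frames attend over visual frames; this is a simplified illustration only, omitting the learned projection matrices and the multi-layer placement described in the paper:

```python
import numpy as np

def cross_attention_fuse(audio_feat, visual_feat):
    """One cross-attention fusion step (sketch): audio queries attend
    over visual keys/values. Shapes: (T_a, d) and (T_v, d)."""
    d = audio_feat.shape[-1]
    scores = audio_feat @ visual_feat.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    # residual fusion: add the visual summary back onto the audio stream
    return audio_feat + attn @ visual_feat
```

Applying such a step at several encoder depths, as the paper proposes, lets each modality refine the other's intermediate representations rather than fusing only at the output.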
arXiv Detail & Related papers (2024-01-07T08:59:32Z)
- Audio Anti-spoofing Using a Simple Attention Module and Joint Optimization Based on Additive Angular Margin Loss and Meta-learning [43.519717601587864]
This study introduces a simple attention module to infer 3-dim attention weights for the feature map in a convolutional layer.
We propose a joint optimization approach based on the weighted additive angular margin loss for binary classification.
Our proposed approach delivers a competitive result with a pooled EER of 0.99% and min t-DCF of 0.0289.
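The additive angular margin idea can be shown in a few lines: the target class's angle is penalised by a margin before the softmax, forcing tighter class clusters. A hedged NumPy sketch of the binary case (the paper's exact class weighting is not reproduced; names are illustrative):

```python
import numpy as np

def aam_softmax_loss(embedding, class_weights, label, s=15.0, m=0.2):
    """Additive angular margin (AAM) softmax loss, binary case.
    embedding: (d,) vector; class_weights: (2, d); label in {0, 1}."""
    # L2-normalise so dot products become cosines of class angles
    e = embedding / np.linalg.norm(embedding)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = w @ e
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    theta[label] += m                 # penalise only the true class's angle
    logits = s * np.cos(theta)
    # cross-entropy over the margin-adjusted, scaled logits
    log_prob = logits - np.log(np.sum(np.exp(logits)))
    return -log_prob[label]
```

Because the margin is added in angle space, even a correctly classified sample incurs some loss until its embedding sits well inside its class region.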
arXiv Detail & Related papers (2022-11-17T21:25:29Z)
- Low-complexity deep learning frameworks for acoustic scene classification [64.22762153453175]
We present low-complexity deep learning frameworks for acoustic scene classification (ASC)
The proposed frameworks can be separated into four main steps: Front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities.
Our experiments on the DCASE 2022 Task 1 Development dataset fulfilled the low-complexity requirement and achieved a best classification accuracy of 60.1%.
arXiv Detail & Related papers (2022-06-13T11:41:39Z)
- Optimizing Tandem Speaker Verification and Anti-Spoofing Systems [45.66319648049384]
We propose to optimize the tandem system directly by creating a differentiable version of t-DCF and employing techniques from reinforcement learning.
Results indicate that these approaches outperform fine-tuning, with our method providing a 20% relative improvement in t-DCF on the ASVspoof 2019 dataset.
arXiv Detail & Related papers (2022-01-24T14:27:28Z)
- Spotting adversarial samples for speaker verification by neural vocoders [102.1486475058963]
We adopt neural vocoders to spot adversarial samples for automatic speaker verification (ASV).
We find that the difference between the ASV scores for the original and re-synthesized audio is a good indicator for discriminating genuine from adversarial samples.
Our code will be made open-source for future comparison.
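The decision rule described above amounts to thresholding the score shift after vocoder re-synthesis. A minimal sketch, where the threshold value is hypothetical and the scoring functions are assumed to exist elsewhere:

```python
def is_adversarial(score_original, score_resynth, tau=0.5):
    """Flag a trial as adversarial when the ASV score changes a lot
    after neural-vocoder re-synthesis. tau is a hypothetical
    threshold that would be tuned on held-out data."""
    return abs(score_original - score_resynth) > tau
```

Genuine audio tends to score similarly before and after re-synthesis, whereas adversarial perturbations are disrupted by the vocoder, producing a large score gap.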
arXiv Detail & Related papers (2021-07-01T08:58:16Z)
- Bayesian Learning for Deep Neural Network Adaptation [57.70991105736059]
A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences.
Model-based speaker adaptation approaches often require sufficient amounts of target speaker data to ensure robustness.
This paper proposes a full Bayesian learning based DNN speaker adaptation framework to model speaker-dependent (SD) parameter uncertainty.
arXiv Detail & Related papers (2020-12-14T12:30:41Z)
- Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training [55.824641135682725]
Domain adaptation experiments using WSJ as a source domain and TED-LIUM 3 as well as SWITCHBOARD show that up to 80% of the performance of a system trained on ground-truth data can be recovered.
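In self-training of this kind, target-domain utterances are transcribed by the source model and only confidently pseudo-labelled ones are kept for retraining. A hedged sketch of that filtering step (names and the threshold are illustrative, not the paper's exact uncertainty measure):

```python
def select_pseudo_labels(hypotheses, confidences, threshold=0.8):
    """Keep only pseudo-labelled utterances whose model confidence
    exceeds a (hypothetical) threshold, discarding uncertain ones
    that would inject label noise into retraining."""
    return [hyp for hyp, conf in zip(hypotheses, confidences)
            if conf >= threshold]
```

The retained pairs are then mixed with the source-domain data and the model is retrained, iterating the label-filter-retrain loop as needed.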
arXiv Detail & Related papers (2020-11-26T18:51:26Z)
- Audio Spoofing Verification using Deep Convolutional Neural Networks by Transfer Learning [0.0]
We propose a speech classifier based on a deep convolutional neural network to detect spoofing attacks.
Our proposed methodology uses acoustic time-frequency representations of power spectral densities on the Mel frequency scale.
We have achieved an equal error rate (EER) of 0.9056% on the development and 5.32% on the evaluation dataset of logical access scenario.
arXiv Detail & Related papers (2020-08-08T07:14:40Z)
- Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection [10.851348154870852]
We argue that anti-spoofing modeling should pay more attention to indistinguishable samples than to easily classified ones.
We propose to leverage a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of the sample itself.
With complementary features, our fusion system with only three kinds of features outperforms other systems by 22.5% for min-tDCF and 7% for EER.
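The balanced focal loss behind this result can be written compactly: a modulating factor down-weights easy, confidently classified samples while a class-balance weight corrects for the bonafide/spoof imbalance. A minimal sketch with standard default hyperparameters (not necessarily the paper's):

```python
import math

def balanced_focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Balanced focal loss for binary replay detection (sketch).
    p: predicted probability of the positive class; y: label in {0, 1}.
    (1 - p_t)**gamma shrinks the loss of easy samples so training
    focuses on hard, indistinguishable ones; alpha balances classes."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma = 0 and alpha = 0.5 this reduces to (half of) the ordinary cross-entropy, which makes the dynamic scaling easy to ablate.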
arXiv Detail & Related papers (2020-06-25T17:06:47Z)
- Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching [69.24460241328521]
Machine learning algorithms can be negatively affected by mismatches between training (source) and test (target) data distributions.
We propose an unsupervised domain adaptation method that consists of aligning the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to the ones of the source-domain training dataset.
We show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
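The band-wise alignment described above can be sketched as a per-band standardise-and-recolour operation: each frequency band of a target-domain spectrogram is shifted and scaled to the source domain's first- and second-order statistics. A simplified illustration (names are ours, not the paper's):

```python
import numpy as np

def bandwise_stats_match(target_spec, src_mean, src_std, eps=1e-8):
    """Align each frequency band of a target-domain spectrogram
    (bands x frames) to source-domain per-band statistics.
    src_mean and src_std have shape (bands, 1)."""
    t_mean = target_spec.mean(axis=1, keepdims=True)
    t_std = target_spec.std(axis=1, keepdims=True)
    z = (target_spec - t_mean) / (t_std + eps)   # standardise each band
    return z * src_std + src_mean                # re-colour to source stats
```

No target-domain labels are needed: only the band-wise means and standard deviations of the two domains enter the transform, which is what makes the method unsupervised.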
arXiv Detail & Related papers (2020-04-30T23:56:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.