Spoofing-Aware Speaker Verification with Unsupervised Domain Adaptation
- URL: http://arxiv.org/abs/2203.10992v1
- Date: Mon, 21 Mar 2022 14:02:06 GMT
- Title: Spoofing-Aware Speaker Verification with Unsupervised Domain Adaptation
- Authors: Xuechen Liu, Md Sahidullah, Tomi Kinnunen
- Abstract summary: We enhance the spoofing robustness of the automatic speaker verification system without relying on a separate countermeasure module.
We employ three unsupervised domain adaptation techniques to optimize the back-end using the audio data.
We demonstrate notable improvements on both logical and physical access scenarios.
- Score: 18.684888457998284
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we address the problem of enhancing the spoofing robustness
of the automatic speaker verification (ASV) system without relying on a separate
countermeasure module. We start from the standard ASV
framework of the ASVspoof 2019 baseline and approach the problem from the
back-end classifier based on probabilistic linear discriminant analysis. We
employ three unsupervised domain adaptation techniques to optimize the back-end
using the audio data in the training partition of the ASVspoof 2019 dataset. We
demonstrate notable improvements on both logical and physical access scenarios,
especially on the latter, where the system is attacked by replayed audio, with
maximum relative improvements of 36.1% and 5.3% on bonafide and spoofed cases,
respectively. We perform additional studies such as per-attack breakdown
analysis, data composition, and integration with a countermeasure system at
score-level with Gaussian back-end.
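The paper's three specific adaptation techniques are not reproduced here, but the general idea of unsupervised back-end adaptation can be illustrated with a CORAL-style second-order statistics alignment: adaptation-domain embeddings are whitened with their own covariance and re-coloured with the covariance of the other domain before PLDA scoring. A minimal sketch with illustrative names:

```python
import numpy as np

def coral_adapt(source_emb, target_emb, eps=1e-6):
    """Align source-domain embeddings to target-domain statistics
    (CORAL-style whitening and re-colouring). Both inputs are
    (n_samples, dim) arrays of speaker embeddings."""
    dim = source_emb.shape[1]
    cs = np.cov(source_emb, rowvar=False) + eps * np.eye(dim)
    ct = np.cov(target_emb, rowvar=False) + eps * np.eye(dim)

    def mat_pow(m, p):
        # fractional power of a symmetric PSD matrix via eigendecomposition
        vals, vecs = np.linalg.eigh(m)
        vals = np.clip(vals, eps, None)
        return (vecs * vals**p) @ vecs.T

    # whiten with the source covariance, re-colour with the target one
    return source_emb @ mat_pow(cs, -0.5) @ mat_pow(ct, 0.5)
```

After this transform the source embeddings carry the target domain's covariance, so a PLDA model estimated on one domain scores the other more consistently.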
Related papers
- Generalizing Speaker Verification for Spoof Awareness in the Embedding Space [30.094557217931563]
ASV systems can be spoofed using various types of adversaries.
We propose a novel yet simple backend classifier based on deep neural networks.
Experiments are conducted on the ASVspoof 2019 logical access dataset.
arXiv Detail & Related papers (2024-01-20T07:30:22Z)
- MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition [62.89464258519723]
We propose a multi-layer cross-attention fusion based AVSR approach that promotes representation of each modality by fusing them at different levels of audio/visual encoders.
Our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
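The cross-modal fusion step can be sketched as a single cross-attention pass in which audio frames attend over visual frames; this is a simplified illustration only, omitting the learned projection matrices and the multi-layer placement described in the paper:

```python
import numpy as np

def cross_attention_fuse(audio_feat, visual_feat):
    """One cross-attention fusion step (sketch): audio queries attend
    over visual keys/values. Shapes: (T_a, d) and (T_v, d)."""
    d = audio_feat.shape[-1]
    scores = audio_feat @ visual_feat.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    # residual fusion: add the visual summary back onto the audio stream
    return audio_feat + attn @ visual_feat
```

Applying such a step at several encoder depths, as the paper proposes, lets each modality refine the other's intermediate representations rather than fusing only at the output.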
arXiv Detail & Related papers (2024-01-07T08:59:32Z)
- Audio Anti-spoofing Using a Simple Attention Module and Joint Optimization Based on Additive Angular Margin Loss and Meta-learning [43.519717601587864]
This study introduces a simple attention module to infer 3-dim attention weights for the feature map in a convolutional layer.
We propose a joint optimization approach based on the weighted additive angular margin loss for binary classification.
Our proposed approach delivers a competitive result with a pooled EER of 0.99% and min t-DCF of 0.0289.
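The additive angular margin idea can be shown in a few lines: the target class's angle is penalised by a margin before the softmax, forcing tighter class clusters. A hedged NumPy sketch of the binary case (the paper's exact class weighting is not reproduced; names are illustrative):

```python
import numpy as np

def aam_softmax_loss(embedding, class_weights, label, s=15.0, m=0.2):
    """Additive angular margin (AAM) softmax loss, binary case.
    embedding: (d,) vector; class_weights: (2, d); label in {0, 1}."""
    # L2-normalise so dot products become cosines of class angles
    e = embedding / np.linalg.norm(embedding)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = w @ e
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    theta[label] += m                 # penalise only the true class's angle
    logits = s * np.cos(theta)
    # cross-entropy over the margin-adjusted, scaled logits
    log_prob = logits - np.log(np.sum(np.exp(logits)))
    return -log_prob[label]
```

Because the margin is added in angle space, even a correctly classified sample incurs some loss until its embedding sits well inside its class region.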
arXiv Detail & Related papers (2022-11-17T21:25:29Z)
- Low-complexity deep learning frameworks for acoustic scene classification [64.22762153453175]
We present low-complexity deep learning frameworks for acoustic scene classification (ASC)
The proposed frameworks can be separated into four main steps: Front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities.
Our experiments on the DCASE 2022 Task 1 Development dataset fulfilled the low-complexity requirement and achieved a best classification accuracy of 60.1%.
arXiv Detail & Related papers (2022-06-13T11:41:39Z)
- Optimizing Tandem Speaker Verification and Anti-Spoofing Systems [45.66319648049384]
We propose to optimize the tandem system directly by creating a differentiable version of t-DCF and employing techniques from reinforcement learning.
Results indicate that these approaches outperform fine-tuning, with our method providing a 20% relative improvement in t-DCF on the ASVspoof 2019 dataset.
arXiv Detail & Related papers (2022-01-24T14:27:28Z)
- Spotting adversarial samples for speaker verification by neural vocoders [102.1486475058963]
We adopt neural vocoders to spot adversarial samples for automatic speaker verification (ASV).
We find that the difference between the ASV scores for the original and re-synthesized audio is a good indicator for discriminating genuine from adversarial samples.
Our code will be made open-source for future comparison.
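The decision rule described above amounts to thresholding the score shift after vocoder re-synthesis. A minimal sketch, where the threshold value is hypothetical and the scoring functions are assumed to exist elsewhere:

```python
def is_adversarial(score_original, score_resynth, tau=0.5):
    """Flag a trial as adversarial when the ASV score changes a lot
    after neural-vocoder re-synthesis. tau is a hypothetical
    threshold that would be tuned on held-out data."""
    return abs(score_original - score_resynth) > tau
```

Genuine audio tends to score similarly before and after re-synthesis, whereas adversarial perturbations are disrupted by the vocoder, producing a large score gap.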
arXiv Detail & Related papers (2021-07-01T08:58:16Z)
- Bayesian Learning for Deep Neural Network Adaptation [57.70991105736059]
A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences.
Model-based speaker adaptation approaches often require sufficient amounts of target speaker data to ensure robustness.
This paper proposes a full Bayesian learning based DNN speaker adaptation framework to model speaker-dependent (SD) parameter uncertainty.
arXiv Detail & Related papers (2020-12-14T12:30:41Z)
- Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training [55.824641135682725]
Domain adaptation experiments using WSJ as a source domain and TED-LIUM 3 as well as SWITCHBOARD show that up to 80% of the performance of a system trained on ground-truth data can be recovered.
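In self-training of this kind, target-domain utterances are transcribed by the source model and only confidently pseudo-labelled ones are kept for retraining. A hedged sketch of that filtering step (names and the threshold are illustrative, not the paper's exact uncertainty measure):

```python
def select_pseudo_labels(hypotheses, confidences, threshold=0.8):
    """Keep only pseudo-labelled utterances whose model confidence
    exceeds a (hypothetical) threshold, discarding uncertain ones
    that would inject label noise into retraining."""
    return [hyp for hyp, conf in zip(hypotheses, confidences)
            if conf >= threshold]
```

The retained pairs are then mixed with the source-domain data and the model is retrained, iterating the label-filter-retrain loop as needed.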
arXiv Detail & Related papers (2020-11-26T18:51:26Z)
- Audio Spoofing Verification using Deep Convolutional Neural Networks by Transfer Learning [0.0]
We propose a speech classifier based on a deep convolutional neural network to detect spoofing attacks.
Our proposed methodology uses acoustic time-frequency representations of power spectral densities on the Mel frequency scale.
We have achieved an equal error rate (EER) of 0.9056% on the development and 5.32% on the evaluation dataset of logical access scenario.
arXiv Detail & Related papers (2020-08-08T07:14:40Z)
- Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection [10.851348154870852]
We argue that anti-spoofing modeling should pay more attention to indistinguishable samples than to easily classified ones.
We propose to leverage a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of the sample itself.
With complementary features, our fusion system with only three kinds of features outperforms other systems by 22.5% for min-tDCF and 7% for EER.
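The balanced focal loss behind this result can be written compactly: a modulating factor down-weights easy, confidently classified samples while a class-balance weight corrects for the bonafide/spoof imbalance. A minimal sketch with standard default hyperparameters (not necessarily the paper's):

```python
import math

def balanced_focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Balanced focal loss for binary replay detection (sketch).
    p: predicted probability of the positive class; y: label in {0, 1}.
    (1 - p_t)**gamma shrinks the loss of easy samples so training
    focuses on hard, indistinguishable ones; alpha balances classes."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma = 0 and alpha = 0.5 this reduces to (half of) the ordinary cross-entropy, which makes the dynamic scaling easy to ablate.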
arXiv Detail & Related papers (2020-06-25T17:06:47Z)
- Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching [69.24460241328521]
Machine learning algorithms can be negatively affected by mismatches between training (source) and test (target) data distributions.
We propose an unsupervised domain adaptation method that consists of aligning the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to the ones of the source-domain training dataset.
We show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
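The band-wise alignment described above can be sketched as a per-band standardise-and-recolour operation: each frequency band of a target-domain spectrogram is shifted and scaled to the source domain's first- and second-order statistics. A simplified illustration (names are ours, not the paper's):

```python
import numpy as np

def bandwise_stats_match(target_spec, src_mean, src_std, eps=1e-8):
    """Align each frequency band of a target-domain spectrogram
    (bands x frames) to source-domain per-band statistics.
    src_mean and src_std have shape (bands, 1)."""
    t_mean = target_spec.mean(axis=1, keepdims=True)
    t_std = target_spec.std(axis=1, keepdims=True)
    z = (target_spec - t_mean) / (t_std + eps)   # standardise each band
    return z * src_std + src_mean                # re-colour to source stats
```

No target-domain labels are needed: only the band-wise means and standard deviations of the two domains enter the transform, which is what makes the method unsupervised.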
arXiv Detail & Related papers (2020-04-30T23:56:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.