Related papers: Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck

URL: http://arxiv.org/abs/2204.01387v1
Date: Mon, 4 Apr 2022 11:08:21 GMT
Title: Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck
Authors: Youngsik Eom, Yeonghyeon Lee, Ji Sub Um, Hoirin Kim
Abstract summary: We propose a transfer learning scheme based on the wav2vec 2.0 pretrained model with a variational information bottleneck for the speech anti-spoofing task. Our method improves the performance of distinguishing unseen spoofed and genuine speech, outperforming current state-of-the-art anti-spoofing systems.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in sophisticated synthetic speech generated by text-to-speech (TTS) or voice conversion (VC) systems pose threats to existing automatic speaker verification (ASV) systems. Since such synthetic speech is generated by diverse algorithms, the ability to generalize from limited training data is indispensable for a robust anti-spoofing system. In this work, we propose a transfer learning scheme based on the wav2vec 2.0 pretrained model with a variational information bottleneck (VIB) for the speech anti-spoofing task. Evaluation on the ASVspoof 2019 logical access (LA) database shows that our method improves the performance of distinguishing unseen spoofed and genuine speech, outperforming current state-of-the-art anti-spoofing systems. Furthermore, we show that the proposed system significantly improves performance in low-resource and cross-dataset anti-spoofing settings, demonstrating that it is also robust with respect to data size and data distribution.
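
The abstract does not specify the VIB head. What follows is only a minimal NumPy sketch of one common way to attach a variational information bottleneck to pooled encoder features. It assumes a Gaussian latent q(z|x), a standard-normal prior and a binary bona fide/spoof classifier. The function names, the latent size of 128 and the weight beta are hypothetical choices. Random vectors stand in for pooled wav2vec 2.0 features. This is not the authors' implementation.

```python
import numpy as np

def vib_head(h, p, rng):
    """Hypothetical VIB head: pooled features h (batch, d) -> class logits and KL term."""
    mu = h @ p["W_mu"] + p["b_mu"]              # mean of q(z|x)
    logvar = h @ p["W_lv"] + p["b_lv"]          # log-variance of q(z|x)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps         # reparameterization trick
    logits = z @ p["W_cls"] + p["b_cls"]        # bona fide vs. spoof logits
    # Closed-form KL(q(z|x) || N(0, I)), summed over latent dims, averaged over batch.
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1).mean()
    return logits, kl

def cross_entropy(logits, labels):
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def vib_loss(logits, labels, kl, beta=1e-3):
    """Task loss plus beta-weighted bottleneck penalty (beta is an assumed value)."""
    return cross_entropy(logits, labels) + beta * kl

# Usage with random stand-ins for pooled wav2vec 2.0 features (768-dim in the base model).
rng = np.random.default_rng(0)
d, k, batch = 768, 128, 4
p = {
    "W_mu": rng.standard_normal((d, k)) * 0.01, "b_mu": np.zeros(k),
    "W_lv": rng.standard_normal((d, k)) * 0.01, "b_lv": np.zeros(k),
    "W_cls": rng.standard_normal((k, 2)) * 0.1, "b_cls": np.zeros(2),
}
h = rng.standard_normal((batch, d))
labels = np.array([0, 1, 1, 0])
logits, kl = vib_head(h, p, rng)
loss = vib_loss(logits, labels, kl)
```

The KL term pushes the latent code toward the prior, which limits how much utterance-specific detail passes to the classifier. This is the usual motivation for using a bottleneck to improve generalization to unseen attacks.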

Related papers

Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM
Overlapping Speech Detection (OSD) aims to identify regions where multiple speakers overlap in a conversation. This work proposes a speaker-aware progressive OSD model that leverages a progressive training strategy to enhance the correlation between subtasks. Experimental results show that the proposed method achieves state-of-the-art performance, with an F1 score of 82.76% on the AMI test set.
arXiv Detail & Related papers (2025-05-29T07:47:48Z)
Toward Improving Synthetic Audio Spoofing Detection Robustness via Meta-Learning and Disentangled Training With Adversarial Examples
We propose a reliable and robust spoofing detection system that filters out spoofing attacks before they reach the automatic speaker verification system. A weighted additive angular margin loss is proposed to address the data imbalance issue, and different margins are assigned to improve generalization to unseen spoofing attacks. We craft adversarial examples by adding imperceptible perturbations to spoofed speech as a data augmentation strategy, and then use an auxiliary batch normalization layer to ensure that the corresponding normalization statistics are computed exclusively on the adversarial examples.
arXiv Detail & Related papers (2024-08-23T19:26:54Z)
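
The weighted additive angular margin loss described above can be illustrated with a small NumPy sketch. It assumes the ArcFace-style formulation: the target-class logit becomes s·cos(θ_y + m_y), with one margin m_y per class, and a per-class weight w_y on the cross-entropy term to counter class imbalance. The scale s, the margin values and the class weights are hypothetical. The weighting used in the paper may differ.

```python
import numpy as np

def weighted_aam_loss(emb, W, labels, margins, class_weights, s=30.0):
    """ArcFace-style loss with per-class angular margins and class-weighted averaging."""
    emb_n = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit-norm embeddings
    W_n = W / np.linalg.norm(W, axis=0, keepdims=True)        # unit-norm class centers
    cos = np.clip(emb_n @ W_n, -1.0 + 1e-7, 1.0 - 1e-7)       # (batch, n_classes)
    idx = np.arange(len(labels))
    theta_y = np.arccos(cos[idx, labels])
    logits = s * cos
    # Add the class-specific angular margin to the target class only.
    # (No easy-margin handling for theta_y + m_y > pi in this sketch.)
    logits[idx, labels] = s * np.cos(theta_y + margins[labels])
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[idx, labels]
    w = class_weights[labels]                                  # per-sample weight
    return float((w * nll).sum() / w.sum())

# Usage: 0 = bona fide, 1 = spoof; margins and weights are illustrative values only.
rng = np.random.default_rng(0)
emb = rng.standard_normal((8, 16))
W = rng.standard_normal((16, 2))
labels = np.array([0, 1, 1, 1, 0, 1, 1, 1])
loss = weighted_aam_loss(emb, W, labels,
                         margins=np.array([0.2, 0.5]),
                         class_weights=np.array([0.9, 0.1]))
```

Giving the two classes different margins lets the loss demand a tighter angular cluster for one class than for the other. The class weights rebalance the contribution of the majority class.
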
Audio Anti-spoofing Using a Simple Attention Module and Joint Optimization Based on Additive Angular Margin Loss and Meta-learning
This study introduces a simple attention module to infer 3-D attention weights for the feature map in a convolutional layer. We propose a joint optimization approach based on the weighted additive angular margin loss for binary classification. Our proposed approach achieves a competitive result, with a pooled EER of 0.99% and a min t-DCF of 0.0289.
arXiv Detail & Related papers (2022-11-17T21:25:29Z)
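
The summary does not define the attention module. The NumPy sketch below assumes it follows the parameter-free SimAM formulation. In SimAM, an energy function computed per neuron yields a full C×H×W (that is, 3-D) weight map. The regularizer lam = 1e-4 is an assumed default, and the code is an illustration rather than the paper's implementation.

```python
import numpy as np

def simam(x, lam=1e-4):
    """SimAM-style parameter-free attention over one (C, H, W) feature map."""
    _, h, w = x.shape
    n = h * w - 1                                   # neurons per channel, minus the target
    mu = x.mean(axis=(1, 2), keepdims=True)         # per-channel mean
    d = (x - mu) ** 2                               # squared deviation of each neuron
    v = d.sum(axis=(1, 2), keepdims=True) / n       # per-channel variance estimate
    e_inv = d / (4.0 * (v + lam)) + 0.5             # inverse minimal energy per neuron
    weights = 1.0 / (1.0 + np.exp(-e_inv))          # sigmoid -> 3-D weights in (0, 1)
    return x * weights

# Usage on a random stand-in for a spectrogram-like feature map: 8 channels, 20 x 50.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 20, 50))
out = simam(feat)
```

Neurons that deviate more from their channel mean receive weights closer to 1. This emphasizes distinctive time-frequency regions without adding trainable parameters.
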
Contextual-Utterance Training for Automatic Speech Recognition
We propose a contextual-utterance training technique that makes use of the previous and future contextual utterances. We also propose a dual-mode contextual-utterance training technique for streaming automatic speech recognition (ASR) systems. The proposed technique reduces the WER by more than 6% relative and the average last-token emission latency by more than 40 ms.
arXiv Detail & Related papers (2022-10-27T08:10:44Z)
Simple and Effective Unsupervised Speech Translation
We study a simple and effective approach to build speech translation systems without labeled data. We present an unsupervised domain adaptation technique for pre-trained speech models. Experiments show that unsupervised speech-to-text translation outperforms the previous unsupervised state of the art.
arXiv Detail & Related papers (2022-10-18T22:26:13Z)
ConvNext Based Neural Network for Anti-Spoofing
Automatic speaker verification (ASV) has been widely used in real life for identity authentication. With the rapid development of speech conversion and speech synthesis algorithms, and improvements in the quality of recording devices, ASV systems have become vulnerable to spoofing attacks.
arXiv Detail & Related papers (2022-09-14T05:53:37Z)
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation
Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues. In this work, we explore self-supervised pre-training with unlabeled speech data and data augmentation to tackle this issue.
arXiv Detail & Related papers (2022-04-06T17:59:22Z)
Mitigating Closed-model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition
We focus on a rigorous and empirical "closed-model adversarial robustness" setting. We propose an advanced Bayesian neural network (BNN)-based adversarial detector. We improve the detection rate by 2.77 to 5.42% absolute (3.03 to 6.26% relative) and reduce the word error rate by 5.02 to 7.47% on the LibriSpeech datasets.
arXiv Detail & Related papers (2022-02-17T09:17:58Z)
Optimizing Tandem Speaker Verification and Anti-Spoofing Systems
We propose to optimize the tandem system directly by creating a differentiable version of the t-DCF and employing techniques from reinforcement learning. Results indicate that these approaches outperform fine-tuning, with our method providing a 20% relative improvement in t-DCF on the ASVspoof 2019 dataset.
arXiv Detail & Related papers (2022-01-24T14:27:28Z)
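
A common way to make a detection cost differentiable is to replace the hard threshold decisions in the miss and false-alarm rates with sigmoids. The sketch below applies that idea to the simplified ASVspoof 2019 form of the normalized t-DCF, (C1·P_miss^cm + C2·P_fa^cm) / min(C1, C2). In that form, C1 and C2 fold in the ASV error rates, costs and priors. The constants, temperature and scores are placeholders. The construction used in the paper, which also draws on reinforcement learning, may differ.

```python
import numpy as np

def smooth_step(x, temperature):
    """Sigmoid approximation of the step function 1[x > 0]; tanh form avoids overflow."""
    return 0.5 * (1.0 + np.tanh(x / (2.0 * temperature)))

def soft_tdcf(bona_scores, spoof_scores, threshold, C1, C2, temperature=0.1):
    """Smoothed normalized t-DCF at one CM threshold (higher score = more bona fide)."""
    # P_miss^cm: bona fide trials rejected, i.e. scored below the threshold.
    p_miss = smooth_step(threshold - bona_scores, temperature).mean()
    # P_fa^cm: spoofed trials accepted, i.e. scored above the threshold.
    p_fa = smooth_step(spoof_scores - threshold, temperature).mean()
    return (C1 * p_miss + C2 * p_fa) / min(C1, C2)

# Usage with toy scores; C1 and C2 are placeholders, not ASVspoof 2019 values.
bona = np.array([2.0, 1.5, 3.0])
spoof = np.array([-2.0, -1.0, 0.5])
val = soft_tdcf(bona, spoof, threshold=2.5, C1=2.0, C2=1.0, temperature=1e-3)
print(val)
```

Every term is a smooth function of the scores. The same expression written in an autodiff framework could therefore serve as a training objective. As the temperature approaches zero, it approaches the hard t-DCF at that threshold. Minimizing over thresholds gives the min t-DCF.
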
Factorized Neural Transducer for Efficient Language Model Adaptation
We propose a novel model, the factorized neural Transducer, which factorizes the blank and vocabulary predictions. This factorization is expected to transfer improvements from the standalone language model to the Transducer for speech recognition. We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification
This work proposes to defend ASV systems against adversarial attacks with a separate detection network. A VGG-like binary classification detector is introduced and shown to be effective at detecting adversarial samples.
arXiv Detail & Related papers (2020-06-11T04:31:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.