SAMO: Speaker Attractor Multi-Center One-Class Learning for Voice
Anti-Spoofing
- URL: http://arxiv.org/abs/2211.02718v1
- Date: Fri, 4 Nov 2022 19:31:33 GMT
- Title: SAMO: Speaker Attractor Multi-Center One-Class Learning for Voice
Anti-Spoofing
- Authors: Siwen Ding, You Zhang, Zhiyao Duan
- Abstract summary: Anti-spoofing systems are crucial auxiliaries for automatic speaker verification (ASV) systems.
We propose speaker attractor multi-center one-class learning (SAMO), which clusters bona fide speech around a number of speaker attractors.
Our proposed system outperforms existing state-of-the-art single systems with a relative improvement of 38% on equal error rate (EER) on the ASVspoof 2019 LA evaluation set.
- Score: 22.47152800242178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Voice anti-spoofing systems are crucial auxiliaries for automatic speaker
verification (ASV) systems. A major challenge is caused by unseen attacks
empowered by advanced speech synthesis technologies. Our previous research on
one-class learning has improved the generalization ability to unseen attacks by
compacting the bona fide speech in the embedding space. However, such
compactness lacks consideration of the diversity of speakers. In this work, we
propose speaker attractor multi-center one-class learning (SAMO), which
clusters bona fide speech around a number of speaker attractors and pushes away
spoofing attacks from all the attractors in a high-dimensional embedding space.
For training, we propose an algorithm for the co-optimization of bona fide
speech clustering and bona fide/spoof classification. For inference, we propose
strategies to enable anti-spoofing for speakers without enrollment. Our
proposed system outperforms existing state-of-the-art single systems with a
relative improvement of 38% on equal error rate (EER) on the ASVspoof2019 LA
evaluation set.
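The abstract's central idea (pull each bona fide utterance toward its own speaker attractor, push spoofed utterances away from every attractor) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names, the cosine-similarity scoring, the softplus-style margin loss, and the default values of `m_bona`, `m_spoof`, and `alpha` are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def multi_center_scores(embeddings, attractors, is_bonafide, speaker_ids):
    """Hypothetical scoring rule (an assumption, not taken from the paper text).

    Bona fide utterances are scored against their own speaker's attractor.
    Spoofed utterances are scored against the most similar attractor, so that
    lowering this score pushes them away from all attractors at once.
    For spoofed rows, speaker_ids is ignored but must still be a valid index.
    """
    emb = l2_normalize(embeddings)
    att = l2_normalize(attractors)
    sims = emb @ att.T                              # shape (N, num_speakers)
    own = sims[np.arange(len(emb)), speaker_ids]    # similarity to own attractor
    nearest = sims.max(axis=1)                      # similarity to closest attractor
    return np.where(is_bonafide, own, nearest)


def margin_loss(scores, is_bonafide, m_bona=0.7, m_spoof=0.0, alpha=20.0):
    """Softplus margin loss; margins and scale are illustrative defaults only."""
    # Bona fide: penalize scores below m_bona. Spoof: penalize scores above m_spoof.
    violations = np.where(is_bonafide, m_bona - scores, scores - m_spoof)
    return np.mean(np.logaddexp(0.0, alpha * violations))


def update_attractors(bonafide_embeddings, speaker_ids, num_speakers):
    """Recompute each attractor as the mean normalized bona fide embedding.

    Assumes every speaker index in range(num_speakers) has at least one sample.
    """
    emb = l2_normalize(bonafide_embeddings)
    return np.stack(
        [emb[speaker_ids == s].mean(axis=0) for s in range(num_speakers)]
    )
```

In a training loop, one would alternate between minimizing `margin_loss` with respect to the embedding network and refreshing the attractors with `update_attractors`, which is one plausible reading of the "co-optimization" mentioned above. For a test speaker without enrollment, scoring every utterance with the nearest-attractor rule is one possible fallback, again stated here as an assumption.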
Related papers
- Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual
Active Speaker Detection [88.74863771919445]
We reveal the vulnerability of AVASD models under audio-only, visual-only, and audio-visual adversarial attacks.
We also propose a novel audio-visual interaction loss (AVIL) that makes it difficult for attackers to find feasible adversarial examples.
arXiv Detail & Related papers (2022-10-03T08:10:12Z)
- Dictionary Attacks on Speaker Verification [15.00667613025837]
We introduce a generic formulation of the attack that can be used with various speech representations and threat models.
The attacker uses adversarial optimization to maximize raw similarity of speaker embeddings between a seed speech sample and a proxy population.
We show that, combined with multiple attempts, this attack raises even more serious concerns about the security of these systems.
arXiv Detail & Related papers (2022-04-24T15:31:41Z)
- Speaker Identity Preservation in Dysarthric Speech Reconstruction by
Adversarial Speaker Adaptation [59.41186714127256]
Dysarthric speech reconstruction (DSR) aims to improve the quality of dysarthric speech.
A speaker encoder (SE) optimized for speaker verification has been explored to control the speaker identity.
We propose a novel multi-task learning strategy, i.e., adversarial speaker adaptation (ASA).
arXiv Detail & Related papers (2022-02-18T08:59:36Z)
- Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring [60.55025339250815]
We propose a novel deep learning technique for non-native ASS, called speaker-conditioned hierarchical modeling.
In our technique, we take advantage of the fact that oral proficiency tests rate multiple responses for a candidate. We extract context from these responses and feed it as additional speaker-specific context to our network to score a particular response.
arXiv Detail & Related papers (2021-08-30T07:00:28Z)
- Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource
End-to-End Speech Recognition [62.94773371761236]
We consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate.
We propose a method of dynamic acoustic unit augmentation based on the BPE-dropout technique.
Our monolingual Turkish Conformer achieved a competitive result with 22.2% character error rate (CER) and 38.9% word error rate (WER).
arXiv Detail & Related papers (2021-03-12T10:10:13Z)
- U-vectors: Generating clusterable speaker embedding from unlabeled data [0.0]
This paper introduces a speaker recognition strategy dealing with unlabeled data.
It generates clusterable embedding vectors from small fixed-size speech frames.
We conclude that the proposed approach achieves remarkable performance using pairwise architectures.
arXiv Detail & Related papers (2021-02-07T18:00:09Z)
- Self-supervised Text-independent Speaker Verification using Prototypical
Momentum Contrastive Learning [58.14807331265752]
We show that better speaker embeddings can be learned by momentum contrastive learning.
We generalize the self-supervised framework to a semi-supervised scenario where only a small portion of the data is labeled.
arXiv Detail & Related papers (2020-12-13T23:23:39Z)
- Adversarial Attack and Defense Strategies for Deep Speaker Recognition
Systems [44.305353565981015]
This paper considers several state-of-the-art adversarial attacks on a deep speaker recognition system, employing strong defense methods as countermeasures.
Experiments show that speaker recognition systems are vulnerable to adversarial attacks, and the strongest attacks can reduce the accuracy of the system from 94% to as low as 0%.
arXiv Detail & Related papers (2020-08-18T00:58:19Z)
- Speaker Diarization with Lexical Information [59.983797884955]
This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition.
We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy.
arXiv Detail & Related papers (2020-04-13T17:16:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.