Temporal Variability and Multi-Viewed Self-Supervised Representations to Tackle the ASVspoof5 Deepfake Challenge
- URL: http://arxiv.org/abs/2408.06922v1
- Date: Tue, 13 Aug 2024 14:15:15 GMT
- Title: Temporal Variability and Multi-Viewed Self-Supervised Representations to Tackle the ASVspoof5 Deepfake Challenge
- Authors: Yuankun Xie, Xiaopeng Wang, Zhiyong Wang, Ruibo Fu, Zhengqi Wen, Haonan Cheng, Long Ye
- Abstract summary: We introduce Frequency Mask, a data augmentation method that masks specific frequency bands to improve CM robustness.
Our experiments achieved a minDCF of 0.0158 and an EER of 0.55% on the ASVspoof 5 Track 1 evaluation progress set.
- Score: 21.655127750485097
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: ASVspoof5, the fifth edition of the ASVspoof series, is one of the largest global audio security challenges. It aims to advance the development of countermeasures (CMs) that discriminate between bonafide and spoofed speech utterances. In this paper, we focus on open-domain audio deepfake detection, which corresponds directly to the ASVspoof5 Track 1 open condition. First, we comprehensively investigate various CMs on ASVspoof5, including data expansion, data augmentation, and self-supervised learning (SSL) features. Because of the high-frequency gaps characteristic of the ASVspoof5 dataset, we introduce Frequency Mask, a data augmentation method that masks specific frequency bands to improve CM robustness. Combining temporal information at various scales with multiple SSL features, our experiments achieved a minDCF of 0.0158 and an EER of 0.55% on the ASVspoof5 Track 1 evaluation progress set.
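The abstract describes Frequency Mask only as masking specific frequency bands; it does not say which bands are masked or whether the mask is applied to the waveform or to a spectrogram. The following is a minimal sketch of the general idea, assuming a one-sided magnitude spectrogram of shape (frequency bins, frames), a 16 kHz sample rate, and a fixed 4 to 8 kHz band. The function name `frequency_mask` and all of its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def frequency_mask(spec, f_low, f_high, sample_rate=16000, fill=0.0):
    """Return a copy of `spec` with the band [f_low, f_high] Hz overwritten.

    `spec` is assumed to be a one-sided spectrogram of shape
    (freq_bins, frames) whose bins span 0 Hz to the Nyquist frequency.
    This is an illustrative assumption, not the paper's exact setup.
    """
    n_bins = spec.shape[0]
    # Centre frequency of each bin, from 0 Hz up to sample_rate / 2.
    freqs = np.linspace(0.0, sample_rate / 2, n_bins)
    band = (freqs >= f_low) & (freqs <= f_high)
    masked = spec.copy()
    masked[band, :] = fill
    return masked


# Hypothetical usage: mask the upper half of the band of a random spectrogram.
rng = np.random.default_rng(0)
spec = rng.random((257, 100))          # e.g. a 512-point STFT, 100 frames
masked = frequency_mask(spec, f_low=4000, f_high=8000)
```

In a training pipeline such a mask would typically be applied at random to a fraction of the utterances, so that the countermeasure does not learn to depend on the masked band alone; the band limits and the application probability are tuning choices that the abstract leaves unspecified.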
Related papers
- Learn from Real: Reality Defender's Submission to ASVspoof5 Challenge [13.54987267358107]
The ASVspoof challenge has become one of the benchmarks to evaluate the generalizability and robustness of detection models.
We present Reality Defender's submission to the ASVspoof5 challenge, highlighting a novel pretraining strategy.
Our system SLIM learns the style-linguistics dependency embeddings from various types of bonafide speech using self-supervised contrastive learning.
arXiv Detail & Related papers (2024-10-09T18:55:28Z) - ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale [59.25180900687571]
ASVspoof 5 is the fifth edition in a series of challenges that promote the study of speech spoofing and deepfake attacks.
We describe the two challenge tracks, the new database, the evaluation metrics, and the evaluation platform, and present a summary of the results.
arXiv Detail & Related papers (2024-08-16T13:37:20Z) - ArcSin: Adaptive ranged cosine Similarity injected noise for Language-Driven Visual Tasks [43.42682181017004]
In this paper, we replace the image input with text for vision-language training.
Inspired by prior noise injection methods, we introduce Adaptive ranged cosine Similarity injected noise (ArcSin).
Our empirical results demonstrate that these models closely rival those trained on images in terms of performance.
arXiv Detail & Related papers (2024-02-27T08:20:45Z) - Generalizing Speaker Verification for Spoof Awareness in the Embedding Space [30.094557217931563]
ASV systems can be spoofed using various types of adversaries.
We propose a novel yet simple backend classifier based on deep neural networks.
Experiments are conducted on the ASVspoof 2019 logical access dataset.
arXiv Detail & Related papers (2024-01-20T07:30:22Z) - Towards single integrated spoofing-aware speaker verification embeddings [63.42889348690095]
This study aims to develop single integrated spoofing-aware speaker verification (SASV) embeddings.
Our analysis attributes the inferior performance of single SASV embeddings to an insufficient amount of training data.
Experiments show dramatic improvements, achieving a SASV-EER of 1.06% on the evaluation protocol of the SASV2022 challenge.
arXiv Detail & Related papers (2023-05-30T14:15:39Z) - Leveraging Unlabelled Data in Multiple-Instance Learning Problems for Improved Detection of Parkinsonian Tremor in Free-Living Conditions [80.88681952022479]
We introduce a new method for combining semi-supervised with multiple-instance learning.
We show that by leveraging the unlabelled data of 454 subjects we can achieve large performance gains in per-subject tremor detection.
arXiv Detail & Related papers (2023-04-29T12:25:10Z) - Spotting adversarial samples for speaker verification by neural vocoders [102.1486475058963]
We adopt neural vocoders to spot adversarial samples for automatic speaker verification (ASV).
We find that the difference between the ASV scores for the original and re-synthesized audio is a good indicator for discriminating between genuine and adversarial samples.
Our code will be made open-source so that future work can compare against it.
arXiv Detail & Related papers (2021-07-01T08:58:16Z) - Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals [59.34844017757795]
The reliability of spoofing countermeasures (CMs) is gauged using the equal error rate (EER) metric.
This paper presents several new extensions to the tandem detection cost function (t-DCF).
It is hoped that adoption of the t-DCF for CM assessment will help to foster closer collaboration between the anti-spoofing and ASV research communities.
arXiv Detail & Related papers (2020-07-12T12:44:08Z) - Detecting Parkinsonian Tremor from IMU Data Collected In-The-Wild using Deep Multiple-Instance Learning [59.74684475991192]
Parkinson's Disease (PD) is a slowly evolving neurological disease that affects about 1% of the population above 60 years old.
PD symptoms include tremor, rigidity and bradykinesia.
We present a method for automatically identifying tremorous episodes related to PD, based on IMU signals captured via a smartphone device.
arXiv Detail & Related papers (2020-05-06T09:02:30Z)