SZU-AFS Antispoofing System for the ASVspoof 5 Challenge
- URL: http://arxiv.org/abs/2408.09933v1
- Date: Mon, 19 Aug 2024 12:12:29 GMT
- Title: SZU-AFS Antispoofing System for the ASVspoof 5 Challenge
- Authors: Yuxiong Xu, Jiafeng Zhong, Sengui Zheng, Zefeng Liu, Bin Li
- Abstract summary: The SZU-AFS anti-spoofing system was designed for Track 1 of the ASVspoof 5 Challenge under open conditions.
The final fusion system achieves a minDCF of 0.115 and an EER of 4.04% on the evaluation set.
- Score: 3.713577625357432
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents the SZU-AFS anti-spoofing system, designed for Track 1 of the ASVspoof 5 Challenge under open conditions. The system is built with four stages: selecting a baseline model, exploring effective data augmentation (DA) methods for fine-tuning, applying a co-enhancement strategy based on gradient norm aware minimization (GAM) for secondary fine-tuning, and fusing logits scores from the two best-performing fine-tuned models. The system utilizes the Wav2Vec2 front-end feature extractor and the AASIST back-end classifier as the baseline model. During model fine-tuning, three distinct DA policies have been investigated: single-DA, random-DA, and cascade-DA. Moreover, the employed GAM-based co-enhancement strategy, designed to fine-tune the augmented model at both data and optimizer levels, helps the Adam optimizer find flatter minima, thereby boosting model generalization. Overall, the final fusion system achieves a minDCF of 0.115 and an EER of 4.04% on the evaluation set.
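As a minimal illustration of the final fusion stage described in the abstract, the sketch below combines per-utterance logit scores from two fine-tuned systems with a weighted sum and estimates EER with a simple threshold sweep. `fuse_logits` and `compute_eer` are hypothetical helpers written for this summary, not the authors' code; the paper's actual fusion weights and scoring tooling are not specified here.

```python
def fuse_logits(scores_a, scores_b, weight=0.5):
    """Weighted score-level fusion of two systems' per-utterance logit scores."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(scores_a, scores_b)]

def compute_eer(bonafide, spoof):
    """Equal error rate via a threshold sweep over the pooled scores.

    bonafide/spoof: lists of detection scores (higher = more bona fide).
    Returns the rate at the threshold where false rejection and false
    acceptance are closest to equal.
    """
    best = None
    for t in sorted(bonafide + spoof):
        frr = sum(s < t for s in bonafide) / len(bonafide)   # false rejection rate
        far = sum(s >= t for s in spoof) / len(spoof)        # false acceptance rate
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2.0)
    return best[1]
```

In practice the fusion weight would be tuned on a development set, and challenge metrics such as minDCF additionally weight the two error types by priors and costs.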
Related papers
- Parameter-free entropy-regularized multi-view clustering with hierarchical feature selection [3.8015092217142237]
This work introduces two complementary algorithms: AMVFCM-U and AAMVFCM-U, providing a unified parameter-free framework. AAMVFCM-U achieves up to 97% computational efficiency gains, reduces dimensionality to 0.45% of the original size, and automatically identifies critical view combinations.
arXiv Detail & Related papers (2025-08-07T15:36:59Z) - UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting [57.63613048492219]
We present UAVTwin, a method for creating digital twins from real-world environments and facilitating data augmentation for training downstream models embedded in unmanned aerial vehicles (UAVs).
This is achieved by integrating 3D Gaussian Splatting (3DGS) for reconstructing backgrounds along with controllable synthetic human models that display diverse appearances and actions in multiple poses.
arXiv Detail & Related papers (2025-04-02T22:17:30Z) - Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization [66.67988187816185]
We aim to scale up the number of on-policy samples via repeated random sampling to improve alignment performance.
Our experiments reveal that this strategy leads to a decline in performance as the sample size increases.
We introduce a scalable preference data construction strategy that consistently enhances model performance as the sample scale increases.
arXiv Detail & Related papers (2025-02-24T04:22:57Z) - Dynamic Noise Preference Optimization for LLM Self-Improvement via Synthetic Data [51.62162460809116]
We introduce Dynamic Noise Preference Optimization (DNPO) to ensure consistent improvements across iterations.
In experiments with Zephyr-7B, DNPO consistently outperforms existing methods, showing an average performance boost of 2.6%.
DNPO shows a significant improvement in model-generated data quality, with a 29.4% win-loss rate gap compared to the baseline in GPT-4 evaluations.
arXiv Detail & Related papers (2025-02-08T01:20:09Z) - Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z) - D4AM: A General Denoising Framework for Downstream Acoustic Models [45.04967351760919]
Speech enhancement (SE) can be used as a front-end strategy to aid automatic speech recognition (ASR) systems.
Existing training objectives of SE methods are not fully effective at integrating speech-text and noisy-clean paired data for training toward unseen ASR systems.
We propose a general denoising framework, D4AM, for various downstream acoustic models.
arXiv Detail & Related papers (2023-11-28T08:27:27Z) - Augmenting conformers with structured state-space sequence models for online speech recognition [41.444671189679994]
Online speech recognition, where the model only accesses context to the left, is an important and challenging use case for ASR systems.
In this work, we investigate augmenting neural encoders for online ASR by incorporating structured state-space sequence models (S4).
We performed systematic ablation studies to compare variants of S4 models and propose two novel approaches that combine them with convolutions.
Our best model achieves WERs of 4.01%/8.53% on test sets from Librispeech, outperforming Conformers with extensively tuned convolution.
arXiv Detail & Related papers (2023-09-15T17:14:17Z) - Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), have known limitations: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z) - DMSA: Dynamic Multi-scale Unsupervised Semantic Segmentation Based on Adaptive Affinity [11.080515677051455]
The framework uses Atrous Spatial Pyramid Pooling (ASPP) module to enhance feature extraction.
A Pixel-Adaptive Refinement (PAR) module is introduced, which can adaptively refine the initial pseudo labels.
Experiments show that the proposed DMSA framework is superior to the existing methods on the saliency dataset.
arXiv Detail & Related papers (2023-03-01T03:08:30Z) - Towards Robust Recommender Systems via Triple Cooperative Defense [63.64651805384898]
Recommender systems are often susceptible to well-crafted fake profiles, leading to biased recommendations.
We propose a general framework, Triple Cooperative Defense, which cooperates to improve model robustness through the co-training of three models.
Results show that the robustness improvement of TCD significantly outperforms baselines.
arXiv Detail & Related papers (2022-10-25T04:45:43Z) - Spoofing-Aware Speaker Verification with Unsupervised Domain Adaptation [18.684888457998284]
We enhance the robustness of the automatic speaker verification system without the primary presence of a countermeasure module.
We employ three unsupervised domain adaptation techniques to optimize the back-end using the audio data.
We demonstrate notable improvements on both logical and physical access scenarios.
arXiv Detail & Related papers (2022-03-21T14:02:06Z) - OSOA: One-Shot Online Adaptation of Deep Generative Models for Lossless
Compression [49.10945855716001]
We propose a novel setting that starts from a pretrained deep generative model and compresses the data batches while adapting the model with a dynamical system for only one epoch.
Experimental results show that vanilla OSOA can save significant time versus training bespoke models and space versus using one model for all targets.
arXiv Detail & Related papers (2021-11-02T15:18:25Z) - Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for
Replay Attack Detection [10.851348154870852]
We argue that, for anti-spoofing, indistinguishable samples deserve more attention than easily-classified ones in the modeling process.
We propose to leverage a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of the sample itself.
With complementary features, our fusion system with only three kinds of features outperforms other systems by 22.5% for min-tDCF and 7% for EER.
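The dynamic scaling described above can be sketched with the standard focal-loss form; this single-sample version is a hypothetical illustration written for this summary, not the paper's exact objective, and the `alpha`/`gamma` values are conventional defaults, not the authors' settings.

```python
import math

def balanced_focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Balanced focal loss for one sample (illustrative sketch).

    p: predicted probability of the positive class; y: label in {0, 1}.
    The (1 - p_t) ** gamma factor down-weights easy, confidently classified
    samples so that hard, indistinguishable ones dominate the objective;
    alpha rebalances the two classes.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma = 0` this reduces to class-weighted cross-entropy; increasing `gamma` progressively suppresses the loss contribution of well-classified samples.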
arXiv Detail & Related papers (2020-06-25T17:06:47Z) - Simple and Effective Prevention of Mode Collapse in Deep One-Class Classification [93.2334223970488]
We propose two regularizers to prevent hypersphere collapse in deep SVDD.
The first regularizer is based on injecting random noise via the standard cross-entropy loss.
The second regularizer penalizes the minibatch variance when it becomes too small.
arXiv Detail & Related papers (2020-01-24T03:44:47Z)
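The second regularizer above can be sketched as a hinge on minibatch variance; this is an illustrative reconstruction under the assumption that `distances` holds per-sample distances to the hypersphere center, and the threshold `eps` and hinge form are hypothetical rather than the paper's exact formulation.

```python
def variance_regularizer(distances, eps=1e-2):
    """Penalize a minibatch whose distances to the hypersphere center have
    collapsed: returns a positive penalty when the variance of the
    per-sample distances falls below eps, and zero otherwise."""
    n = len(distances)
    mean = sum(distances) / n
    var = sum((d - mean) ** 2 for d in distances) / n
    return max(0.0, eps - var)  # hinge: only active near collapse
```

Added to the main objective, such a term pushes back against the trivial "all points map to the center" solution that deep SVDD can otherwise converge to.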
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.