Mixture of Low-Rank Adapter Experts in Generalizable Audio Deepfake Detection
- URL: http://arxiv.org/abs/2509.13878v1
- Date: Wed, 17 Sep 2025 10:13:58 GMT
- Title: Mixture of Low-Rank Adapter Experts in Generalizable Audio Deepfake Detection
- Authors: Janne Laakkonen, Ivan Kukanov, Ville Hautamäki
- Abstract summary: Foundation models excel at representation learning in speech tasks, including audio deepfake detection. We propose a mixture-of-LoRA-experts approach that integrates multiple low-rank adapters into the model's attention layers. Experimental results show that our method outperforms standard fine-tuning in both in-domain and out-of-domain scenarios.
- Score: 5.277931896456617
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models such as Wav2Vec2 excel at representation learning in speech tasks, including audio deepfake detection. However, after being fine-tuned on a fixed set of bonafide and spoofed audio clips, they often fail to generalize to novel deepfake methods not represented in training. To address this, we propose a mixture-of-LoRA-experts approach that integrates multiple low-rank adapters (LoRA) into the model's attention layers. A routing mechanism selectively activates specialized experts, enhancing adaptability to evolving deepfake attacks. Experimental results show that our method outperforms standard fine-tuning in both in-domain and out-of-domain scenarios, reducing equal error rates relative to baseline models. Notably, our best MoE-LoRA model lowers the average out-of-domain EER from 8.55% to 6.08%, demonstrating its effectiveness in achieving generalizable audio deepfake detection.
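The abstract's core idea, replacing a single fine-tuned weight update with several routed low-rank adapters, can be sketched in a few lines. The dimensions, the zero-initialization of the B matrices, and the per-input softmax router below are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, rank, n_experts = 8, 2, 4

# Frozen pre-trained weight (stand-in for one attention projection).
W = rng.standard_normal((d_model, d_model))

# One low-rank adapter (A_i, B_i) per expert; only these would be trainable.
A = rng.standard_normal((n_experts, rank, d_model)) * 0.1
B = np.zeros((n_experts, d_model, rank))  # zero-init: adapters start as no-ops

# Router producing one logit per expert from the input.
W_router = rng.standard_normal((n_experts, d_model)) * 0.1

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def moe_lora_forward(x):
    """y = W x + sum_i g_i(x) * B_i A_i x, with gates g = softmax(router logits)."""
    gates = softmax(W_router @ x)
    delta = sum(g * (B[i] @ (A[i] @ x)) for i, g in enumerate(gates))
    return W @ x + delta

x = rng.standard_normal(d_model)
y = moe_lora_forward(x)
```

Because each B_i starts at zero, the mixture initially reproduces the frozen model exactly; training moves only the adapters and the router, which is what keeps the approach parameter-efficient.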
Related papers
- Multi-modal Deepfake Detection and Localization with FPN-Transformer [21.022230340898556]
We introduce a multi-modal deepfake detection and localization framework based on a Feature Pyramid-Transformer (FPN-Transformer). A multi-scale feature pyramid is constructed through R-TLM blocks with localized attention mechanisms, enabling joint analysis of cross-context temporal dependencies. We evaluate our approach on the test set of the IJCAI'25 DDL-AV benchmark, achieving a final score of 0.7535.
arXiv Detail & Related papers (2025-11-11T09:33:39Z)
- DoRAN: Stabilizing Weight-Decomposed Low-Rank Adaptation via Noise Injection and Auxiliary Networks [47.58150560549918]
Weight-Decomposed Low-Rank Adaptation (DoRA) has been shown to improve both the learning capacity and training stability of the vanilla Low-Rank Adaptation (LoRA) method. We propose DoRAN, a new variant of DoRA designed to further stabilize training and boost the sample efficiency of DoRA.
arXiv Detail & Related papers (2025-10-05T19:27:48Z)
- Towards Reliable Audio Deepfake Attribution and Model Recognition: A Multi-Level Autoencoder-Based Framework [7.755879452365207]
The proliferation of audio deepfakes poses a growing threat to trust in digital communications. We introduce LAVA, a hierarchical framework for audio deepfake detection and model recognition. Two specialized classifiers operate on these features: Audio Deepfake Attribution (ADA), which identifies the generation technology, and Audio Deepfake Model Recognition (ADMR), which recognizes the specific generative model instance.
arXiv Detail & Related papers (2025-08-04T15:31:13Z)
- Reliable Few-shot Learning under Dual Noises [166.53173694689693]
We propose DEnoised Task Adaptation (DETA++) for reliable few-shot learning. DETA++ employs a memory bank to store and refine clean regions for each inner-task class, based on which a Local Nearest Centroid Classifier (LocalNCC) is devised to yield noise-robust predictions on query samples. Extensive experiments demonstrate the effectiveness and flexibility of DETA++.
arXiv Detail & Related papers (2025-06-19T14:05:57Z)
- Unified AI for Accurate Audio Anomaly Detection [0.0]
This paper presents a unified AI framework for high-accuracy audio anomaly detection. It integrates advanced noise reduction, feature extraction, and machine learning modeling techniques. The framework is evaluated on benchmark datasets including TORGO and LibriSpeech.
arXiv Detail & Related papers (2025-05-20T16:56:08Z)
- Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
- Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection [57.537583869961885]
Self-supervised speech models are a rapidly developing research topic in fake audio detection.
We apply low-rank adaptation (LoRA) to the wav2vec2 model, freezing the pre-trained model weights and injecting a trainable rank-decomposition matrix into each layer of the transformer architecture.
Compared with fine-tuning the wav2vec2 model's 317M parameters with Adam, LoRA achieved similar performance while reducing the number of trainable parameters by a factor of 198.
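A reduction of this magnitude follows from simple arithmetic: a rank-r adapter on a d-by-d projection trains r·d + d·r parameters instead of d². The hidden size and rank below are illustrative assumptions, not the paper's exact configuration.

```python
# Trainable-parameter arithmetic for LoRA on one square projection matrix.
# Dimensions are illustrative (wav2vec2-style hidden size), not the paper's setup.
d = 1024   # hidden size of one transformer projection
r = 8      # LoRA rank

full = d * d             # fine-tuning the full weight matrix
lora = r * d + d * r     # A is (r x d), B is (d x r)

reduction = full / lora
print(full, lora, round(reduction, 1))  # 1048576 16384 64.0
```

Summed over every adapted projection, and with the rest of the network frozen, per-layer factors like this compound into the overall reductions the abstracts report.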
arXiv Detail & Related papers (2023-06-09T01:43:41Z)
- Adaptive Fake Audio Detection with Low-Rank Model Squeezing [50.7916414913962]
Traditional approaches, such as fine-tuning, are computationally intensive and pose a risk of impairing the acquired knowledge of known fake audio types.
We introduce the concept of training low-rank adaptation matrices tailored specifically to the newly emerging fake audio types.
Our approach offers several advantages, including reduced storage memory requirements and lower equal error rates.
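Equal error rate (EER), the metric quoted throughout these summaries, is the operating point where the false-acceptance rate on spoofed audio meets the false-rejection rate on bonafide audio. A minimal sketch on toy detector scores (higher score = more likely bonafide); the threshold sweep is a simple illustration, not any paper's evaluation code:

```python
import numpy as np

def eer(bonafide, spoof):
    """Equal error rate: sweep thresholds and return the error rate where
    false acceptance (spoof passed) crosses false rejection (bonafide blocked)."""
    thresholds = np.sort(np.concatenate([bonafide, spoof]))
    candidates = []
    for t in thresholds:
        far = np.mean(spoof >= t)     # spoofed clips accepted as bonafide
        frr = np.mean(bonafide < t)   # bonafide clips rejected
        candidates.append((abs(far - frr), (far + frr) / 2))
    return min(candidates)[1]         # rate at the smallest FAR/FRR gap

# Perfectly separable scores give EER = 0; overlapping scores raise it.
print(eer(np.array([0.9, 0.8, 0.7]), np.array([0.3, 0.2, 0.1])))
print(eer(np.array([0.9, 0.8, 0.3]), np.array([0.7, 0.2, 0.1])))
```

Lower EER is better, which is why the abstract above frames its contribution as lowering average out-of-domain EER from 8.55% to 6.08%.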
arXiv Detail & Related papers (2023-06-08T06:06:42Z)
- Fully Automated End-to-End Fake Audio Detection [57.78459588263812]
This paper proposes a fully automated end-to-end fake audio detection method.
We first use a wav2vec pre-trained model to obtain a high-level representation of the speech.
For the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS.
arXiv Detail & Related papers (2022-08-20T06:46:55Z)
- Attack Agnostic Dataset: Towards Generalization and Stabilization of Audio DeepFake Detection [0.4511923587827302]
Methods for detecting audio DeepFakes should be characterized by good generalization and stability.
We present a thorough analysis of current DeepFake detection methods and consider different audio features (front-ends).
We propose a model based on LCNN with LFCC and mel-spectrogram front-end, which shows improvement over the LFCC-based model.
arXiv Detail & Related papers (2022-06-27T12:30:44Z)
- AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach [50.855679274530615]
We present a novel domain-adaptive approach called AdaStereo to align multi-level representations for deep stereo matching networks.
Our models achieve state-of-the-art cross-domain performance on multiple benchmarks, including KITTI, Middlebury, ETH3D and DrivingStereo.
Our method is robust to various domain adaptation settings, and can be easily integrated into quick adaptation application scenarios and real-world deployments.
arXiv Detail & Related papers (2021-12-09T15:10:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.