Related papers: Leveraging Mixture of Experts for Improved Speech Deepfake Detection

Leveraging Mixture of Experts for Improved Speech Deepfake Detection

URL: http://arxiv.org/abs/2409.16077v1
Date: Tue, 24 Sep 2024 13:24:03 GMT
Title: Leveraging Mixture of Experts for Improved Speech Deepfake Detection
Authors: Viola Negroni, Davide Salvi, Alessandro Ilic Mezza, Paolo Bestagini, Stefano Tubaro,
Abstract summary: Speech deepfakes pose a significant threat to personal security and content authenticity. We introduce a novel approach for enhancing speech deepfake detection performance using a Mixture of Experts architecture.
Score: 53.69740463004446
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Speech deepfakes pose a significant threat to personal security and content authenticity. Several detectors have been proposed in the literature, and one of the primary challenges these systems have to face is the generalization over unseen data to identify fake signals across a wide range of datasets. In this paper, we introduce a novel approach for enhancing speech deepfake detection performance using a Mixture of Experts architecture. The Mixture of Experts framework is well-suited for the speech deepfake detection task due to its ability to specialize in different input types and handle data variability efficiently. This approach offers superior generalization and adaptability to unseen data compared to traditional single models or ensemble methods. Additionally, its modular structure supports scalable updates, making it more flexible in managing the evolving complexity of deepfake techniques while maintaining high detection accuracy. We propose an efficient, lightweight gating mechanism to dynamically assign expert weights for each input, optimizing detection performance. Experimental results across multiple datasets demonstrate the effectiveness and potential of our proposed approach.

Related papers

Optimizing Multispectral Object Detection: A Bag of Tricks and Comprehensive Benchmarks [49.84182981950623]
Multispectral object detection, utilizing RGB and TIR (thermal infrared) modalities, is widely recognized as a challenging task. It requires not only the effective extraction of features from both modalities and robust fusion strategies, but also the ability to address issues such as spectral discrepancies. We introduce an efficient and easily deployable multispectral object detection framework that can seamlessly optimize high-performing single-modality models.
arXiv Detail & Related papers (2024-11-27T12:18:39Z)
Semantics-Oriented Multitask Learning for DeepFake Detection: A Joint Embedding Approach [77.65459419417533]
We propose an automatic dataset expansion technique to support semantics-oriented DeepFake detection tasks. We also resort to joint embedding of face images and their corresponding labels for prediction. Our method improves the generalizability of DeepFake detection and renders some degree of model interpretation by providing human-understandable explanations.
arXiv Detail & Related papers (2024-08-29T07:11:50Z)
Targeted Augmented Data for Audio Deepfake Detection [11.671275975119089]
We propose a novel augmentation method for generating audio pseudo-fakes targeting the decision boundary of the model. Inspired by adversarial attacks, we perturb original real data to synthesize pseudo-fakes with ambiguous prediction probabilities.
arXiv Detail & Related papers (2024-07-10T12:31:53Z)
Straight Through Gumbel Softmax Estimator based Bimodal Neural Architecture Search for Audio-Visual Deepfake Detection [6.367999777464464]
multimodal deepfake detectors rely on conventional fusion methods, such as majority rule and ensemble voting. In this paper, we introduce the Straight-through Gumbel-Softmax framework, offering a comprehensive approach to search multimodal fusion model architectures. Experiments on the FakeAVCeleb and SWAN-DF datasets demonstrated an impressive AUC value 94.4% achieved with minimal model parameters.
arXiv Detail & Related papers (2024-06-19T09:26:22Z)
Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors. In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
CrossDF: Improving Cross-Domain Deepfake Detection with Deep Information Decomposition [53.860796916196634]
We propose a Deep Information Decomposition (DID) framework to enhance the performance of Cross-dataset Deepfake Detection (CrossDF) Unlike most existing deepfake detection methods, our framework prioritizes high-level semantic features over specific visual artifacts. It adaptively decomposes facial features into deepfake-related and irrelevant information, only using the intrinsic deepfake-related information for real/fake discrimination.
arXiv Detail & Related papers (2023-09-30T12:30:25Z)
Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks. Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment. We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust. Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model. We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
Self-Supervised Graph Transformer for Deepfake Detection [1.8133635752982105]
Deepfake detection methods have shown promising results in recognizing forgeries within a given dataset. Deepfake detection system must remain impartial to forgery types, appearance, and quality for guaranteed generalizable detection performance. This study introduces a deepfake detection framework, leveraging a self-supervised pre-training model that delivers exceptional generalization ability.
arXiv Detail & Related papers (2023-07-27T17:22:41Z)
Learning Pairwise Interaction for Generalizable DeepFake Detection [20.723277551489186]
A fast-paced development of DeepFake generation techniques challenge the detection schemes designed for known type DeepFakes. We propose a new approach, Multi-Channel Xception Attention Pairwise Interaction (MCX-API), that exploits the power of pairwise learning and complementary information from different color space representations. Our experiments indicate that our proposed method can generalize better than the state-of-the-art Deepfakes detectors.
arXiv Detail & Related papers (2023-02-26T10:39:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.