Characterizing the temporal dynamics of universal speech representations
for generalizable deepfake detection
- URL: http://arxiv.org/abs/2309.08099v1
- Date: Fri, 15 Sep 2023 01:37:45 GMT
- Title: Characterizing the temporal dynamics of universal speech representations
for generalizable deepfake detection
- Authors: Yi Zhu, Saurabh Powar, and Tiago H. Falk
- Abstract summary: Existing deepfake speech detection systems lack generalizability to unseen attacks.
Recent studies have explored the use of universal speech representations to tackle this issue.
We argue that characterizing the long-term temporal dynamics of these representations is crucial for generalizability.
- Score: 14.449940985934388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing deepfake speech detection systems lack generalizability to unseen
attacks (i.e., samples generated by generative algorithms not seen during
training). Recent studies have explored the use of universal speech
representations to tackle this issue and have obtained inspiring results. These
works, however, have focused on innovating downstream classifiers while leaving
the representation itself untouched. In this study, we argue that
characterizing the long-term temporal dynamics of these representations is
crucial for generalizability and propose a new method to assess representation
dynamics. Indeed, we show that different generative models generate similar
representation dynamics patterns with our proposed method. Experiments on the
ASVspoof 2019 and 2021 datasets validate the benefits of the proposed method to
detect deepfakes from methods unseen during training, significantly improving
on several benchmark methods.
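The abstract's core idea, summarizing the long-term temporal dynamics of frame-level speech representations, can be illustrated with a minimal sketch. This is not the authors' actual method: the function name, the choice of lags, and the mean/std summary statistics are all illustrative assumptions. Given a (T, D) matrix of per-frame embeddings from a universal speech model, one can measure how quickly the representation evolves by looking at differences across several time lags.

```python
import numpy as np

def temporal_dynamics(reps: np.ndarray, lags=(1, 5, 10)) -> np.ndarray:
    """Summarize long-term temporal dynamics of per-frame representations.

    reps: (T, D) array of frame-level embeddings (e.g., from a universal
    speech model). For each lag, compute the L2 norm of the lag-difference
    and summarize with mean and std, capturing how quickly (and how
    erratically) the representation evolves over time.
    """
    feats = []
    for lag in lags:
        diffs = reps[lag:] - reps[:-lag]           # (T - lag, D) frame changes
        norms = np.linalg.norm(diffs, axis=1)      # per-frame change magnitude
        feats.extend([norms.mean(), norms.std()])
    return np.asarray(feats)

# A slowly varying trajectory should show smaller dynamics than a noisy one.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)[:, None]
smooth = np.sin(2 * np.pi * t) * np.ones((1, 8))   # smooth synthetic embeddings
noisy = rng.standard_normal((200, 8))              # erratic synthetic embeddings
print(temporal_dynamics(smooth)[0] < temporal_dynamics(noisy)[0])  # True
```

A detector could concatenate such lag statistics to the usual pooled embedding before classification; the paper's point is that generative models leave similar, detectable signatures in these dynamics.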
Related papers
- Generalizable Audio Spoofing Detection using Non-Semantic Representations [12.685819931453045]
Generative modeling has made synthetic audio generation easy, making speech-based services vulnerable to spoofing attacks.
Existing solutions for deepfake detection are often criticized for lacking generalizability and fail drastically when applied to real-world data.
This study proposes a novel method for generalizable spoofing detection leveraging non-semantic universal audio representations.
arXiv Detail & Related papers (2025-08-29T18:37:57Z) - Anomaly Detection and Localization for Speech Deepfakes via Feature Pyramid Matching [8.466707742593078]
Speech deepfakes are synthetic audio signals that can imitate target speakers' voices.
Existing methods for detecting speech deepfakes rely on supervised learning.
We introduce a novel interpretable one-class detection framework, which reframes speech deepfake detection as an anomaly detection task.
arXiv Detail & Related papers (2025-03-23T11:15:22Z) - Robust Dynamic Facial Expression Recognition [6.626374248579249]
This paper proposes a robust method for distinguishing between hard and noisy samples.
To identify the principal expression in a video, a key expression re-sampling framework and a dual-stream hierarchical network are proposed.
The proposed method has been shown to outperform current State-Of-The-Art approaches in DFER.
arXiv Detail & Related papers (2025-02-22T07:48:12Z) - Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection [15.857961926916465]
We present a novel general deepfake detection method, called Curricular Dynamic Forgery Augmentation (CDFA).
CDFA jointly trains a deepfake detector with a forgery augmentation policy network.
We show that CDFA can significantly improve both cross-dataset and cross-manipulation performance of various naive deepfake detectors.
arXiv Detail & Related papers (2024-09-22T13:51:22Z) - Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection [16.21235742118949]
We propose a novel approach that repurposes well-trained Vision-Language Models (VLMs) for general deepfake detection.
Motivated by the model reprogramming paradigm, which manipulates model predictions via data perturbations, our method can reprogram a pretrained VLM.
Our method achieves superior performance at a lower cost in trainable parameters, making it a promising approach for real-world applications.
arXiv Detail & Related papers (2024-09-04T12:46:30Z) - UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization.
We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z) - Beyond Silence: Bias Analysis through Loss and Asymmetric Approach in Audio Anti-Spoofing [53.325039475118814]
Current trends in anti-spoofing detection research strive to improve models' ability to generalize across unseen attacks.
Recent studies have noted that the distribution of silence differs between the two classes, which can serve as a shortcut.
We employ loss analysis and asymmetric methodologies to move away from traditional attack-focused and result-oriented evaluations.
arXiv Detail & Related papers (2024-06-25T03:24:12Z) - Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a major issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z) - Prompting Diffusion Representations for Cross-Domain Semantic Segmentation [101.04326113360342]
Diffusion pretraining achieves extraordinary domain generalization results for semantic segmentation.
We introduce a scene prompt and a prompt randomization strategy to help further disentangle the domain-invariant information when training the segmentation head.
arXiv Detail & Related papers (2023-07-05T09:28:25Z) - Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation [66.86987509942607]
We evaluate how such a paradigm should be done in imitation learning.
We consider a setting where the pretraining corpus consists of multitask demonstrations.
We argue that inverse dynamics modeling is well-suited to this setting.
arXiv Detail & Related papers (2023-05-26T14:40:46Z) - Self-supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection [41.27496491339225]
This work addresses the generalizable deepfake detection from a simple principle.
We propose to enrich the "diversity" of forgeries by synthesizing augmented forgeries with a pool of forgery configurations.
We also propose to use the adversarial training strategy to dynamically synthesize the most challenging forgeries to the current model.
arXiv Detail & Related papers (2022-03-23T05:52:23Z) - Layer-wise Analysis of a Self-supervised Speech Representation Model [26.727775920272205]
Self-supervised learning approaches have been successful for pre-training speech representation models.
Little has been studied, however, about the type or extent of information encoded in the pre-trained representations themselves.
arXiv Detail & Related papers (2021-07-10T02:13:25Z) - On Contrastive Representations of Stochastic Processes [53.21653429290478]
Learning representations of processes is an emerging problem in machine learning.
We show that our methods are effective for learning representations of periodic functions, 3D objects and dynamical processes.
arXiv Detail & Related papers (2021-06-18T11:00:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.