Unsupervised Multimodal Deepfake Detection Using Intra- and Cross-Modal Inconsistencies
- URL: http://arxiv.org/abs/2311.17088v2
- Date: Thu, 20 Jun 2024 20:24:59 GMT
- Title: Unsupervised Multimodal Deepfake Detection Using Intra- and Cross-Modal Inconsistencies
- Authors: Mulin Tian, Mahyar Khayatkhoei, Joe Mathai, Wael AbdAlmageed
- Abstract summary: Deepfake videos present an increasing threat to society, with a potentially negative impact on criminal justice, democracy, and personal safety and privacy.
We propose a novel unsupervised method for detecting deepfake videos by directly identifying intra-modal and cross-modal inconsistency between video segments.
Our proposed method outperforms prior state-of-the-art unsupervised deepfake detection methods on the challenging FakeAVCeleb dataset.
- Score: 14.660707087391463
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deepfake videos present an increasing threat to society, with a potentially negative impact on criminal justice, democracy, and personal safety and privacy. Meanwhile, detecting deepfakes at scale remains a very challenging task that often requires labeled training data from existing deepfake generation methods. Further, even the most accurate supervised deepfake detection methods do not generalize to deepfakes generated using new generation methods. In this paper, we propose a novel unsupervised method for detecting deepfake videos by directly identifying intra-modal and cross-modal inconsistencies between video segments. The fundamental hypothesis behind the proposed detection method is that motion or identity inconsistencies are inevitable in deepfake videos. We mathematically and empirically support this hypothesis, and then proceed to construct our method grounded in our theoretical analysis. Our proposed method outperforms prior state-of-the-art unsupervised deepfake detection methods on the challenging FakeAVCeleb dataset, and also has several additional advantages: it is scalable, because it does not require pristine (real) samples for each identity during inference and can therefore apply to arbitrarily many identities; generalizable, because it is trained only on real videos and therefore does not rely on a particular deepfake method; reliable, because it does not rely on any likelihood estimation in high dimensions; and explainable, because it can pinpoint the exact location of modality inconsistencies, which are then verifiable by a human expert.
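To make the core idea concrete, here is a minimal sketch (not the authors' implementation) of segment-level inconsistency scoring: per-segment identity or motion embeddings are assumed given by some upstream extractor, and a segment that disagrees with the rest of the video is flagged. The embeddings, the distance choice, and the threshold below are all illustrative assumptions.

```python
# Minimal sketch of segment-level inconsistency scoring, assuming a
# hypothetical per-segment identity/motion embedding extractor; this
# illustrates the idea, not the authors' implementation.
import numpy as np

def inconsistency_scores(segment_embeddings: np.ndarray) -> np.ndarray:
    """Score each segment by its median distance to all other segments.

    segment_embeddings: (num_segments, dim) array, one embedding per
    video segment (e.g., face-identity or motion features).
    """
    # Pairwise Euclidean distances between segment embeddings.
    diffs = segment_embeddings[:, None, :] - segment_embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.nan)
    # A segment inconsistent with the rest has a large median distance.
    return np.nanmedian(dists, axis=1)

# Usage: flag a video whose most inconsistent segment exceeds a threshold
# calibrated on real videos only (no fake samples required).
emb = np.random.randn(10, 128)    # stand-in for real segment embeddings
scores = inconsistency_scores(emb)
is_fake = scores.max() > 3.0      # threshold value is an assumption
```

Because the score is computed within a single video, no pristine reference sample of the depicted identity is needed at inference time, which is what makes the approach scalable to arbitrarily many identities.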
Related papers
- Deepfake detection in videos with multiple faces using geometric-fakeness features [79.16635054977068]
Deepfakes of victims or public figures can be used by fraudsters for blackmail, extortion, and financial fraud.
In our research, we propose to use geometric-fakeness features (GFF) that characterize the dynamic degree of a face's presence in a video.
We employ our approach to analyze videos in which multiple faces are simultaneously present.
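As a loose illustration only: one way to read geometric-fakeness features is as the temporal instability of a tracked face's landmark geometry. The sketch below assumes hypothetical per-face landmark tracks; the paper's actual GFF definition may differ.

```python
# Hedged sketch: one way to turn per-face landmark tracks into a
# geometric-fakeness-style feature; the paper's exact GFF definition
# may differ from this.
import numpy as np

def geometric_jitter(landmarks: np.ndarray) -> float:
    """landmarks: (num_frames, num_points, 2) track for a single face.

    Returns the mean frame-to-frame landmark displacement after removing
    global translation, so rigid head motion is largely discounted.
    """
    centered = landmarks - landmarks.mean(axis=1, keepdims=True)
    deltas = np.diff(centered, axis=0)     # frame-to-frame motion
    return float(np.linalg.norm(deltas, axis=-1).mean())

# With multiple faces in a video, score each face track independently.
tracks = [np.random.randn(100, 68, 2) for _ in range(3)]  # stand-in tracks
per_face_scores = [geometric_jitter(t) for t in tracks]
```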
arXiv Detail & Related papers (2024-10-10T13:10:34Z)
- Shaking the Fake: Detecting Deepfake Videos in Real Time via Active Probes [3.6308756891251392]
Real-time deepfake, a type of generative AI, is capable of "creating" non-existent content (e.g., swapping one person's face with another's) in a video.
It has been misused to produce deepfake videos for malicious purposes, including financial scams and political misinformation.
We propose SFake, a new real-time deepfake detection method that exploits deepfake models' inability to adapt to physical interference.
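A hedged sketch of the active-probe idea: emit a known probe signal and check whether the observed face region tracks it. The 1-D probe, the measured face signal, and the threshold below are stand-ins; SFake's actual physical interference mechanism is abstracted away.

```python
# Hedged sketch of the active-probe idea: emit a known probe signal and
# check whether the face region in the incoming video tracks it. The
# concrete probe used by SFake is abstracted here as a 1-D signal.
import numpy as np

def probe_response_score(probe: np.ndarray, face_signal: np.ndarray) -> float:
    """Normalized correlation between the emitted probe signal and the
    measured response (e.g., mean face-region brightness per frame)."""
    p = (probe - probe.mean()) / (probe.std() + 1e-8)
    f = (face_signal - face_signal.mean()) / (face_signal.std() + 1e-8)
    return float(np.dot(p, f) / len(p))

# A genuine capture should follow the probe; a real-time deepfake
# pipeline that cannot adapt to the interference correlates poorly.
probe = np.sin(np.linspace(0, 8 * np.pi, 240))      # assumed probe pattern
measured = probe + 0.3 * np.random.randn(240)       # stand-in measurement
looks_real = probe_response_score(probe, measured) > 0.5  # assumed threshold
```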
arXiv Detail & Related papers (2024-09-17T04:58:30Z)
- CrossDF: Improving Cross-Domain Deepfake Detection with Deep Information Decomposition [53.860796916196634]
We propose a Deep Information Decomposition (DID) framework to enhance the performance of Cross-dataset Deepfake Detection (CrossDF).
Unlike most existing deepfake detection methods, our framework prioritizes high-level semantic features over specific visual artifacts.
It adaptively decomposes facial features into deepfake-related and irrelevant information, only using the intrinsic deepfake-related information for real/fake discrimination.
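A minimal PyTorch sketch of the decomposition idea, assuming a generic backbone feature vector: the latent is split into a deepfake-related part used for classification and an irrelevant part, with a simple orthogonality penalty keeping the two apart. The DID framework's actual losses and architecture are more involved.

```python
# Hedged sketch of a feature-decomposition head in PyTorch; the DID
# framework's real objectives are more involved than this.
import torch
import torch.nn as nn

class DecompositionHead(nn.Module):
    def __init__(self, feat_dim: int = 512, split: int = 256):
        super().__init__()
        self.split = split
        self.project = nn.Linear(feat_dim, feat_dim)
        self.classifier = nn.Linear(split, 2)  # real/fake from relevant part

    def forward(self, features: torch.Tensor):
        z = self.project(features)
        z_fake, z_irrel = z[:, :self.split], z[:, self.split:]
        logits = self.classifier(z_fake)
        # Penalty encouraging the two parts to carry independent information.
        ortho = (z_fake * z_irrel).sum(dim=1).pow(2).mean()
        return logits, ortho

head = DecompositionHead()
logits, ortho_loss = head(torch.randn(8, 512))  # stand-in backbone features
loss = nn.functional.cross_entropy(logits, torch.randint(0, 2, (8,))) + 0.1 * ortho_loss
```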
arXiv Detail & Related papers (2023-09-30T12:30:25Z)
- How Generalizable are Deepfake Image Detectors? An Empirical Study [4.42204674141385]
We present the first empirical study on the generalizability of deepfake detectors.
Our study utilizes six deepfake datasets, five deepfake image detection methods, and two model augmentation approaches.
We find that detectors are learning unwanted properties specific to synthesis methods and struggling to extract discriminative features.
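The protocol behind such generalizability studies reduces to a train-on-one, test-on-all loop. The sketch below uses placeholder dataset names and dummy train/evaluate helpers; only the cross-dataset structure is the point.

```python
# Minimal sketch of the cross-dataset protocol behind such studies:
# train a detector on one dataset, evaluate on all others. Dataset names
# and the train/evaluate helpers are placeholders.
DATASETS = ["ds_a", "ds_b", "ds_c"]  # stand-ins for the datasets used

def cross_dataset_matrix(train_fn, eval_fn):
    results = {}
    for source in DATASETS:
        model = train_fn(source)
        for target in DATASETS:
            results[(source, target)] = eval_fn(model, target)
    return results

def dummy_train(source):             # placeholder training routine
    return {"trained_on": source}

def dummy_eval(model, target):       # placeholder accuracy
    return 1.0 if model["trained_on"] == target else 0.7

matrix = cross_dataset_matrix(dummy_train, dummy_eval)
# Off-diagonal entries (source != target) expose the generalization gap.
```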
arXiv Detail & Related papers (2023-08-08T10:30:34Z)
- Detecting Deepfake by Creating Spatio-Temporal Regularity Disruption [94.5031244215761]
We propose to boost the generalization of deepfake detection by distinguishing the "regularity disruption" that does not appear in real videos.
Specifically, by carefully examining the spatial and temporal properties, we propose to disrupt a real video through a Pseudo-fake Generator.
Such practice allows us to achieve deepfake detection without using fake videos and improves the generalization ability in a simple and efficient manner.
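A rough sketch of a pseudo-fake generator in this spirit: take a real clip and break its spatial and temporal regularity, so a detector can be trained without any fake videos. The patch size, window length, and disruption choices below are illustrative assumptions, not the paper's generator.

```python
# Hedged sketch of a pseudo-fake generator: disrupt a real clip's
# spatial and temporal regularity so a detector can be trained on real
# videos alone. This shows the flavor of the idea only.
import numpy as np

def pseudo_fake(clip: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """clip: (frames, H, W, C) uint8 clip taken from a real video."""
    out = clip.copy()
    t, h, w, _ = out.shape
    # Spatial disruption: freeze a patch from one random frame across time.
    y, x = rng.integers(0, h - 32), rng.integers(0, w - 32)
    src = rng.integers(0, t)
    out[:, y:y+32, x:x+32] = out[src, y:y+32, x:x+32]
    # Temporal disruption: shuffle a short window of frames.
    start = rng.integers(0, t - 4)
    window = out[start:start+4].copy()
    rng.shuffle(window)              # shuffles along the first axis
    out[start:start+4] = window
    return out

rng = np.random.default_rng(0)
fake_clip = pseudo_fake(np.zeros((16, 224, 224, 3), dtype=np.uint8), rng)
```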
arXiv Detail & Related papers (2022-07-21T10:42:34Z)
- A Continual Deepfake Detection Benchmark: Dataset, Methods, and Essentials [97.69553832500547]
This paper suggests a continual deepfake detection benchmark (CDDB) over a new collection of deepfakes from both known and unknown generative models.
We exploit multiple approaches to adapt multiclass incremental learning methods, commonly used in continual visual recognition, to the continual deepfake detection problem; a replay-based skeleton is sketched below.
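One minimal way to adapt incremental learning to a stream of deepfake "tasks" is experience replay with a bounded buffer; this stands in for the several strategies CDDB actually evaluates.

```python
# Hedged sketch: continual training over a sequence of deepfake
# generator "tasks" with a reservoir-sampled replay buffer to limit
# forgetting. The benchmark evaluates richer strategies than this.
import random

def continual_train(tasks, train_step, buffer_size=200):
    """tasks: sequence of datasets (one per deepfake generator), each an
    iterable of (sample, label) pairs."""
    replay, seen = [], 0
    for task in tasks:
        for example in task:
            seen += 1
            mixed = [example] + random.sample(replay, min(8, len(replay)))
            train_step(mixed)                 # placeholder optimizer step
            if len(replay) < buffer_size:     # reservoir sampling keeps a
                replay.append(example)        # bounded, unbiased memory
            elif random.random() < buffer_size / seen:
                replay[random.randrange(buffer_size)] = example

# Usage with stand-in tasks and a no-op training step:
continual_train([[("x", 0)] * 10, [("y", 1)] * 10], lambda batch: None)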
arXiv Detail & Related papers (2022-05-11T13:07:19Z)
- Voice-Face Homogeneity Tells Deepfake [56.334968246631725]
Existing detection approaches focus on exploring specific artifacts in deepfake videos.
We propose to perform deepfake detection from a previously unexplored voice-face matching view.
Our model obtains significantly improved performance compared to other state-of-the-art competitors.
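A minimal sketch of the voice-face matching view, assuming hypothetical voice and face embeddings in a shared space: low similarity between the two is treated as evidence of a swap. The embedding networks and the threshold are stand-ins.

```python
# Hedged sketch of the voice-face matching view: embed the audio track
# and the face crop into a shared space and treat low similarity as
# evidence of a swap. The embedding networks are placeholders.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def voice_face_score(voice_emb: np.ndarray, face_emb: np.ndarray) -> float:
    """Higher means the voice and face plausibly belong to one person."""
    return cosine(voice_emb, face_emb)

# In a face-swapped video the visual identity no longer matches the
# voice, so the similarity drops below a threshold fit on real pairs.
v, f = np.random.randn(256), np.random.randn(256)   # stand-in embeddings
suspicious = voice_face_score(v, f) < 0.3           # assumed threshold
```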
arXiv Detail & Related papers (2022-03-04T09:08:50Z)
- Self-supervised Transformer for Deepfake Detection [112.81127845409002]
Deepfake techniques in real-world scenarios demand stronger generalization ability from face forgery detectors.
Inspired by transfer learning, neural networks pre-trained on other large-scale face-related tasks may provide useful features for deepfake detection.
In this paper, we propose a self-supervised, transformer-based audio-visual contrastive learning method.
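A sketch of the kind of audio-visual contrastive (InfoNCE) objective such self-supervised pre-training typically uses, assuming paired per-clip audio and visual embeddings; the paper's exact formulation may differ.

```python
# Hedged sketch of a symmetric audio-visual InfoNCE loss: matched
# audio/visual clip embeddings are pulled together, mismatched pairs in
# the batch are pushed apart.
import torch
import torch.nn.functional as F

def av_contrastive_loss(audio: torch.Tensor, visual: torch.Tensor,
                        temperature: float = 0.07) -> torch.Tensor:
    """audio, visual: (batch, dim) embeddings where row i of each comes
    from the same clip."""
    a = F.normalize(audio, dim=1)
    v = F.normalize(visual, dim=1)
    logits = a @ v.t() / temperature       # (batch, batch) similarities
    targets = torch.arange(a.size(0))      # positives on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = av_contrastive_loss(torch.randn(16, 128), torch.randn(16, 128))
```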
arXiv Detail & Related papers (2022-03-02T17:44:40Z)
- Understanding the Security of Deepfake Detection [23.118012417901078]
We study the security of state-of-the-art deepfake detection methods in adversarial settings.
We use two large-scale public deepfake data sources: FaceForensics++ and the Facebook Deepfake Detection Challenge.
Our results uncover multiple security limitations of the deepfake detection methods in adversarial settings.
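Security studies like this one stress-test detectors with adversarial perturbations. Below is a minimal FGSM sketch against a toy stand-in detector; it is one standard attack, not the paper's full attack suite.

```python
# Hedged sketch of one standard adversarial stress test (FGSM) applied
# to a detector; `detector` is any differentiable real/fake classifier.
import torch
import torch.nn as nn

def fgsm(detector: nn.Module, frames: torch.Tensor, labels: torch.Tensor,
         eps: float = 2 / 255) -> torch.Tensor:
    """Returns perturbed frames that try to flip the detector's output."""
    frames = frames.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(detector(frames), labels)
    loss.backward()
    return (frames + eps * frames.grad.sign()).clamp(0, 1).detach()

detector = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))  # toy stand-in
x = torch.rand(4, 3, 32, 32)
adv = fgsm(detector, x, torch.zeros(4, dtype=torch.long))
```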
arXiv Detail & Related papers (2021-07-05T14:18:21Z)
- TAR: Generalized Forensic Framework to Detect Deepfakes using Weakly Supervised Learning [17.40885531847159]
Deepfakes have become a critical social problem, and detecting them is of utmost importance.
In this work, we introduce a practical digital forensic tool to detect different types of deepfakes simultaneously.
We develop an autoencoder-based detection model with residual blocks and sequentially perform transfer learning so that a single model handles these different deepfake types.
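A minimal sketch of an autoencoder with residual blocks matching the summary's description; TAR's actual architecture and its weakly supervised, sequential transfer-learning schedule are not reproduced here.

```python
# Hedged sketch of an autoencoder with residual blocks; TAR's real
# architecture and training schedule are not reproduced.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return torch.relu(x + self.body(x))  # identity shortcut

class ResidualAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 32, 4, stride=2, padding=1),
                                 nn.ReLU(), ResBlock(32))
        self.dec = nn.Sequential(ResBlock(32),
                                 nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
                                 nn.Sigmoid())
    def forward(self, x):
        return self.dec(self.enc(x))

# Reconstruction error can then serve as a per-type fakeness signal.
model = ResidualAE()
recon = model(torch.rand(2, 3, 64, 64))      # shapes: 64 -> 32 -> 64
```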
arXiv Detail & Related papers (2021-05-13T07:31:08Z)
- One Detector to Rule Them All: Towards a General Deepfake Attack Detection Framework [19.762839181838388]
We introduce a Convolutional LSTM-based Residual Network (CLRNet) to better cope with unknown and unseen deepfakes.
Our CLRNet model generalizes well to high-quality DFW videos, achieving 93.86% detection accuracy.
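A sketch of the convolutional LSTM building block behind a model like CLRNet: a standard ConvLSTM cell carrying spatial state across frames. The residual stacking and the full network are omitted.

```python
# Hedged sketch of a standard ConvLSTM cell of the kind CLRNet builds
# on; the residual stacking and full detector are omitted.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch: int, hid_ch: int):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, 3, padding=1)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        i, f, o, g = i.sigmoid(), f.sigmoid(), o.sigmoid(), g.tanh()
        c = f * c + i * g                 # update spatial cell state
        h = o * c.tanh()                  # emit spatial hidden state
        return h, (h, c)

cell = ConvLSTMCell(3, 16)
h = c = torch.zeros(1, 16, 32, 32)
for frame in torch.rand(8, 1, 3, 32, 32):  # iterate over 8 video frames
    out, (h, c) = cell(frame, (h, c))
```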
arXiv Detail & Related papers (2021-05-01T08:02:59Z)