Multimodal Graph Learning for Deepfake Detection
- URL: http://arxiv.org/abs/2209.05419v2
- Date: Fri, 28 Apr 2023 09:13:58 GMT
- Title: Multimodal Graph Learning for Deepfake Detection
- Authors: Zhiyuan Yan, Peng Sun, Yubo Lang, Shuo Du, Shanzhuo Zhang, Wei Wang,
Lei Liu
- Abstract summary: Existing deepfake detectors face several challenges in achieving robustness and generalization.
We propose a novel framework, namely Multimodal Graph Learning (MGL), that leverages information from multiple modalities.
Our proposed method aims to effectively identify and utilize distinguishing features for deepfake detection.
- Score: 10.077496841634135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing deepfake detectors face several challenges in achieving robustness
and generalization. One of the primary reasons is their limited ability to
extract relevant information from forgery videos, especially in the presence of
various artifacts such as spatial, frequency, temporal, and landmark
mismatches. Current detectors rely either on pixel-level features, which are
easily affected by unknown disturbances, or on facial landmarks, which do not
provide sufficient information. Furthermore, most detectors cannot utilize information
from multiple domains for detection, leading to limited effectiveness in
identifying deepfake videos. To address these limitations, we propose a novel
framework, namely Multimodal Graph Learning (MGL), that leverages information
from multiple modalities using two GNNs and several multimodal fusion modules.
At the frame level, we employ a bi-directional cross-modal transformer and an
adaptive gating mechanism to combine the features from the spatial and
frequency domains with the geometric-enhanced landmark features captured by a
GNN. At the video level, we use a Graph Attention Network (GAT) to represent
each frame in a video as a node in a graph and encode temporal information into
the edges of the graph to extract temporal inconsistency between frames. Our
proposed method aims to effectively identify and utilize distinguishing
features for deepfake detection. We evaluate the effectiveness of our method
through extensive experiments on widely-used benchmarks and demonstrate that
our method outperforms the state-of-the-art detectors in terms of
generalization ability and robustness against unknown disturbances.
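To make the two fusion stages described above concrete, here is a minimal sketch (not the authors' implementation): an adaptive gate that fuses per-frame spatial, frequency, and landmark features, and a video-level attention over frames whose scores are biased by temporal distance, standing in for the GAT that encodes temporal information on graph edges. All class names, dimensions, and design details below are illustrative assumptions.

```python
# Illustrative sketch only; module names and dimensions are assumptions, not the MGL code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGatedFusion(nn.Module):
    """Frame level: gate the concatenated spatial, frequency, and landmark features."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(3 * dim, dim), nn.Sigmoid())
        self.proj = nn.Linear(3 * dim, dim)

    def forward(self, spatial, frequency, landmark):
        # Each input: (num_frames, dim) features from the corresponding branch.
        joint = torch.cat([spatial, frequency, landmark], dim=-1)
        return self.gate(joint) * self.proj(joint)  # gated fused frame embeddings

class TemporalFrameAttention(nn.Module):
    """Video level: frames as graph nodes; attention biased by temporal distance."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.time_bias = nn.Parameter(torch.zeros(1))  # learned weight on frame distance

    def forward(self, frames):
        # frames: (num_frames, dim) fused per-frame embeddings of one video.
        idx = torch.arange(frames.size(0), dtype=frames.dtype, device=frames.device)
        dist = (idx[:, None] - idx[None, :]).abs()      # temporal "edge" information
        scores = self.q(frames) @ self.k(frames).t() / frames.size(-1) ** 0.5
        attn = F.softmax(scores + self.time_bias * dist, dim=-1)
        return attn @ self.v(frames)                    # temporally aware node features

# Example: fuse three 256-d streams for 16 frames, then mix information across time.
fusion, temporal = AdaptiveGatedFusion(), TemporalFrameAttention()
frame_feats = fusion(torch.randn(16, 256), torch.randn(16, 256), torch.randn(16, 256))
video_feats = temporal(frame_feats)  # (16, 256)
```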
Related papers
- Contextual Cross-Modal Attention for Audio-Visual Deepfake Detection and Localization [3.9440964696313485]
In the digital age, the emergence of deepfakes and synthetic media presents a significant threat to societal and political integrity.
Deepfakes based on multi-modal manipulation, such as audio-visual forgeries, are more realistic and pose a greater threat.
We propose a novel multi-modal attention framework based on recurrent neural networks (RNNs) that leverages contextual information for audio-visual deepfake detection.
arXiv Detail & Related papers (2024-08-02T18:45:01Z)
- DA-HFNet: Progressive Fine-Grained Forgery Image Detection and Localization Based on Dual Attention [12.36906630199689]
We construct a DA-HFNet forged image dataset guided by text- or image-assisted GAN and diffusion models.
Our goal is to utilize a hierarchical progressive network to capture forged artifacts at different scales for detection and localization.
arXiv Detail & Related papers (2024-06-03T16:13:33Z)
- Frequency-Aware Deepfake Detection: Improving Generalizability through Frequency Space Learning [81.98675881423131]
This research addresses the challenge of developing a universal deepfake detector that can effectively identify unseen deepfake images.
Existing frequency-based paradigms have relied on frequency-level artifacts introduced during the up-sampling in GAN pipelines to detect forgeries.
We introduce a novel frequency-aware approach called FreqNet, centered around frequency domain learning, specifically designed to enhance the generalizability of deepfake detectors.
arXiv Detail & Related papers (2024-03-12T01:28:00Z)
- Beyond the Benchmark: Detecting Diverse Anomalies in Videos [0.6993026261767287]
Video Anomaly Detection (VAD) plays a crucial role in modern surveillance systems, aiming to identify various anomalies in real-world situations.
Current benchmark datasets predominantly emphasize simple, single-frame anomalies such as novel object detection.
We advocate for an expansion of VAD investigations to encompass intricate anomalies that extend beyond conventional benchmark boundaries.
arXiv Detail & Related papers (2023-10-03T09:22:06Z)
- HiDAnet: RGB-D Salient Object Detection via Hierarchical Depth Awareness [2.341385717236931]
We propose a novel Hierarchical Depth Awareness network (HiDAnet) for RGB-D saliency detection.
Our motivation comes from the observation that the multi-granularity properties of geometric priors correlate well with the neural network hierarchies.
Our HiDAnet performs favorably over the state-of-the-art methods by large margins.
arXiv Detail & Related papers (2023-01-18T10:00:59Z)
- Deep Convolutional Pooling Transformer for Deepfake Detection [54.10864860009834]
We propose a deep convolutional Transformer to incorporate decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy.
The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
arXiv Detail & Related papers (2022-09-12T15:05:41Z)
- Spatial-Temporal Frequency Forgery Clue for Video Forgery Detection in VIS and NIR Scenario [87.72258480670627]
Existing face forgery detection methods based on the frequency domain find that GAN-forged images have obvious grid-like visual artifacts in the frequency spectrum compared to real images.
This paper proposes a Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatial-temporal feature representation.
arXiv Detail & Related papers (2022-07-05T09:27:53Z)
- Detection of Deepfake Videos Using Long Distance Attention [73.6659488380372]
Most existing detection methods treat the problem as a vanilla binary classification problem.
In this paper, the problem is treated as a special fine-grained classification problem since the differences between fake and real faces are very subtle.
A spatial-temporal model is proposed which has two components for capturing spatial and temporal forgery traces from a global perspective.
arXiv Detail & Related papers (2021-06-24T08:33:32Z)
- Learning Multi-Granular Hypergraphs for Video-Based Person Re-Identification [110.52328716130022]
Video-based person re-identification (re-ID) is an important research topic in computer vision.
We propose a novel graph-based framework, namely Multi-Granular Hypergraph (MGH), to achieve better representational capabilities.
90.0% top-1 accuracy on MARS is achieved using MGH, outperforming state-of-the-art schemes.
arXiv Detail & Related papers (2021-04-30T11:20:02Z)
- M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection [74.19291916812921]
Forged images generated by Deepfake techniques pose a serious threat to the trustworthiness of digital information.
In this paper, we aim to capture the subtle manipulation artifacts at different scales for Deepfake detection.
We introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos generated by state-of-the-art face swapping and facial reenactment methods.
arXiv Detail & Related papers (2021-04-20T05:43:44Z)
- Fake Visual Content Detection Using Two-Stream Convolutional Neural Networks [14.781702606707642]
We propose a two-stream convolutional neural network architecture called TwoStreamNet to complement frequency and spatial domain features.
The proposed detector has demonstrated significant performance improvement compared to the current state-of-the-art fake content detectors.
arXiv Detail & Related papers (2021-01-03T18:05:07Z)
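For the two-stream idea of pairing a spatial branch with a frequency branch, a minimal sketch under assumptions (the layer choices and class name below are hypothetical, not the TwoStreamNet architecture) could look like this:

```python
# Toy two-branch detector sketch; not the TwoStreamNet paper's architecture.
import torch
import torch.nn as nn

class TwoStreamSketch(nn.Module):
    """One CNN on RGB pixels, one on the log-magnitude FFT spectrum, fused for classification."""
    def __init__(self):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.spatial, self.frequency = branch(), branch()
        self.classifier = nn.Linear(64, 2)  # real vs. fake logits

    def forward(self, rgb):
        # Frequency stream input: log-magnitude spectrum of each colour channel.
        spec = torch.fft.fft2(rgb).abs().clamp_min(1e-8).log()
        feats = torch.cat([self.spatial(rgb), self.frequency(spec)], dim=-1)
        return self.classifier(feats)

logits = TwoStreamSketch()(torch.randn(4, 3, 128, 128))  # (4, 2)
```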