M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection
- URL: http://arxiv.org/abs/2104.09770v2
- Date: Wed, 21 Apr 2021 12:59:29 GMT
- Title: M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection
- Authors: Junke Wang, Zuxuan Wu, Jingjing Chen, and Yu-Gang Jiang
- Abstract summary: Forged images generated by Deepfake techniques pose a serious threat to the trustworthiness of digital information.
In this paper, we aim to capture the subtle manipulation artifacts at different scales for Deepfake detection.
We introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos generated by state-of-the-art face swapping and facial reenactment methods.
- Score: 74.19291916812921
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The widespread dissemination of forged images generated by Deepfake
techniques has posed a serious threat to the trustworthiness of digital
information. This demands effective approaches that can detect perceptually
convincing Deepfakes generated by advanced manipulation techniques. Most
existing approaches combat Deepfakes with deep neural networks by mapping the
input image to a binary prediction without capturing the consistency among
different pixels. In this paper, we aim to capture the subtle manipulation
artifacts at different scales for Deepfake detection. We achieve this with
transformer models, which have recently demonstrated superior performance in
modeling dependencies between pixels for a variety of recognition tasks in
computer vision. In particular, we introduce a Multi-modal Multi-scale
TRansformer (M2TR), which uses a multi-scale transformer operating on patches
of different sizes to detect local inconsistencies at different spatial
levels. To improve detection results and enhance robustness to image
compression, M2TR also exploits frequency information, which is combined with
RGB features through a cross-modality fusion module.
Developing and evaluating Deepfake detection methods requires large-scale
datasets. However, we observe that samples in existing benchmarks contain
severe artifacts and lack diversity. This motivates us to introduce a
high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos
generated by state-of-the-art face swapping and facial reenactment methods. On
three Deepfake datasets, we conduct extensive experiments to verify the
effectiveness of the proposed method, which outperforms state-of-the-art
Deepfake detection methods.
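The two ideas in the abstract lend themselves to a compact illustration: self-attention over patches of several sizes to surface local inconsistencies, and cross-attention fusion of RGB features with a frequency branch. Below is a minimal, hedged PyTorch sketch of that pattern. All module names, dimensions, and design choices (the FFT log-amplitude frequency features, the pooling, the fusion layout) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchAttention(nn.Module):
    """Self-attention over non-overlapping patches of one fixed size."""

    def __init__(self, dim, patch, heads=4):
        super().__init__()
        self.patch = patch
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (B, C, H, W) feature map; pool each patch into a single token.
        B, C, H, W = x.shape
        p = self.patch
        tokens = F.avg_pool2d(x, p)                    # (B, C, H/p, W/p)
        tokens = tokens.flatten(2).transpose(1, 2)     # (B, N, C), N = HW/p^2
        out, _ = self.attn(tokens, tokens, tokens)     # patch-to-patch attention
        out = out.transpose(1, 2).reshape(B, C, H // p, W // p)
        # Broadcast the attended patch tokens back onto the full-resolution map.
        return x + F.interpolate(out, size=(H, W), mode="nearest")


class CrossModalFusion(nn.Module):
    """Toy cross-modality fusion: RGB features query frequency features."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, rgb, freq):
        B, C, H, W = rgb.shape
        q = rgb.flatten(2).transpose(1, 2)             # (B, HW, C)
        kv = freq.flatten(2).transpose(1, 2)
        fused, _ = self.attn(q, kv, kv)
        return (q + fused).transpose(1, 2).reshape(B, C, H, W)


class MultiScaleDetector(nn.Module):
    def __init__(self, dim=64, patch_sizes=(2, 4, 8)):
        super().__init__()
        self.rgb_stem = nn.Conv2d(3, dim, 3, padding=1)
        self.freq_stem = nn.Conv2d(3, dim, 3, padding=1)
        self.scales = nn.ModuleList(PatchAttention(dim, p) for p in patch_sizes)
        self.fuse = CrossModalFusion(dim)
        self.head = nn.Linear(dim, 1)                  # real/fake logit

    def forward(self, img):
        # Frequency branch: log-amplitude of the 2-D FFT as a simple
        # stand-in for learned frequency features.
        amp = torch.log1p(torch.abs(torch.fft.fft2(img)))
        rgb = self.rgb_stem(img)
        for scale in self.scales:  # inconsistency cues at several spatial levels
            rgb = scale(rgb)
        fused = self.fuse(rgb, self.freq_stem(amp))
        return self.head(fused.mean(dim=(2, 3)))       # (B, 1)


logits = MultiScaleDetector()(torch.randn(2, 3, 32, 32))
```

Note the simplifications: each scale's residual connection keeps the feature map at full resolution so stages can be stacked, and the FFT amplitude merely stands in for whatever frequency representation the paper actually uses.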
Related papers
- FSBI: Deepfakes Detection with Frequency Enhanced Self-Blended Images [17.707379977847026]
This paper introduces a Frequency Enhanced Self-Blended Images approach for deepfake detection.
The proposed approach has been evaluated on the FF++ and Celeb-DF datasets.
arXiv Detail & Related papers (2024-06-12T20:15:00Z) - Robust CLIP-Based Detector for Exposing Diffusion Model-Generated Images [13.089550724738436]
Diffusion models (DMs) have revolutionized image generation, producing high-quality images with applications spanning various fields.
Their ability to create hyper-realistic images poses significant challenges in distinguishing between real and synthetic content.
This work introduces a robust detection framework that integrates image and text features extracted by the CLIP model with a Multilayer Perceptron (MLP) classifier (a minimal sketch of this pattern appears after this list).
arXiv Detail & Related papers (2024-04-19T14:30:41Z) - Generalized Deepfakes Detection with Reconstructed-Blended Images and
Multi-scale Feature Reconstruction Network [14.749857283918157]
We present a blending-based detection approach that generalizes robustly to unseen datasets.
Experiments demonstrate that this approach yields better performance in both cross-manipulation and cross-dataset detection on unseen data.
arXiv Detail & Related papers (2023-12-13T09:49:15Z) - DeepFidelity: Perceptual Forgery Fidelity Assessment for Deepfake
Detection [67.3143177137102]
Deepfake detection refers to detecting artificially generated or edited faces in images or videos.
We propose a novel Deepfake detection framework named DeepFidelity to adaptively distinguish real and fake faces.
arXiv Detail & Related papers (2023-12-07T07:19:45Z) - CrossDF: Improving Cross-Domain Deepfake Detection with Deep Information Decomposition [53.860796916196634]
We propose a Deep Information Decomposition (DID) framework to enhance the performance of cross-dataset Deepfake detection (CrossDF).
Unlike most existing deepfake detection methods, our framework prioritizes high-level semantic features over specific visual artifacts.
It adaptively decomposes facial features into deepfake-related and irrelevant information, only using the intrinsic deepfake-related information for real/fake discrimination.
arXiv Detail & Related papers (2023-09-30T12:30:25Z) - MMNet: Multi-Collaboration and Multi-Supervision Network for Sequential
Deepfake Detection [81.59191603867586]
Sequential deepfake detection aims to identify forged facial regions with the correct sequence for recovery.
The recovery of forged images requires knowledge of the manipulation model to implement inverse transformations.
We propose Multi-Collaboration and Multi-Supervision Network (MMNet) that handles various spatial scales and sequential permutations in forged face images.
arXiv Detail & Related papers (2023-07-06T02:32:08Z) - Deep Convolutional Pooling Transformer for Deepfake Detection [54.10864860009834]
We propose a deep convolutional Transformer to incorporate decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy.
The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
arXiv Detail & Related papers (2022-09-12T15:05:41Z) - Joint Learning of Salient Object Detection, Depth Estimation and Contour
Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection, and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses depth-based RGB-D SOD methods on multiple datasets, but also predicts high-quality depth maps and salient contours simultaneously.
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.