Deepfake Detection Scheme Based on Vision Transformer and Distillation
- URL: http://arxiv.org/abs/2104.01353v1
- Date: Sat, 3 Apr 2021 09:13:05 GMT
- Title: Deepfake Detection Scheme Based on Vision Transformer and Distillation
- Authors: Young-Jin Heo, Young-Ju Choi, Young-Woon Lee, Byung-Gyu Kim
- Abstract summary: We propose a Vision Transformer model with distillation methodology for detecting fake videos.
We verify that the proposed scheme with patch embedding as input outperforms the state-of-the-art model that uses combined CNN features.
- Score: 4.716110829725784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A Deepfake is a manipulated video made with generative deep learning
techniques, such as Generative Adversarial Networks (GANs) or autoencoders, that
anyone can utilize. Recently, with the increase in Deepfake videos, classifiers
built on convolutional neural networks (CNNs) that can distinguish fake videos,
as well as Deepfake datasets, have been actively created. However, previous
CNN-based studies suffer not only from overfitting but also from frequently
misjudging fake videos as real ones. In this paper, we propose a Vision
Transformer model with a distillation methodology for detecting fake videos. We
design the model so that CNN features and a patch-based positioning model learn
to interact across all positions to locate the artifact regions, addressing the
false-negative problem. Through comparative analysis on the Deepfake Detection
Challenge (DFDC) dataset, we verify that the proposed scheme with patch
embeddings as input outperforms the state-of-the-art model that uses combined
CNN features. Without any ensemble technique, our model obtains an AUC of 0.978
and an F1 score of 91.9, while the previous SOTA model yields an AUC of 0.972
and an F1 score of 90.6 under the same conditions.
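The architecture is easier to see in code. Below is a minimal sketch, not the
authors' released implementation, of a DeiT-style Vision Transformer for binary
real/fake classification: patch embeddings plus a distillation token whose head
can be supervised by a CNN teacher. All module names and hyperparameters are
illustrative assumptions.

```python
import torch
import torch.nn as nn

class DistilledViTDetector(nn.Module):
    """ViT-style binary detector with a DeiT-like distillation token (sketch)."""

    def __init__(self, img_size=224, patch_size=16, dim=384, depth=6, heads=6):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # Patch embedding: a strided convolution turns each patch into a token.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.dist_token = nn.Parameter(torch.zeros(1, 1, dim))  # distillation token
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 2, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, 2)       # trained on ground-truth labels
        self.head_dist = nn.Linear(dim, 2)  # trained on a CNN teacher's outputs

    def forward(self, x):
        b = x.size(0)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = torch.cat([self.cls_token.expand(b, -1, -1),
                            self.dist_token.expand(b, -1, -1),
                            tokens], dim=1)                      # (B, N + 2, dim)
        tokens = self.encoder(tokens + self.pos_embed)
        # Class head reads the CLS token; distillation head reads the dist token.
        return self.head(tokens[:, 0]), self.head_dist(tokens[:, 1])
```

During training, the class head would take a cross-entropy loss against the
ground-truth labels and the distillation head a second loss against the CNN
teacher's predictions; at inference the two heads' logits can be averaged.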
Related papers
- AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection [53.448283629898214]
The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries.
Most previous work on detecting AI-generated fake videos utilizes only the visual or the audio modality.
We propose an Audio-Visual Transformer-based Ensemble Network (AVTENet) framework that considers both acoustic manipulation and visual manipulation.
arXiv Detail & Related papers (2023-10-19T19:01:26Z)
- Deepfake Video Detection Using Generative Convolutional Vision Transformer [3.8297637120486496]
We propose a Generative Convolutional Vision Transformer (GenConViT) for deepfake video detection.
Our model combines ConvNeXt and Swin Transformer models for feature extraction.
By learning from the visual artifacts and latent data distribution, GenConViT achieves improved performance in detecting a wide range of deepfake videos.
arXiv Detail & Related papers (2023-07-13T19:27:40Z)
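As a rough sketch of the dual-backbone idea in the GenConViT entry above, one
can concatenate pooled ConvNeXt and Swin Transformer features and classify on
the result. The timm identifiers are real model names, but the pairing and head
are illustrative assumptions, and GenConViT's generative/latent branch is
omitted.

```python
import torch
import torch.nn as nn
import timm  # assumed available

class DualBackboneDetector(nn.Module):
    """Concatenates pooled ConvNeXt and Swin features for real/fake classification."""

    def __init__(self):
        super().__init__()
        # num_classes=0 makes timm return pooled feature vectors instead of logits.
        self.convnext = timm.create_model("convnext_tiny", pretrained=True,
                                          num_classes=0)
        self.swin = timm.create_model("swin_tiny_patch4_window7_224",
                                      pretrained=True, num_classes=0)
        self.head = nn.Linear(self.convnext.num_features + self.swin.num_features, 2)

    def forward(self, x):
        # Both backbones see the same face crop and contribute complementary cues.
        feats = torch.cat([self.convnext(x), self.swin(x)], dim=1)
        return self.head(feats)
```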
- Deep Convolutional Pooling Transformer for Deepfake Detection [54.10864860009834]
We propose a deep convolutional Transformer to incorporate decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy.
The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
arXiv Detail & Related papers (2022-09-12T15:05:41Z)
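The re-attention component named above can be sketched as a self-attention
block whose attention maps are re-mixed across heads (in the DeepViT spirit);
the convolutional pooling stage is omitted here, and all dimensions are
assumptions.

```python
import torch
import torch.nn as nn

class ReAttention(nn.Module):
    """Multi-head self-attention with learnable head mixing (sketch)."""

    def __init__(self, dim=384, heads=6):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        # A 1x1 convolution over the head dimension re-mixes the attention maps.
        self.mix = nn.Conv2d(heads, heads, kernel_size=1)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (B, N, dim)
        b, n, d = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.heads, d // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)     # each: (B, heads, N, head_dim)
        attn = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
        attn = self.mix(attn)                    # re-attention across heads
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)
```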
- Voice-Face Homogeneity Tells Deepfake [56.334968246631725]
Existing detection approaches focus on exploring the specific artifacts in deepfake videos.
We propose to perform the deepfake detection from an unexplored voice-face matching view.
Our model obtains significantly improved performance as compared to other state-of-the-art competitors.
arXiv Detail & Related papers (2022-03-04T09:08:50Z)
- Model Attribution of Face-swap Deepfake Videos [39.771800841412414]
We first introduce a new dataset of DeepFakes from Different Models (DFDM), based on several autoencoder models.
Specifically, five generation models with variations in encoder, decoder, intermediate layer, input resolution, and compression ratio have been used to generate a total of 6,450 Deepfake videos.
We treat Deepfake model attribution as a multiclass classification task and propose a spatial- and temporal-attention-based method to explore the differences among Deepfakes.
arXiv Detail & Related papers (2022-02-25T20:05:18Z)
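A hypothetical sketch of the temporal half of such an attention-based
attributor: per-frame CNN features are attention-pooled into a video vector and
classified into one of the five generation models. The feature extractor and
sizes are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TemporalAttentionAttributor(nn.Module):
    """Attention-pools frame features and predicts the source generation model."""

    def __init__(self, feat_dim=512, num_models=5):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)               # attention score per frame
        self.classifier = nn.Linear(feat_dim, num_models)

    def forward(self, frame_feats):                       # (B, T, feat_dim)
        weights = self.score(frame_feats).softmax(dim=1)  # (B, T, 1)
        video_feat = (weights * frame_feats).sum(dim=1)   # weighted temporal mean
        return self.classifier(video_feat)
```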
- Combining EfficientNet and Vision Transformers for Video Deepfake Detection [6.365889364810238]
Deepfakes are the result of digital manipulation that produces credible videos intended to deceive the viewer.
In this study, we combine various types of Vision Transformers with a convolutional EfficientNet B0 used as a feature extractor.
The best model achieved an AUC of 0.951 and an F1 score of 88.0%, very close to the state-of-the-art on the DeepFake Detection Challenge (DFDC) dataset.
arXiv Detail & Related papers (2021-07-06T13:35:11Z)
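A minimal sketch of the hybrid pattern in the entry above: EfficientNet-B0
feature-map cells serve as tokens for a small Transformer encoder. The paper
evaluates several ViT variants, so this is an illustrative assumption, not its
exact design.

```python
import torch
import torch.nn as nn
import timm  # assumed available

class EfficientNetViTHybrid(nn.Module):
    """EfficientNet-B0 feature maps feed a Transformer encoder (sketch)."""

    def __init__(self, dim=256, depth=4, heads=4):
        super().__init__()
        # features_only=True returns intermediate feature maps instead of logits.
        self.cnn = timm.create_model("efficientnet_b0", pretrained=True,
                                     features_only=True)
        cnn_dim = self.cnn.feature_info.channels()[-1]  # channels of the last map
        self.proj = nn.Linear(cnn_dim, dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, 2)

    def forward(self, x):
        fmap = self.cnn(x)[-1]                                # (B, C, H, W)
        tokens = self.proj(fmap.flatten(2).transpose(1, 2))   # each cell is a token
        tokens = torch.cat([self.cls_token.expand(x.size(0), -1, -1), tokens], dim=1)
        # Positional embeddings are omitted for brevity; a real model would add them.
        return self.head(self.encoder(tokens)[:, 0])
```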
- Deepfake Video Detection Using Convolutional Vision Transformer [0.0]
Deep learning techniques can generate and synthesize hyper-realistic videos known as Deepfakes.
Deepfakes pose a looming threat to everyone if used for harmful purposes such as identity theft, phishing, and scams.
We propose a Convolutional Vision Transformer for the detection of Deepfakes.
arXiv Detail & Related papers (2021-02-22T15:56:05Z)
- Adversarially robust deepfake media detection using fused convolutional neural network predictions [79.00202519223662]
Current deepfake detection systems struggle against unseen data.
We employ three different deep Convolutional Neural Network (CNN) models to classify fake and real images extracted from videos.
The proposed technique outperforms state-of-the-art models with 96.5% accuracy.
arXiv Detail & Related papers (2021-02-11T11:28:00Z)
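Late fusion of several CNNs, as in the entry above, can be as simple as
averaging softmax outputs. The backbone choices below are illustrative, not
necessarily the ones used in the paper.

```python
import torch
import timm  # assumed available

def fused_prediction(models, face):
    """Average the softmax outputs of several binary classifiers (late fusion)."""
    probs = [model(face).softmax(dim=-1) for model in models]
    return torch.stack(probs).mean(dim=0)  # (B, 2) fused real/fake probabilities

# Three heterogeneous CNNs, each assumed fine-tuned as a real/fake classifier.
models = [timm.create_model(name, pretrained=True, num_classes=2).eval()
          for name in ("resnet50", "densenet121", "efficientnet_b0")]
with torch.no_grad():
    fused = fused_prediction(models, torch.randn(4, 3, 224, 224))
```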
- Sharp Multiple Instance Learning for DeepFake Video Detection [54.12548421282696]
We introduce a new problem of partial face attack in DeepFake video, where only video-level labels are provided but not all the faces in the fake videos are manipulated.
A sharp MIL (S-MIL) is proposed, which builds a direct mapping from instance embeddings to the bag prediction.
Experiments on FFPMS and the widely used DFDC dataset verify that S-MIL is superior to other counterparts for partially attacked DeepFake video detection.
arXiv Detail & Related papers (2020-08-11T08:52:17Z)
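A sketch of the MIL idea in the entry above: per-face (instance) scores are
aggregated so that a few strongly manipulated faces dominate the video-level
(bag) prediction. The sharpened weighting below is illustrative; the actual
S-MIL formulation differs in detail.

```python
import torch
import torch.nn as nn

class SharpMILHead(nn.Module):
    """Maps per-face instance embeddings directly to a bag (video) score."""

    def __init__(self, feat_dim=512, temperature=5.0):
        super().__init__()
        self.instance_score = nn.Linear(feat_dim, 1)
        self.temperature = temperature

    def forward(self, instances):                # (B, N, feat_dim), N face crops
        scores = self.instance_score(instances).squeeze(-1).sigmoid()  # (B, N)
        # Sharp aggregation: softmax weights emphasize the most suspicious faces,
        # so a few manipulated faces can flip the whole video's prediction.
        weights = (self.temperature * scores).softmax(dim=-1)
        return (weights * scores).sum(dim=-1)    # (B,) bag fake probability
```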
- Emotions Don't Lie: An Audio-Visual Deepfake Detection Method Using Affective Cues [75.1731999380562]
We present a learning-based method for distinguishing real from fake (deepfake) multimedia content.
We extract and analyze the similarity between the audio and visual modalities within the same video.
We compare our approach with several SOTA deepfake detection methods and report per-video AUC of 84.4% on the DFDC and 96.6% on the DF-TIMIT datasets.
arXiv Detail & Related papers (2020-03-14T22:07:26Z)
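A minimal sketch of the modality-agreement idea in the entry above: when
affective embeddings extracted from a video's audio and visual tracks disagree,
the video is scored as likely fake. The encoders and the mapping to a fakeness
score are assumptions.

```python
import torch
import torch.nn.functional as F

def modality_mismatch_score(audio_emb, visual_emb):
    """Low audio-visual (affective) similarity suggests a manipulated video."""
    sim = F.cosine_similarity(audio_emb, visual_emb, dim=-1)  # in [-1, 1]
    return 1.0 - (sim + 1.0) / 2.0  # map similarity to a [0, 1] fakeness score

# Hypothetical per-video embeddings from pretrained emotion encoders (assumed).
audio_emb = torch.randn(8, 256)   # speech-emotion embeddings
visual_emb = torch.randn(8, 256)  # facial-expression embeddings
fakeness = modality_mismatch_score(audio_emb, visual_emb)  # (8,) scores
```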