Sharp Multiple Instance Learning for DeepFake Video Detection
- URL: http://arxiv.org/abs/2008.04585v1
- Date: Tue, 11 Aug 2020 08:52:17 GMT
- Title: Sharp Multiple Instance Learning for DeepFake Video Detection
- Authors: Xiaodan Li, Yining Lang, Yuefeng Chen, Xiaofeng Mao, Yuan He, Shuhui
Wang, Hui Xue, Quan Lu
- Abstract summary: We introduce a new problem of partial face attack in DeepFake video, where only video-level labels are provided but not all the faces in the fake videos are manipulated.
A sharp MIL (S-MIL) is proposed which builds direct mapping from instance embeddings to bag prediction.
Experiments on FFPMS and widely used DFDC dataset verify that S-MIL is superior to other counterparts for partially attacked DeepFake video detection.
- Score: 54.12548421282696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid development of facial manipulation techniques, face forgery
has received considerable attention in multimedia and computer vision community
due to security concerns. Existing methods are mostly designed for single-frame
detection trained with precise image-level labels or for video-level prediction
by only modeling the inter-frame inconsistency, leaving potential high risks
for DeepFake attackers. In this paper, we introduce a new problem of partial
face attack in DeepFake video, where only video-level labels are provided but
not all the faces in the fake videos are manipulated. We address this problem
within a multiple instance learning framework, treating faces and the input video as
instances and bag respectively. A sharp MIL (S-MIL) is proposed which builds
direct mapping from instance embeddings to bag prediction, rather than from
instance embeddings to instance prediction and then to bag prediction in
traditional MIL. Theoretical analysis proves that the gradient vanishing in
traditional MIL is relieved in S-MIL. To generate instances that can accurately
incorporate the partially manipulated faces, spatial-temporal encoded instance
is designed to fully model the intra-frame and inter-frame inconsistency, which
further helps to promote the detection performance. We also construct a new
dataset FFPMS for partially attacked DeepFake video detection, which can
benefit the evaluation of different methods at both frame and video levels.
Experiments on FFPMS and the widely used DFDC dataset verify that S-MIL is
superior to other counterparts for partially attacked DeepFake video detection.
In addition, S-MIL can also be adapted to traditional DeepFake image detection
tasks and achieve state-of-the-art performance on single-frame datasets.
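The core distinction in the abstract — traditional MIL maps instance embeddings to instance predictions and then pools them into a bag prediction, while S-MIL maps instance embeddings directly to a bag prediction — can be sketched as follows. This is a minimal NumPy illustration of the two-stage vs. direct schemes; the linear scorer and the p-norm style embedding pooling are assumptions for exposition, not the paper's exact formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def traditional_mil(embeddings, w):
    # Two-stage MIL: embeddings -> per-instance fake probabilities,
    # then max pooling over instances gives the bag (video) prediction.
    inst_probs = sigmoid(embeddings @ w)
    return inst_probs.max()

def sharp_mil(embeddings, w, p=4.0):
    # Direct mapping: pool the instance *embeddings* first (here a
    # p-norm style pooling, assumed for illustration), then apply a
    # single bag-level prediction. No intermediate instance predictions.
    pooled = np.mean(np.abs(embeddings) ** p, axis=0) ** (1.0 / p)
    return sigmoid(pooled @ w)

rng = np.random.default_rng(0)
faces = rng.normal(size=(8, 16))  # 8 face instances (the "bag"), 16-dim embeddings
w = rng.normal(size=16)           # shared linear scorer (hypothetical)
print(traditional_mil(faces, w), sharp_mil(faces, w))
```

Because the bag score is produced in one step rather than through saturating per-instance sigmoids, gradients flow to every instance embedding, which is the intuition behind the paper's claim that S-MIL relieves the gradient vanishing of traditional MIL.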
Related papers
- Deepfake detection in videos with multiple faces using geometric-fakeness features [79.16635054977068]
Deepfakes of victims or public figures can be used by fraudsters for blackmail, extortion and financial fraud.
In our research we propose to use geometric-fakeness features (GFF) that characterize the dynamic degree of face presence in a video.
We employ our approach to analyze videos with multiple faces that are simultaneously present in a video.
arXiv Detail & Related papers (2024-10-10T13:10:34Z)
- GRACE: Graph-Regularized Attentive Convolutional Entanglement with Laplacian Smoothing for Robust DeepFake Video Detection [7.591187423217017]
This paper introduces a novel method for robust DeepFake video detection based on graph convolutional network with graph Laplacian.
The proposed method delivers state-of-the-art performance in DeepFake video detection under noisy face sequences.
arXiv Detail & Related papers (2024-06-28T14:17:16Z)
- Learning Spatiotemporal Inconsistency via Thumbnail Layout for Face Deepfake Detection [41.35861722481721]
Deepfake threats to society and cybersecurity have provoked significant public apprehension.
This paper introduces an elegantly simple yet effective strategy named Thumbnail Layout (TALL).
TALL transforms a video clip into a pre-defined layout to realize the preservation of spatial and temporal dependencies.
arXiv Detail & Related papers (2024-03-15T12:48:44Z)
- Mask Propagation for Efficient Video Semantic Segmentation [63.09523058489429]
Video Semantic Segmentation (VSS) involves assigning a semantic label to each pixel in a video sequence.
We propose an efficient mask propagation framework for VSS, called MPVSS.
Our framework reduces up to 4x FLOPs compared to the per-frame Mask2Former with only up to 2% mIoU degradation on the Cityscapes validation set.
arXiv Detail & Related papers (2023-10-29T09:55:28Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Deep Convolutional Pooling Transformer for Deepfake Detection [54.10864860009834]
We propose a deep convolutional Transformer to incorporate decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy.
The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
arXiv Detail & Related papers (2022-09-12T15:05:41Z)
- Two-branch Recurrent Network for Isolating Deepfakes in Videos [17.59209853264258]
We present a method for deepfake detection based on a two-branch network structure.
One branch propagates the original information, while the other branch suppresses the face content.
Our two novel components show promising results on the FaceForensics++, Celeb-DF, and Facebook's DFDC preview benchmarks.
arXiv Detail & Related papers (2020-08-08T01:38:56Z)
- Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos [82.02074241700728]
In this paper, we present a spatio-temporal action recognition model that is trained with only video-level labels.
Our method uses per-person detectors trained on large image datasets within a Multiple Instance Learning framework.
We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid.
arXiv Detail & Related papers (2020-07-21T10:45:05Z)
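The "standard Multiple Instance Learning assumption" referenced in the last entry above — every positive bag contains at least one positive instance — is simple to state in code. A minimal sketch, not any paper's implementation:

```python
# Standard MIL assumption: a bag is positive iff it contains >= 1 positive instance.
def bag_label(instance_labels):
    """Bag-level label under the standard MIL assumption (1 = positive)."""
    return int(any(instance_labels))

# In the partial-attack DeepFake setting, a video (bag) with only some
# manipulated faces (instances) is still labeled fake:
print(bag_label([0, 0, 1, 0]))  # 1 (positive bag: one manipulated face)
print(bag_label([0, 0, 0, 0]))  # 0 (negative bag: all faces pristine)
```

The weakly supervised action detection paper above targets precisely the cases where this assumption breaks, i.e. where a bag's label cannot be trusted to imply a positive instance inside it.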