Exploring Spatial-Temporal Features for Deepfake Detection and
Localization
- URL: http://arxiv.org/abs/2210.15872v1
- Date: Fri, 28 Oct 2022 03:38:49 GMT
- Title: Exploring Spatial-Temporal Features for Deepfake Detection and
Localization
- Authors: Wu Haiwei and Zhou Jiantao and Zhang Shile and Tian Jinyu
- Abstract summary: We propose a Deepfake network that simultaneously explores spatial and temporal features for detecting and localizing forged regions.
Specifically, we design a new Anchor-Mesh Motion (AMM) algorithm to extract temporal (motion) features by modeling the precise geometric movements of facial micro-expressions.
The superiority of our ST-DDL network is verified by experimental comparisons with several state-of-the-art competitors.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the continuous research on Deepfake forensics, recent studies have
attempted to provide the fine-grained localization of forgeries, in addition to
the coarse classification at the video-level. However, the detection and
localization performance of existing Deepfake forensic methods still have
plenty of room for further improvement. In this work, we propose a
Spatial-Temporal Deepfake Detection and Localization (ST-DDL) network that
simultaneously explores spatial and temporal features for detecting and
localizing forged regions. Specifically, we design a new Anchor-Mesh Motion
(AMM) algorithm that extracts temporal (motion) features by modeling the
precise geometric movements of facial micro-expressions. Compared with
traditional motion extraction methods (e.g., optical flow), which are designed
for large-moving objects, the proposed AMM better captures small-displacement
facial features. The temporal features and the spatial
features are then fused in a Fusion Attention (FA) module based on a
Transformer architecture for the eventual Deepfake forensic tasks. The
superiority of our ST-DDL network is verified by experimental comparisons with
several state-of-the-art competitors, in terms of both video- and pixel-level
detection and localization performance. Furthermore, to spur the future
development of Deepfake forensics, we build a public forgery dataset of 6,000
videos with several new features, including production with widely-used
commercial software (e.g., After Effects), versions transmitted through online
social networks, and the splicing of multi-source videos. The source
code and dataset are available at https://github.com/HighwayWu/ST-DDL.
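As a rough illustration of the two ideas described above, the sketch below derives motion features as frame-to-frame displacements of tracked facial mesh anchors (in the spirit of AMM) and fuses them with spatial tokens via standard cross-attention (loosely mirroring the FA module). All shapes, the mesh tracker, and the attention configuration are illustrative assumptions, not the released implementation; see the repository above for the actual code.

```python
# Sketch only: AMM-like motion features + FA-like cross-attention fusion.
import torch

def mesh_motion_features(anchors: torch.Tensor) -> torch.Tensor:
    """anchors: (T, N, 2) positions of N tracked mesh anchors over T frames
    (any facial-landmark tracker could supply these; stubbed below).
    Returns (T-1, N, 2) frame-to-frame displacements, which remain
    informative for the sub-pixel movements of micro-expressions that
    coarse optical flow tends to smooth away."""
    return anchors[1:] - anchors[:-1]

anchors = torch.rand(16, 468, 2)                 # 16 frames, 468 mesh points
motion = mesh_motion_features(anchors)           # (15, 468, 2)

# Cross-attention fusion of spatial tokens with projected motion tokens.
proj = torch.nn.Linear(2, 64)
motion_tok = proj(motion.mean(dim=0)).unsqueeze(0)   # (1, 468, 64)
spatial_tok = torch.rand(1, 196, 64)                 # e.g., 14x14 CNN tokens
attn = torch.nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
fused, _ = attn(query=spatial_tok, key=motion_tok, value=motion_tok)
print(fused.shape)                                   # torch.Size([1, 196, 64])
```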
Related papers
- DiMoDif: Discourse Modality-information Differentiation for Audio-visual Deepfake Detection and Localization [13.840950434728533]
We present a novel audio-visual deepfake detection framework.
It is built on the assumption that, in real samples, unlike in deepfakes, the visual and audio signals coincide in terms of information.
We use features from deep networks specialized in video and audio speech recognition to spot frame-level cross-modal incongruities.
arXiv Detail & Related papers (2024-11-15T13:47:33Z)
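A minimal sketch of the frame-level cross-modal check described above, assuming per-frame embeddings from pretrained video and audio speech models (stubbed with random tensors); the scoring rule is an assumption, not DiMoDif's actual method:

```python
# Sketch: flag frames where audio and visual speech features disagree.
import torch
import torch.nn.functional as F

def incongruity_scores(vis: torch.Tensor, aud: torch.Tensor) -> torch.Tensor:
    """vis, aud: (T, D) per-frame features from video / audio speech models.
    Returns (T,) scores in [0, 2]; higher = more cross-modal disagreement."""
    return 1.0 - F.cosine_similarity(vis, aud, dim=-1)

vis = torch.randn(100, 256)   # stand-in for lip-reading features
aud = torch.randn(100, 256)   # stand-in for ASR encoder features
scores = incongruity_scores(vis, aud)
suspect = (scores > scores.mean() + 2 * scores.std()).nonzero().flatten()
print(suspect)                # candidate forged frames
```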
- Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization [20.46053083071752]
We propose and benchmark a new dataset, Localized Audio Visual DeepFake (LAV-DF).
LAV-DF consists of strategic content-driven audio, visual and audio-visual manipulations.
The proposed baseline method, Boundary Aware Temporal Forgery Detection (BA-TFD), is a 3D Convolutional Neural Network-based architecture.
arXiv Detail & Related papers (2023-05-03T08:48:45Z)
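For intuition only, a tiny 3D-CNN that emits per-frame forgery logits, in the spirit of a boundary-aware temporal localizer; the layer choices are illustrative assumptions, not BA-TFD's architecture:

```python
# Sketch: 3D convolutions collapse space but keep time for localization.
import torch
import torch.nn as nn

class TinyTemporalLocalizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),                  # pool space, keep time
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),       # collapse space only
        )
        self.head = nn.Conv1d(32, 1, kernel_size=1)   # per-frame logit

    def forward(self, clip):                          # clip: (B, 3, T, H, W)
        feat = self.backbone(clip).squeeze(-1).squeeze(-1)  # (B, 32, T)
        return self.head(feat).squeeze(1)             # (B, T) forgery logits

logits = TinyTemporalLocalizer()(torch.rand(2, 3, 32, 64, 64))
print(logits.shape)  # torch.Size([2, 32])
```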
- Deep Convolutional Pooling Transformer for Deepfake Detection [54.10864860009834]
We propose a deep convolutional Transformer to incorporate decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy.
The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
arXiv Detail & Related papers (2022-09-12T15:05:41Z)
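A hedged sketch of one plausible reading of "convolutional pooling" for transformer tokens, i.e. a strided 1D convolution that shortens the token sequence between attention blocks; this is an assumption, not the paper's implementation:

```python
# Sketch: attend over patch tokens, then halve the token count with a conv.
import torch
import torch.nn as nn

class ConvPool(nn.Module):
    def __init__(self, dim: int, stride: int = 2):
        super().__init__()
        self.pool = nn.Conv1d(dim, dim, kernel_size=3, stride=stride, padding=1)

    def forward(self, tokens):               # tokens: (B, N, dim)
        x = tokens.transpose(1, 2)           # (B, dim, N) for Conv1d
        return self.pool(x).transpose(1, 2)  # (B, ~N/stride, dim)

block = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
tokens = torch.rand(2, 196, 64)              # 14x14 patch tokens
tokens = ConvPool(64)(block(tokens))         # attend, then shorten sequence
print(tokens.shape)                          # torch.Size([2, 98, 64])
```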
- AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z)
- Delving into Sequential Patches for Deepfake Detection [64.19468088546743]
Recent advances in face forgery techniques produce nearly untraceable deepfake videos, which could be leveraged with malicious intentions.
Previous studies have identified the importance of local low-level cues and temporal information for generalizing well across deepfake methods.
We propose the Local- & Temporal-aware Transformer-based Deepfake Detection framework, which adopts a local-to-global learning protocol.
arXiv Detail & Related papers (2022-07-06T16:46:30Z)
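A minimal sketch of a local-to-global protocol, assuming per-patch temporal attention followed by attention across patch summaries; the structure is an interpretation, not the authors' framework:

```python
# Sketch: attend within each patch's frame sequence, then across patches.
import torch
import torch.nn as nn

local_enc = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
global_enc = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)

x = torch.rand(2, 49, 8, 64)                    # (batch, patches, frames, dim)
B, P, T, D = x.shape
local = local_enc(x.reshape(B * P, T, D))       # temporal attention per patch
summaries = local.mean(dim=1).reshape(B, P, D)  # one token per patch
video_feat = global_enc(summaries).mean(dim=1)  # (B, D) video-level feature
print(video_feat.shape)
```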
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
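As a generic illustration of co-attention between low- and high-level features (the exact formulation in the paper may differ, and all shapes here are assumptions):

```python
# Sketch: each feature level queries the other, then results are combined.
import torch
import torch.nn as nn

attn_lh = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
attn_hl = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

low = torch.rand(2, 196, 64)    # fine, low-level tokens (edges, textures)
high = torch.rand(2, 49, 64)    # coarse, high-level semantic tokens

low_ctx, _ = attn_lh(query=low, key=high, value=high)   # high informs low
high_ctx, _ = attn_hl(query=high, key=low, value=low)   # low informs high
fused = low_ctx + nn.functional.interpolate(
    high_ctx.transpose(1, 2), size=196).transpose(1, 2)  # align token counts
print(fused.shape)              # torch.Size([2, 196, 64])
```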
- M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection [74.19291916812921]
Forged images generated by Deepfake techniques pose a serious threat to the trustworthiness of digital information.
In this paper, we aim to capture the subtle manipulation artifacts at different scales for Deepfake detection.
We introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos generated by state-of-the-art face swapping and facial reenactment methods.
arXiv Detail & Related papers (2021-04-20T05:43:44Z)
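For intuition, a hedged multi-scale sketch that tokenizes an image at several patch sizes and attends at each scale, so artifacts of different extents can each dominate at some scale; the tokenization and fusion are assumptions, not M2TR's design:

```python
# Sketch: self-attention over patch tokens at multiple patch sizes.
import torch
import torch.nn as nn

def tokens_at_scale(img, patch):          # img: (B, C, H, W)
    B, C, H, W = img.shape
    t = img.unfold(2, patch, patch).unfold(3, patch, patch)  # (B,C,nH,nW,p,p)
    t = t.permute(0, 2, 3, 1, 4, 5)                          # group per patch
    return t.reshape(B, -1, C * patch * patch)               # (B, N, C*p*p)

img = torch.rand(2, 3, 64, 64)
descs = []
for patch in (8, 16, 32):
    tok = tokens_at_scale(img, patch)
    proj = nn.Linear(tok.shape[-1], 64)(tok)                 # shared width
    enc = nn.TransformerEncoderLayer(64, nhead=4, batch_first=True)(proj)
    descs.append(enc.mean(dim=1))                            # (B, 64) / scale
fused = torch.stack(descs).mean(dim=0)                       # (B, 64)
print(fused.shape)
```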
- Dense Multiscale Feature Fusion Pyramid Networks for Object Detection in UAV-Captured Images [0.09065034043031667]
We propose a novel method called Dense Multiscale Feature Fusion Pyramid Networks (DMFFPN), which aims to obtain features that are as rich as possible.
Specifically, the dense connection is designed to fully utilize the representation from the different convolutional layers.
Experiments on the drone-based dataset VisDrone-DET demonstrate the competitive performance of our method.
arXiv Detail & Related papers (2020-12-19T10:05:31Z)
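A rough sketch of dense multi-scale fusion, where every backbone level contributes to every pyramid output instead of FPN's single top-down neighbor pathway; this is an interpretation of the idea, not DMFFPN itself:

```python
# Sketch: resize all levels to each output resolution and fuse densely.
import torch
import torch.nn as nn
import torch.nn.functional as F

levels = [torch.rand(1, 64, s, s) for s in (64, 32, 16)]  # C3..C5 stand-ins
fuse = nn.ModuleList(nn.Conv2d(64 * 3, 64, kernel_size=1) for _ in levels)

outputs = []
for i, target in enumerate(levels):
    size = target.shape[-2:]
    resized = [F.interpolate(f, size=size, mode="nearest") for f in levels]
    outputs.append(fuse[i](torch.cat(resized, dim=1)))     # dense fusion
print([o.shape for o in outputs])
```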
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method achieves superior performance compared with state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
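A minimal sketch of dynamic spatiotemporal fusion via a learned per-pixel gate; the gating scheme is an assumption for illustration, not DS-Net's module:

```python
# Sketch: a gate decides per pixel how much to trust motion vs. appearance.
import torch
import torch.nn as nn

gate_conv = nn.Conv2d(128, 1, kernel_size=1)

spatial = torch.rand(1, 64, 56, 56)     # appearance features of one frame
temporal = torch.rand(1, 64, 56, 56)    # motion features (e.g., from flow)

gate = torch.sigmoid(gate_conv(torch.cat([spatial, temporal], dim=1)))
fused = gate * temporal + (1 - gate) * spatial   # (1, 64, 56, 56)
print(fused.shape)
```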
- Spatio-temporal Features for Generalized Detection of Deepfake Videos [12.453288832098314]
We propose spatio-temporal features, modeled by 3D CNNs, to extend the capability to detect new sorts of deepfake videos.
We show that our approach outperforms existing methods in terms of generalization capabilities.
arXiv Detail & Related papers (2020-10-22T16:28:50Z)