Self-Supervised Transformer with Domain Adaptive Reconstruction for
General Face Forgery Video Detection
- URL: http://arxiv.org/abs/2309.04795v1
- Date: Sat, 9 Sep 2023 13:40:44 GMT
- Title: Self-Supervised Transformer with Domain Adaptive Reconstruction for
General Face Forgery Video Detection
- Authors: Daichi Zhang, Zihao Xiao, Jianmin Li, Shiming Ge
- Abstract summary: A Self-supervised Transformer cooperating with Contrastive and Reconstruction learning (CoReST) is proposed.
Two specific auxiliary tasks incorporated contrastive and reconstruction learning are designed to enhance the representation learning.
Our proposed method performs even better than the state-of-the-art supervised competitors with impressive generalization.
- Score: 24.619102747582456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Face forgery videos have caused severe social public concern, and various
detectors have been proposed recently. However, most of them are trained in a
supervised manner with limited generalization when detecting videos from
different forgery methods or real source videos. To tackle this issue, we
explore to take full advantage of the difference between real and forgery
videos by only exploring the common representation of real face videos. In this
paper, a Self-supervised Transformer cooperating with Contrastive and
Reconstruction learning (CoReST) is proposed, which is first pre-trained only
on real face videos in a self-supervised manner, and then fine-tuned a linear
head on specific face forgery video datasets. Two specific auxiliary tasks
incorporated contrastive and reconstruction learning are designed to enhance
the representation learning. Furthermore, a Domain Adaptive Reconstruction
(DAR) module is introduced to bridge the gap between different forgery domains
by reconstructing on unlabeled target videos when fine-tuning. Extensive
experiments on public datasets demonstrate that our proposed method performs
even better than the state-of-the-art supervised competitors with impressive
generalization.
Related papers
- Multi-Contextual Predictions with Vision Transformer for Video Anomaly
Detection [22.098399083491937]
understanding of thetemporal context of a video plays a vital role in anomaly detection.
We design a transformer model with three different contextual prediction streams: masked, whole and partial.
By learning to predict the missing frames of consecutive normal frames, our model can effectively learn various normality patterns in the video.
arXiv Detail & Related papers (2022-06-17T05:54:31Z) - Self-supervised Video-centralised Transformer for Video Face Clustering [58.12996668434134]
This paper presents a novel method for face clustering in videos using a video-centralised transformer.
We release the first large-scale egocentric video face clustering dataset named EasyCom-Clustering.
arXiv Detail & Related papers (2022-03-24T16:38:54Z) - Leveraging Real Talking Faces via Self-Supervision for Robust Forgery
Detection [112.96004727646115]
We develop a method to detect face-manipulated videos using real talking faces.
We show that our method achieves state-of-the-art performance on cross-manipulation generalisation and robustness experiments.
Our results suggest that leveraging natural and unlabelled videos is a promising direction for the development of more robust face forgery detectors.
arXiv Detail & Related papers (2022-01-18T17:14:54Z) - Unsupervised Domain Adaptation for Video Semantic Segmentation [91.30558794056054]
Unsupervised Domain Adaptation for semantic segmentation has gained immense popularity since it can transfer knowledge from simulation to real.
In this work, we present a new video extension of this task, namely Unsupervised Domain Adaptation for Video Semantic approaches.
We show that our proposals significantly outperform previous image-based UDA methods both on image-level (mIoU) and video-level (VPQ) evaluation metrics.
arXiv Detail & Related papers (2021-07-23T07:18:20Z) - JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion
Retargeting [53.28477676794658]
unsupervised motion in videos has seen substantial advancements through the use of deep neural networks.
We introduce JOKR - a JOint Keypoint Representation that handles both the source and target videos, without requiring any object prior or data collection.
We evaluate our method both qualitatively and quantitatively, and demonstrate that our method handles various cross-domain scenarios, such as different animals, different flowers, and humans.
arXiv Detail & Related papers (2021-06-17T17:32:32Z) - A Video Is Worth Three Views: Trigeminal Transformers for Video-based
Person Re-identification [77.08204941207985]
Video-based person re-identification (Re-ID) aims to retrieve video sequences of the same person under non-overlapping cameras.
We propose a novel framework named Trigeminal Transformers (TMT) for video-based person Re-ID.
arXiv Detail & Related papers (2021-04-05T02:50:16Z) - Overcomplete Representations Against Adversarial Videos [72.04912755926524]
We propose a novel Over-and-Under complete restoration network for Defending against adversarial videos (OUDefend)
OUDefend is designed to balance local and global features by learning those two representations.
Experimental results show that the defenses focusing on images may be ineffective to videos, while OUDefend enhances robustness against different types of adversarial videos.
arXiv Detail & Related papers (2020-12-08T08:00:17Z) - ID-Reveal: Identity-aware DeepFake Video Detection [24.79483180234883]
ID-Reveal is a new approach that learns temporal facial features, specific of how a person moves while talking.
We do not need any training data of fakes, but only train on real videos.
We obtain an average improvement of more than 15% in terms of accuracy for facial reenactment on high compressed videos.
arXiv Detail & Related papers (2020-12-04T10:43:16Z) - Red Carpet to Fight Club: Partially-supervised Domain Transfer for Face
Recognition in Violent Videos [12.534785814117065]
We introduce the WildestFaces dataset to study cross-domain recognition under a variety of adverse conditions.
We establish a rigorous evaluation protocol for this clean-to-violent recognition task, and present a detailed analysis of the proposed dataset and the methods.
arXiv Detail & Related papers (2020-09-16T09:45:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.