UIA-ViT: Unsupervised Inconsistency-Aware Method based on Vision
Transformer for Face Forgery Detection
- URL: http://arxiv.org/abs/2210.12752v1
- Date: Sun, 23 Oct 2022 15:24:47 GMT
- Title: UIA-ViT: Unsupervised Inconsistency-Aware Method based on Vision
Transformer for Face Forgery Detection
- Authors: Wanyi Zhuang, Qi Chu, Zhentao Tan, Qiankun Liu, Haojie Yuan, Changtao
Miao, Zixiang Luo, Nenghai Yu
- Abstract summary: We propose a novel Unsupervised Inconsistency-Aware method based on Vision Transformer, called UIA-ViT.
Due to the self-attention mechanism, the attention map among patch embeddings naturally represents the consistency relation, making the vision Transformer suitable for the consistency representation learning.
- Score: 52.91782218300844
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Intra-frame inconsistency has been proven effective for the
generalization of face forgery detection. However, learning to focus on these
inconsistencies requires extra pixel-level forged-location annotations, which
are non-trivial to acquire. Some existing methods generate large-scale
synthesized data with location annotations, but such data is composed only of
real images and cannot capture the properties of forgery regions. Others
generate forgery location labels by subtracting paired real and fake images,
yet such paired data is difficult to collect and the generated labels are
usually discontinuous. To overcome these limitations, we propose a novel Unsupervised
Inconsistency-Aware method based on Vision Transformer, called UIA-ViT, which
only makes use of video-level labels and can learn inconsistency-aware feature
without pixel-level annotations. Due to the self-attention mechanism, the
attention map among patch embeddings naturally represents the consistency
relation, making the vision Transformer suitable for the consistency
representation learning. Based on vision Transformer, we propose two key
components: Unsupervised Patch Consistency Learning (UPCL) and Progressive
Consistency Weighted Assemble (PCWA). UPCL is designed for learning the
consistency-related representation with progressive optimized pseudo
annotations. PCWA enhances the final classification embedding with previous
patch embeddings optimized by UPCL to further improve the detection
performance. Extensive experiments demonstrate the effectiveness of the
proposed method.
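The abstract's key observation, that the self-attention map among patch embeddings naturally encodes a consistency relation between patches, can be illustrated with a minimal sketch. This is not the authors' implementation; the shapes, random projections, and single-head setup below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

num_patches, dim = 16, 8                 # e.g. a 4x4 grid of patch embeddings
patches = rng.normal(size=(num_patches, dim))

# Single-head self-attention: query/key projections (random, for illustration).
Wq = rng.normal(size=(dim, dim))
Wk = rng.normal(size=(dim, dim))
q, k = patches @ Wq, patches @ Wk

scores = q @ k.T / np.sqrt(dim)          # (P, P) pairwise similarity logits
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True) # row-stochastic attention map

# attn[i, j] can be read as how consistent patch i is with patch j:
# patches within an all-real (or all-forged) region are expected to attend
# similarly to each other, while a forged region attends differently from
# the pristine background, which is what UIA-ViT exploits without
# pixel-level annotations.
print(attn.shape)                        # (16, 16)
print(np.allclose(attn.sum(axis=-1), 1.0))  # True: each row is a distribution
```

In the paper's setting this attention map comes from a pre-trained ViT over face patches and is supervised only with progressively refined pseudo-annotations (UPCL), rather than the random projections used here.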
Related papers
- DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models [7.649900082537232]
This study introduces DPA, an unsupervised domain adaptation method for vision-language models.
It introduces the concept of dual prototypes, acting as distinct classifiers, along with the convex combination of their outputs.
It ranks pseudo-labels to facilitate robust self-training, particularly during early training.
Experiments on 13 downstream vision tasks demonstrate that DPA significantly outperforms zero-shot CLIP and the state-of-the-art unsupervised adaptation baselines.
arXiv Detail & Related papers (2024-08-16T17:30:27Z) - Learning Spatiotemporal Inconsistency via Thumbnail Layout for Face Deepfake Detection [41.35861722481721]
Deepfake threats to society and cybersecurity have provoked significant public apprehension.
This paper introduces an elegantly simple yet effective strategy named Thumbnail Layout (TALL)
TALL transforms a video clip into a pre-defined layout to realize the preservation of spatial and temporal dependencies.
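The core TALL operation, tiling consecutive frames into one 2D "thumbnail" image so that a 2D model can observe spatial and temporal cues jointly, can be sketched as follows. The grid size and toy frame shapes here are assumptions for illustration, not the paper's exact settings.

```python
import numpy as np

def thumbnail_layout(frames: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Tile a (T, H, W, C) clip with T == rows * cols into a (rows*H, cols*W, C) image."""
    t, h, w, c = frames.shape
    assert t == rows * cols, "clip length must fill the grid"
    grid = frames.reshape(rows, cols, h, w, c)  # assign each frame a grid cell
    grid = grid.transpose(0, 2, 1, 3, 4)        # (rows, H, cols, W, C)
    return grid.reshape(rows * h, cols * w, c)  # stitch cells into one image

# Toy 4-frame clip of 2x3 single-channel frames, laid out on a 2x2 grid.
clip = np.arange(4 * 2 * 3 * 1).reshape(4, 2, 3, 1)
thumb = thumbnail_layout(clip, rows=2, cols=2)
print(thumb.shape)  # (4, 6, 1)
```

Because temporally adjacent frames become spatially adjacent tiles, ordinary 2D convolutions or attention over the thumbnail can pick up spatiotemporal inconsistencies without a 3D architecture.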
arXiv Detail & Related papers (2024-03-15T12:48:44Z) - MS-Former: Memory-Supported Transformer for Weakly Supervised Change
Detection with Patch-Level Annotations [50.79913333804232]
We propose a memory-supported transformer (MS-Former) for weakly supervised change detection.
MS-Former consists of a bi-directional attention block (BAB) and a patch-level supervision scheme (PSS)
Experimental results on three benchmark datasets demonstrate the effectiveness of our proposed method in the change detection task.
arXiv Detail & Related papers (2023-11-16T09:57:29Z) - Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer [54.32283739486781]
We present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm.
FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation.
arXiv Detail & Related papers (2023-09-20T06:51:11Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z) - A Cross-Scale Hierarchical Transformer with Correspondence-Augmented
Attention for inferring Bird's-Eye-View Semantic Segmentation [13.013635162859108]
Inferring BEV semantic segmentation from multi-camera-view images is a popular scheme in the community, as it relies on cheap devices and supports real-time processing.
We propose a novel cross-scale hierarchical Transformer with correspondence-augmented attention for semantic segmentation inferring.
Our method has state-of-the-art performance in inferring BEV semantic segmentation conditioned on multi-camera-view images.
arXiv Detail & Related papers (2023-04-07T13:52:47Z) - Uncertain Label Correction via Auxiliary Action Unit Graphs for Facial
Expression Recognition [46.99756911719854]
We achieve uncertain label correction of facial expressions using auxiliary action unit (AU) graphs, called ULC-AG.
Experiments show that our ULC-AG achieves 89.31% and 61.57% accuracy on RAF-DB and AffectNet datasets, respectively.
arXiv Detail & Related papers (2022-04-23T11:09:43Z) - Imposing Consistency for Optical Flow Estimation [73.53204596544472]
Imposing consistency through proxy tasks has been shown to enhance data-driven learning.
This paper introduces novel and effective consistency strategies for optical flow estimation.
arXiv Detail & Related papers (2022-04-14T22:58:30Z) - Exploring Feature Representation Learning for Semi-supervised Medical
Image Segmentation [30.608293915653558]
We present a two-stage framework for semi-supervised medical image segmentation.
Key insight is to explore the feature representation learning with labeled and unlabeled (i.e., pseudo labeled) images.
A stage-adaptive contrastive learning method is proposed, containing a boundary-aware contrastive loss.
We present an aleatoric uncertainty-aware method, namely AUA, to generate higher-quality pseudo labels.
arXiv Detail & Related papers (2021-11-22T05:06:12Z) - Self-supervised Equivariant Attention Mechanism for Weakly Supervised
Semantic Segmentation [93.83369981759996]
We propose a self-supervised equivariant attention mechanism (SEAM) to discover additional supervision and narrow the gap.
Our method is based on the observation that equivariance is an implicit constraint in fully supervised semantic segmentation.
We propose consistency regularization on predicted CAMs from various transformed images to provide self-supervision for network learning.
arXiv Detail & Related papers (2020-04-09T14:57:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.