Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection
- URL: http://arxiv.org/abs/2312.16649v1
- Date: Wed, 27 Dec 2023 17:36:32 GMT
- Title: Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection
- Authors: Huan Liu, Zichang Tan, Chuangchuang Tan, Yunchao Wei, Yao Zhao,
Jingdong Wang
- Abstract summary: We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods.
We present a novel forgery-aware adaptive transformer approach, namely FatFormer.
Our approach, tuned on 4-class ProGAN data, attains an average accuracy of 98% on unseen GANs and, surprisingly, generalizes to unseen diffusion models with 95% accuracy.
- Score: 106.39544368711427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the problem of generalizable synthetic image
detection, aiming to detect forgery images from diverse generative methods,
e.g., GANs and diffusion models. Cutting-edge solutions start to explore the
benefits of pre-trained models, and mainly follow the fixed paradigm of solely
training an attached classifier, e.g., combining frozen CLIP-ViT with a
learnable linear layer in UniFD. However, our analysis shows that such a fixed
paradigm is prone to yield detectors with insufficient learning regarding
forgery representations. We attribute the key challenge to the lack of forgery
adaptation, and present a novel forgery-aware adaptive transformer approach,
namely FatFormer. Based on the pre-trained vision-language spaces of CLIP,
FatFormer introduces two core designs for the adaptation to build generalized
forgery representations. First, motivated by the fact that both image and
frequency analysis are essential for synthetic image detection, we develop a
forgery-aware adapter to adapt image features to discern and integrate local
forgery traces within image and frequency domains. Second, we find that
considering the contrastive objectives between adapted image features and text
prompt embeddings, a previously overlooked aspect, results in a nontrivial
generalization improvement. Accordingly, we introduce language-guided alignment
to supervise the forgery adaptation with image and text prompts in FatFormer.
Experiments show that, by coupling these two designs, our approach tuned on
4-class ProGAN data attains a remarkable detection performance, achieving an
average of 98% accuracy on unseen GANs, and surprisingly generalizes to unseen
diffusion models with 95% accuracy.
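To make the abstract's two designs concrete, here is a minimal PyTorch sketch. It is not the authors' implementation: the adapter layout, the FFT-magnitude frequency branch, the pooling, and the real/fake prompt embeddings are all illustrative assumptions standing in for FatFormer's actual modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ForgeryAwareAdapter(nn.Module):
    """Hypothetical adapter that fuses image-domain and frequency-domain cues
    extracted from frozen backbone tokens (a stand-in for FatFormer's design)."""
    def __init__(self, dim: int = 768):
        super().__init__()
        self.img_proj = nn.Linear(dim, dim)
        self.freq_proj = nn.Linear(dim, dim)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq, dim) patch features from a frozen CLIP-ViT.
        img_feat = self.img_proj(tokens)
        # Frequency view: FFT over the token sequence, keeping magnitudes.
        freq_feat = self.freq_proj(torch.fft.fft(tokens, dim=1).abs())
        return self.fuse(torch.cat([img_feat, freq_feat], dim=-1))

def language_guided_alignment(image_emb, text_emb, temperature=0.07):
    """Contrast adapted image features against text prompt embeddings
    (e.g. "a real photo" vs. "a synthetic image") instead of a linear head."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    return image_emb @ text_emb.t() / temperature

# Toy usage with random tensors standing in for CLIP outputs.
tokens = torch.randn(4, 197, 768)        # frozen CLIP-ViT patch tokens
pooled = ForgeryAwareAdapter()(tokens).mean(dim=1)
text = torch.randn(2, 768)               # "real" / "fake" prompt embeddings
logits = language_guided_alignment(pooled, text)
loss = F.cross_entropy(logits, torch.tensor([0, 1, 0, 1]))  # 0 real, 1 fake
```

Under this reading, the text prompts act as class anchors supervising the forgery adaptation, which is the previously overlooked contrastive objective the abstract highlights.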
Related papers
- Robust Scene Change Detection Using Visual Foundation Models and Cross-Attention Mechanisms [27.882122236282054]
We present a novel method for scene change detection that leverages the robust feature extraction capabilities of a visual foundational model, DINOv2.
We evaluate our approach on two benchmark datasets, VL-CMU-CD and PSCD, along with their viewpoint-varied versions.
Our experiments demonstrate significant improvements in F1-score, particularly in scenarios involving geometric changes between image pairs.
arXiv Detail & Related papers (2024-09-25T11:55:27Z)
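As a hedged illustration of the scene change detection entry above, the sketch below implements one common reading of cross-attention for this task: patch features of one image query those of the other, and poorly explained patches score as changes. Random tensors stand in for frozen DINOv2 features; the change head is invented.

```python
import torch
import torch.nn as nn

class CrossAttentionChange(nn.Module):
    """Illustrative cross-attention for change detection: features of image A
    attend to image B, so regions B cannot "explain" respond strongly."""
    def __init__(self, dim: int = 384, heads: int = 6):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 1)  # per-patch change score

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        # feat_a, feat_b: (batch, patches, dim), e.g. frozen DINOv2 tokens.
        attended, _ = self.attn(query=feat_a, key=feat_b, value=feat_b)
        # A large residual between a patch and its cross-attended
        # reconstruction suggests a scene change at that location.
        return self.head(feat_a - attended).squeeze(-1)

# Toy usage: (batch, patches, dim) tensors for the two viewpoints.
a, b = torch.randn(2, 256, 384), torch.randn(2, 256, 384)
change_logits = CrossAttentionChange()(a, b)  # (2, 256)
```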
- Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture [58.60915132222421]
We introduce an approach that is both general and parameter-efficient for face forgery detection.
We design a forgery-style mixture formulation that augments the diversity of forgery source domains.
We show that the designed model achieves state-of-the-art generalizability with significantly reduced trainable parameters.
arXiv Detail & Related papers (2024-08-23T01:53:36Z)
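The entry above does not spell out the mixture formulation; the sketch below assumes a MixStyle-like scheme that interpolates per-channel feature statistics between samples, simulating unseen forgery source domains.

```python
import torch

def forgery_style_mix(feats: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Illustrative MixStyle-like augmentation: swap interpolated per-channel
    mean/std between samples to mimic features from unseen forgery sources.

    feats: (batch, channels, h, w) intermediate feature maps.
    """
    mu = feats.mean(dim=(2, 3), keepdim=True)
    sigma = feats.std(dim=(2, 3), keepdim=True) + 1e-6
    normalized = (feats - mu) / sigma

    # Pair each sample with a shuffled partner from another "source domain".
    perm = torch.randperm(feats.size(0))
    lam = torch.distributions.Beta(alpha, alpha).sample(
        (feats.size(0), 1, 1, 1))
    mixed_mu = lam * mu + (1 - lam) * mu[perm]
    mixed_sigma = lam * sigma + (1 - lam) * sigma[perm]
    return normalized * mixed_sigma + mixed_mu

# Toy usage: augment a training batch of feature maps.
augmented = forgery_style_mix(torch.randn(8, 64, 14, 14))
```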
- Bi-LORA: A Vision-Language Approach for Synthetic Image Detection [14.448350657613364]
Deep image synthesis techniques, such as generative adversarial networks (GANs) and diffusion models (DMs), have ushered in an era of highly realistic generated images.
This paper takes inspiration from the potent convergence between vision and language, coupled with the zero-shot nature of vision-language models (VLMs).
We introduce an innovative method called Bi-LORA that leverages VLMs, combined with low-rank adaptation (LORA) tuning techniques, to enhance the precision of synthetic image detection for unseen model-generated images.
arXiv Detail & Related papers (2024-04-02T13:54:22Z)
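For reference, the low-rank adaptation tuning mentioned above generally takes the following form; this is a generic LoRA layer with illustrative rank and scaling, not Bi-LORA's actual configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pre-trained linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # keep pre-trained weights frozen
            p.requires_grad = False
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)    # start as an identity update
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))

# Toy usage: wrap one projection of a frozen vision-language model.
adapted = LoRALinear(nn.Linear(768, 768), rank=4)
out = adapted(torch.randn(2, 768))
```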
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
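Assuming the AAT module above behaves like a spatial-transformer-style component, a per-image 2x3 affine matrix is predicted and used to warp the input; the tiny localization network below is invented for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveAffine(nn.Module):
    """Illustrative AAT-style module: predict a per-image affine matrix and
    warp the input image for downstream local network training."""
    def __init__(self):
        super().__init__()
        self.loc = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=7, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(8, 6),
        )
        # Initialise the predicted transform to the identity [1,0,0,0,1,0].
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        theta = self.loc(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

# Toy usage on a batch of image crops.
warped = AdaptiveAffine()(torch.randn(2, 3, 64, 64))
```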
- A Dual Attentive Generative Adversarial Network for Remote Sensing Image Change Detection [6.906936669510404]
We propose a dual attentive generative adversarial network (DAGAN) for very high-resolution remote sensing image change detection.
On the LEVIR dataset, the DAGAN framework outperforms advanced methods, reaching an 85.01% mean IoU and a 91.48% mean F1 score.
arXiv Detail & Related papers (2023-10-03T08:26:27Z)
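The summary does not define "dual attentive"; one plausible reading, sketched below purely as an assumption, is a pair of channel- and spatial-attention gates (in the spirit of CBAM/DANet) applied to change features inside the generator, with the GAN training loop omitted.

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Illustrative dual attention: a channel gate then a spatial gate,
    reweighting bi-temporal change features."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1), nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_gate(x)      # emphasise informative channels
        return x * self.spatial_gate(x)   # emphasise changed locations

# Toy usage on difference features of an image pair.
y = DualAttention()(torch.randn(2, 64, 32, 32))
```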
- Adaptive Input-image Normalization for Solving the Mode Collapse Problem in GAN-based X-ray Images [0.08192907805418582]
This work empirically demonstrates the benefits of integrating adaptive input-image normalization with the Deep Convolutional GAN (DCGAN) and the Auxiliary Classifier GAN (ACGAN) to alleviate mode collapse.
Results demonstrate that the DCGAN and the ACGAN with adaptive input-image normalization outperform the DCGAN and ACGAN with un-normalized X-ray images.
arXiv Detail & Related papers (2023-09-21T16:43:29Z)
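The exact normalization is not given in the summary above; one plausible reading, shown below as an assumption, rescales each X-ray by its own intensity range so the GAN always sees inputs on a consistent scale.

```python
import torch

def adaptive_image_norm(images: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Illustrative per-image min-max normalization to [0, 1], compensating
    for exposure differences between individual X-ray scans.

    images: (batch, 1, h, w) grayscale tensors.
    """
    flat = images.flatten(1)
    lo = flat.min(dim=1).values.view(-1, 1, 1, 1)
    hi = flat.max(dim=1).values.view(-1, 1, 1, 1)
    return (images - lo) / (hi - lo + eps)

# Toy usage before feeding a DCGAN/ACGAN pipeline.
x_norm = adaptive_image_norm(torch.rand(4, 1, 128, 128) * 255.0)
```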
- Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer [54.32283739486781]
We present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm.
FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation.
arXiv Detail & Related papers (2023-09-20T06:51:11Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as annotations.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
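A minimal sketch of the sentence-level prompt supervision described above: CLIP-style similarity between image embeddings and fine-grained text prompts, trained against fine-grained labels. The prompts and the random stand-in encoders are invented, not VLFFD's.

```python
import torch
import torch.nn.functional as F

# Hypothetical fine-grained, sentence-level prompts (not from the paper).
PROMPTS = [
    "a pristine, unedited photograph of a face",
    "a face whose mouth region has been manipulated",
    "a face generated entirely by a neural network",
]

def prompt_supervised_logits(image_emb: torch.Tensor,
                             prompt_emb: torch.Tensor,
                             temperature: float = 0.07) -> torch.Tensor:
    """CLIP-style similarity between image and sentence-prompt embeddings."""
    image_emb = F.normalize(image_emb, dim=-1)
    prompt_emb = F.normalize(prompt_emb, dim=-1)
    return image_emb @ prompt_emb.t() / temperature

# Toy usage: random tensors stand in for a VLM's image/text encoders.
img = torch.randn(4, 512)
txt = torch.randn(len(PROMPTS), 512)
loss = F.cross_entropy(prompt_supervised_logits(img, txt),
                       torch.tensor([0, 1, 2, 0]))
```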
- Discrepancy-Guided Reconstruction Learning for Image Forgery Detection [10.221066530624373]
We first propose a Discrepancy-Guided Encoder (DisGE) to extract forgery-sensitive visual patterns.
We then introduce a Double-Head Reconstruction (DouHR) module to enhance genuine compact visual patterns in different granular spaces.
Under DouHR, we further introduce a Discrepancy-Aggregation Detector (DisAD) to aggregate these genuine compact visual patterns.
arXiv Detail & Related papers (2023-04-26T07:40:43Z)
- Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network [92.01145655155374]
We present an unsupervised image enhancement generative adversarial network (UEGAN).
It learns the corresponding image-to-image mapping from a set of images with desired characteristics in an unsupervised manner.
Results show that the proposed model effectively improves the aesthetic quality of images.
arXiv Detail & Related papers (2020-12-30T03:22:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.