Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection
- URL: http://arxiv.org/abs/2312.16649v1
- Date: Wed, 27 Dec 2023 17:36:32 GMT
- Title: Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection
- Authors: Huan Liu, Zichang Tan, Chuangchuang Tan, Yunchao Wei, Yao Zhao, Jingdong Wang
- Abstract summary: We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods.
We present a novel forgery-aware adaptive transformer approach, namely FatFormer.
Our approach tuned on 4-class ProGAN data attains an average of 98% accuracy to unseen GANs, and surprisingly generalizes to unseen diffusion models with 95% accuracy.
- Score: 106.39544368711427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the problem of generalizable synthetic image
detection, aiming to detect forgery images from diverse generative methods,
e.g., GANs and diffusion models. Cutting-edge solutions have started to explore the
benefits of pre-trained models, and mainly follow the fixed paradigm of solely
training an attached classifier, e.g., combining frozen CLIP-ViT with a
learnable linear layer in UniFD. However, our analysis shows that such a fixed
paradigm is prone to yield detectors with insufficient learning regarding
forgery representations. We attribute the key challenge to the lack of forgery
adaptation, and present a novel forgery-aware adaptive transformer approach,
namely FatFormer. Based on the pre-trained vision-language spaces of CLIP,
FatFormer introduces two core designs for the adaptation to build generalized
forgery representations. First, motivated by the fact that both image and
frequency analysis are essential for synthetic image detection, we develop a
forgery-aware adapter to adapt image features to discern and integrate local
forgery traces within image and frequency domains. Second, we find that
considering the contrastive objectives between adapted image features and text
prompt embeddings, a previously overlooked aspect, results in a nontrivial
generalization improvement. Accordingly, we introduce language-guided alignment
to supervise the forgery adaptation with image and text prompts in FatFormer.
Experiments show that, by coupling these two designs, our approach tuned on
4-class ProGAN data attains a remarkable detection performance, achieving an
average of 98% accuracy to unseen GANs, and surprisingly generalizes to unseen
diffusion models with 95% accuracy.
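The "fixed paradigm" the abstract critiques (a frozen CLIP-ViT plus one learnable linear layer, as in UniFD) is, in effect, a linear probe: features are computed once by a frozen backbone and only a linear classifier is fit on top. A minimal pure-Python sketch, assuming a hypothetical `frozen_encoder` that stands in for CLIP-ViT and toy 2-D "features" invented for illustration; only the weights `w` and bias `b` are trained.

```python
import math

def frozen_encoder(image):
    """Hypothetical stand-in for a frozen backbone such as CLIP-ViT.

    It has no trainable parameters; here it simply returns the toy input unchanged.
    """
    return list(image)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Logistic regression on fixed features: only w and b are ever updated."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of binary cross-entropy w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Return 1 (synthetic) if the logit is positive, else 0 (real)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Toy data (invented): label 0 = real, 1 = synthetic.
images = [[-1.0, -0.8], [-0.9, -1.2], [-1.1, -0.7], [1.0, 0.9], [0.8, 1.1], [1.2, 0.7]]
labels = [0, 0, 0, 1, 1, 1]
features = [frozen_encoder(im) for im in images]  # computed once, never updated
w, b = train_linear_probe(features, labels)
accuracy = sum(predict(w, b, x) == y for x, y in zip(features, labels)) / len(labels)
print(f"train accuracy: {accuracy:.2f}")
```

Because the encoder never changes, all that is learned is a hyperplane in the backbone's existing feature space; FatFormer's forgery adaptation instead modifies the image features themselves.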
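The abstract states that frequency analysis, alongside image analysis, is essential for synthetic image detection, but this summary does not say which transform FatFormer uses. The sketch below swaps in a plain 2-D discrete Fourier transform and a hand-crafted statistic (share of spectral energy at high frequencies) to show why the frequency domain helps: a periodic pixel-level pattern, of the kind often attributed to upsampling layers in generators, is nearly invisible as a small intensity change but concentrates its energy at a single high frequency. The names `dft2`, `high_freq_energy_ratio`, and `cutoff` are hypothetical, and FatFormer's adapter learns its features rather than using a fixed ratio like this.

```python
import cmath

def dft2(img):
    """Naive 2-D discrete Fourier transform of a square patch (O(N^4); illustration only)."""
    n = len(img)
    return [
        [
            sum(
                img[x][y] * cmath.exp(-2j * cmath.pi * (u * x + v * y) / n)
                for x in range(n)
                for y in range(n)
            )
            for v in range(n)
        ]
        for u in range(n)
    ]

def high_freq_energy_ratio(img, cutoff):
    """Fraction of spectral energy whose distance from DC (per axis, with
    wrap-around) is at least `cutoff` on either axis."""
    n = len(img)
    spec = dft2(img)
    total = high = 0.0
    for u in range(n):
        for v in range(n):
            energy = abs(spec[u][v]) ** 2
            total += energy
            if max(min(u, n - u), min(v, n - v)) >= cutoff:
                high += energy
    return high / total if total else 0.0

n = 8
smooth = [[1.0] * n for _ in range(n)]  # flat patch: all energy at DC
checker = [[(-1.0) ** (x + y) for y in range(n)] for x in range(n)]  # periodic pixel-level pattern
print(high_freq_energy_ratio(smooth, cutoff=3))
print(high_freq_energy_ratio(checker, cutoff=3))
```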
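The language-guided alignment described in the abstract supervises adapted image features with contrastive objectives against text-prompt embeddings. A simplified CLIP-style sketch of such an objective, assuming one prompt embedding per class (for example, hypothetical prompts like "a real photo" and "a synthetic photo") and an assumed temperature of 0.07: cosine similarities are scaled by the temperature, turned into class probabilities with a softmax, and scored with cross-entropy. The embeddings below are invented vectors, not outputs of a real text encoder, and FatFormer's exact objective may differ.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def alignment_loss(image_feat, prompt_embs, label, temperature=0.07):
    """Softmax over temperature-scaled cosine similarities between one image
    feature and one text-prompt embedding per class, then cross-entropy."""
    img = l2_normalize(image_feat)
    logits = [
        sum(a * b for a, b in zip(img, l2_normalize(t))) / temperature
        for t in prompt_embs
    ]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -math.log(probs[label]), probs

# Hypothetical embeddings: index 0 = "real" prompt, index 1 = "synthetic" prompt.
prompt_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_feat = [0.2, 0.9, 0.1]  # hypothetical adapted image feature
loss_fake, probs = alignment_loss(image_feat, prompt_embs, label=1)
loss_real, _ = alignment_loss(image_feat, prompt_embs, label=0)
print(probs, loss_fake, loss_real)
```

The feature points toward the "synthetic" prompt, so the loss is small when the label is synthetic and large when it is real; minimizing it pulls adapted image features toward the matching prompt embedding.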
Related papers
- Bi-LORA: A Vision-Language Approach for Synthetic Image Detection [14.448350657613364]
Deep image synthesis techniques, such as generative adversarial networks (GANs) and diffusion models (DMs) have ushered in an era of generating highly realistic images.
This paper takes inspiration from the potent convergence capabilities between vision and language, coupled with the zero-shot nature of vision-language models (VLMs).
We introduce an innovative method called Bi-LORA that leverages VLMs, combined with low-rank adaptation (LORA) tuning techniques, to enhance the precision of synthetic image detection for unseen model-generated images.
(2024-04-02T13:54:22Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
(2023-10-22T02:27:02Z)
- A Dual Attentive Generative Adversarial Network for Remote Sensing Image Change Detection [6.906936669510404]
We propose a dual attentive generative adversarial network for achieving very high-resolution remote sensing image change detection tasks.
On the LEVIR dataset, the DAGAN framework outperforms advanced methods, reaching 85.01% mean IoU and a 91.48% mean F1 score.
(2023-10-03T08:26:27Z)
- Adaptive Input-image Normalization for Solving the Mode Collapse Problem in GAN-based X-ray Images [0.08192907805418582]
This work contributes an empirical demonstration of the benefits of integrating adaptive input-image normalization with the Deep Convolutional GAN (DCGAN) and the Auxiliary Classifier GAN (ACGAN) to alleviate the mode collapse problem.
Results demonstrate that the DCGAN and the ACGAN with adaptive input-image normalization outperform the DCGAN and ACGAN with un-normalized X-ray images.
(2023-09-21T16:43:29Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
(2023-07-31T10:22:33Z)
- Generalizable Synthetic Image Detection via Language-guided Contrastive Learning [22.4158195581231]
The malevolent use of synthetic images, such as the dissemination of fake news or the creation of fake profiles, raises significant concerns regarding the authenticity of images.
We propose a simple yet very effective synthetic image detection method via a language-guided contrastive learning and a new formulation of the detection problem.
It is shown that our proposed LanguAge-guided SynThEsis Detection (LASTED) model achieves much improved generalizability to unseen image generation models.
(2023-05-23T08:13:27Z)
- Discrepancy-Guided Reconstruction Learning for Image Forgery Detection [10.221066530624373]
We first propose a Discrepancy-Guided Encoder (DisGE) to extract forgery-sensitive visual patterns.
We then introduce a Double-Head Reconstruction (DouHR) module to enhance genuine compact visual patterns in different granular spaces.
Under DouHR, we further introduce a Discrepancy-Aggregation Detector (DisAD) to aggregate these genuine compact visual patterns.
(2023-04-26T07:40:43Z)
- FewGAN: Generating from the Joint Distribution of a Few Images [95.6635227371479]
We introduce FewGAN, a generative model for generating novel, high-quality and diverse images.
FewGAN is a hierarchical patch-GAN that applies quantization at the first coarse scale, followed by a pyramid of residual fully convolutional GANs at finer scales.
In an extensive set of experiments, it is shown that FewGAN outperforms baselines both quantitatively and qualitatively.
(2022-07-18T07:11:28Z)
- A Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection [93.38607559281601]
We devise a hierarchical generative model that captures the multi-scale patch distribution of each training image.
The anomaly score is obtained by aggregating the patch-based votes of the correct transformation across scales and image regions.
(2021-04-29T17:49:48Z)
- Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network [92.01145655155374]
We present an unsupervised image enhancement generative network (UEGAN).
It learns the corresponding image-to-image mapping from a set of images with desired characteristics in an unsupervised manner.
Results show that the proposed model effectively improves the aesthetic quality of images.
(2020-12-30T03:22:46Z)
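The Bi-LORA entry above relies on low-rank adaptation (LoRA). In the standard LoRA formulation, a frozen pretrained weight W is augmented with a trainable low-rank update, giving an effective weight W + (alpha / r) * B A, where B is initialized to zero so training starts exactly from W. A minimal sketch of that reparameterization; the summary does not say which layers Bi-LORA adapts or what rank it uses, so the matrices, rank, `alpha`, and the 768-wide layer in the parameter count are illustrative assumptions, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of nested lists: (m x k) @ (k x n) -> (m x n)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def lora_effective_weight(w, a, b, alpha):
    """W + (alpha / r) * B @ A, with frozen W (d_out x d_in) and trainable
    B (d_out x r) and A (r x d_in)."""
    r = len(a)
    delta = matmul(b, a)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]

def trainable_params(d_out, d_in, r):
    """Trainable parameter counts: full fine-tuning vs. a rank-r LoRA update."""
    return d_out * d_in, r * (d_out + d_in)

w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen pretrained weight (2 x 3)
a = [[1.0, 2.0, 3.0]]                     # rank r = 1
b_init = [[0.0], [0.0]]                   # B starts at zero, so the update starts at zero
b_trained = [[1.0], [0.0]]                # hypothetical value after some training
print(lora_effective_weight(w, a, b_init, alpha=1.0))
print(lora_effective_weight(w, a, b_trained, alpha=1.0))
print(trainable_params(768, 768, 8))  # (589824, 12288)
```

For a 768 x 768 layer, a rank-8 update trains 12,288 parameters instead of 589,824, which is why LoRA-style tuning is attractive for adapting large vision-language models.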