Related papers: Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection

URL: http://arxiv.org/abs/2312.16649v1
Date: Wed, 27 Dec 2023 17:36:32 GMT
Title: Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection
Authors: Huan Liu, Zichang Tan, Chuangchuang Tan, Yunchao Wei, Yao Zhao, Jingdong Wang
Abstract summary: We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods. We present a novel forgery-aware adaptive transformer approach, namely FatFormer. Our approach tuned on 4-class ProGAN data attains an average of 98% accuracy on unseen GANs, and surprisingly generalizes to unseen diffusion models with 95% accuracy.
Score: 106.39544368711427
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods, e.g., GANs and diffusion models. Cutting-edge solutions start to explore the benefits of pre-trained models, and mainly follow the fixed paradigm of solely training an attached classifier, e.g., combining frozen CLIP-ViT with a learnable linear layer in UniFD. However, our analysis shows that such a fixed paradigm is prone to yield detectors with insufficient learning regarding forgery representations. We attribute the key challenge to the lack of forgery adaptation, and present a novel forgery-aware adaptive transformer approach, namely FatFormer. Based on the pre-trained vision-language spaces of CLIP, FatFormer introduces two core designs for the adaptation to build generalized forgery representations. First, motivated by the fact that both image and frequency analysis are essential for synthetic image detection, we develop a forgery-aware adapter to adapt image features to discern and integrate local forgery traces within image and frequency domains. Second, we find that considering the contrastive objectives between adapted image features and text prompt embeddings, a previously overlooked aspect, results in a nontrivial generalization improvement. Accordingly, we introduce language-guided alignment to supervise the forgery adaptation with image and text prompts in FatFormer. Experiments show that, by coupling these two designs, our approach tuned on 4-class ProGAN data attains a remarkable detection performance, achieving an average of 98% accuracy on unseen GANs, and surprisingly generalizes to unseen diffusion models with 95% accuracy.
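The contrastive objective between image features and text prompt embeddings mentioned in the abstract can be illustrated with a small sketch. This is not FatFormer's implementation: it only shows a generic CLIP-style alignment loss, where an image feature is compared by cosine similarity against one embedding per class prompt and the similarities are fed through a temperature-scaled softmax cross-entropy. The function names, the toy 3-dimensional vectors, the real/fake prompt ordering, and the temperature of 0.07 are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def alignment_loss(image_feat, text_embeds, label, temperature=0.07):
    """Cross-entropy over temperature-scaled image-text cosine similarities.

    text_embeds holds one embedding per class prompt (hypothetical layout:
    index 0 = "real" prompt, index 1 = "fake" prompt).
    Returns (loss, class probabilities).
    """
    logits = [cosine(image_feat, t) / temperature for t in text_embeds]
    # Subtract the max logit for a numerically stable softmax.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -math.log(probs[label]), probs

# Toy example: the image feature lies close to the "fake" prompt embedding.
image_feat = [0.9, 0.1, 0.0]
text_embeds = [[0.0, 1.0, 0.0],   # hypothetical "real" prompt embedding
               [1.0, 0.0, 0.0]]   # hypothetical "fake" prompt embedding
loss_fake, probs = alignment_loss(image_feat, text_embeds, label=1)
loss_real, _ = alignment_loss(image_feat, text_embeds, label=0)
```

In this toy setup the loss is small when the label matches the nearest prompt embedding and large otherwise, which is the signal such an objective would use to pull adapted image features toward the matching text prompt.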

Related papers

Towards Generalizable AI-Generated Image Detection via Image-Adaptive Prompt Learning [30.415427474641813]
We propose a novel framework named Image-Adaptive Prompt Learning (IAPL), which enhances flexibility in processing diverse testing images. It consists of two adaptive modules, i.e., the Conditional Information Learner and the Confidence-Driven Adaptive Prediction. Experiments show that IAPL achieves state-of-the-art performance, with 95.61% and 96.7% mean accuracy on two widely used UniversalFakeDetect and GenImage datasets.
arXiv Detail & Related papers (2025-08-03T05:41:24Z)
Cross-Subject Mind Decoding from Inaccurate Representations [42.19569985029642]
We propose a Bi Autoencoder Intertwining framework for accurate decoded representation prediction. Our method outperforms state-of-the-art approaches on benchmark datasets in both qualitative and quantitative evaluations.
arXiv Detail & Related papers (2025-07-25T08:45:02Z)
FakeReasoning: Towards Generalizable Forgery Detection and Reasoning [24.8865218866598]
We propose modeling AI-generated image detection and explanation as a Forgery Detection and Reasoning task (FDR-Task). We introduce the Multi-Modal Forgery Reasoning dataset (MMFR-Dataset), a large-scale dataset containing 100K images across 10 generative models. We also propose FakeReasoning, a forgery detection and reasoning framework with two key components.
arXiv Detail & Related papers (2025-03-27T06:54:06Z)
DiffDoctor: Diagnosing Image Diffusion Models Before Treating [57.82359018425674]
We propose DiffDoctor, a two-stage pipeline to assist image diffusion models in generating fewer artifacts. We collect a dataset of over 1M flawed synthesized images and set up an efficient human-in-the-loop annotation process. The learned artifact detector is then involved in the second stage to optimize the diffusion model by providing pixel-level feedback.
arXiv Detail & Related papers (2025-01-21T18:56:41Z)
Generalizable Origin Identification for Text-Guided Image-to-Image Diffusion Models [39.234894330025114]
Text-guided image-to-image diffusion models excel in translating images based on textual prompts. This motivates us to introduce the task of origin IDentification for text-guided Image-to-image Diffusion models (ID$^2$). A straightforward solution to ID$^2$ involves training a specialized deep embedding model to extract and compare features from both query and reference images.
arXiv Detail & Related papers (2025-01-04T20:34:53Z)
Robust Scene Change Detection Using Visual Foundation Models and Cross-Attention Mechanisms [27.882122236282054]
We present a novel method for scene change detection that leverages the robust feature extraction capabilities of a visual foundational model, DINOv2. We evaluate our approach on two benchmark datasets, VL-CMU-CD and PSCD, along with their viewpoint-varied versions. Our experiments demonstrate significant improvements in F1-score, particularly in scenarios involving geometric changes between image pairs.
arXiv Detail & Related papers (2024-09-25T11:55:27Z)
Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture [58.60915132222421]
We introduce an approach that is both general and parameter-efficient for face forgery detection. We design a forgery-style mixture formulation that augments the diversity of forgery source domains. We show that the designed model achieves state-of-the-art generalizability with significantly reduced trainable parameters.
arXiv Detail & Related papers (2024-08-23T01:53:36Z)
Bi-LORA: A Vision-Language Approach for Synthetic Image Detection [14.448350657613364]
Deep image synthesis techniques, such as generative adversarial networks (GANs) and diffusion models (DMs), have ushered in an era of generating highly realistic images. This paper takes inspiration from the potent convergence capabilities between vision and language, coupled with the zero-shot nature of vision-language models (VLMs). We introduce an innovative method called Bi-LORA that leverages VLMs, combined with low-rank adaptation (LORA) tuning techniques, to enhance the precision of synthetic image detection for unseen model-generated images.
arXiv Detail & Related papers (2024-04-02T13:54:22Z)
Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions. We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training. Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
A Dual Attentive Generative Adversarial Network for Remote Sensing Image Change Detection [6.906936669510404]
We propose a dual attentive generative adversarial network for achieving very high-resolution remote sensing image change detection tasks. The DAGAN framework has better performance with 85.01% mean IoU and 91.48% mean F1 score than advanced methods on the LEVIR dataset.
arXiv Detail & Related papers (2023-10-03T08:26:27Z)
Adaptive Input-image Normalization for Solving the Mode Collapse Problem in GAN-based X-ray Images [0.08192907805418582]
This work contributes an empirical demonstration of the benefits of integrating adaptive input-image normalization with the Deep Convolutional GAN (DCGAN) and the Auxiliary Classifier GAN (ACGAN) to alleviate mode collapse problems. Results demonstrate that the DCGAN and the ACGAN with adaptive input-image normalization outperform the DCGAN and ACGAN with un-normalized X-ray images.
arXiv Detail & Related papers (2023-09-21T16:43:29Z)
Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer [54.32283739486781]
We present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm. FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation.
arXiv Detail & Related papers (2023-09-20T06:51:11Z)
Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust. Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model. We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
Discrepancy-Guided Reconstruction Learning for Image Forgery Detection [10.221066530624373]
We first propose a Discrepancy-Guided Encoder (DisGE) to extract forgery-sensitive visual patterns. We then introduce a Double-Head Reconstruction (DouHR) module to enhance genuine compact visual patterns in different granular spaces. Under DouHR, we further introduce a Discrepancy-Aggregation Detector (DisAD) to aggregate these genuine compact visual patterns.
arXiv Detail & Related papers (2023-04-26T07:40:43Z)
Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network [92.01145655155374]
We present an unsupervised image enhancement generative network (UEGAN). It learns the corresponding image-to-image mapping from a set of images with desired characteristics in an unsupervised manner. Results show that the proposed model effectively improves the aesthetic quality of images.
arXiv Detail & Related papers (2020-12-30T03:22:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.