Related papers: Fake or JPEG? Revealing Common Biases in Generated Image Detection Datasets

Fake or JPEG? Revealing Common Biases in Generated Image Detection Datasets

URL: http://arxiv.org/abs/2403.17608v2
Date: Thu, 28 Mar 2024 15:24:16 GMT
Title: Fake or JPEG? Revealing Common Biases in Generated Image Detection Datasets
Authors: Patrick Grommelt, Louis Weiss, Franz-Josef Pfreundt, Janis Keuper,
Abstract summary: Many datasets for AI-generated image detection contain biases related to JPEG compression and image size. We demonstrate that detectors indeed learn from these undesired factors. It leads to more than 11 percentage points increase in cross-generator performance for ResNet50 and Swin-T detectors.
Score: 6.554757265434464
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The widespread adoption of generative image models has highlighted the urgent need to detect artificial content, which is a crucial step in combating widespread manipulation and misinformation. Consequently, numerous detectors and associated datasets have emerged. However, many of these datasets inadvertently introduce undesirable biases, thereby impacting the effectiveness and evaluation of detectors. In this paper, we emphasize that many datasets for AI-generated image detection contain biases related to JPEG compression and image size. Using the GenImage dataset, we demonstrate that detectors indeed learn from these undesired factors. Furthermore, we show that removing the named biases substantially increases robustness to JPEG compression and significantly alters the cross-generator performance of evaluated detectors. Specifically, it leads to more than 11 percentage points increase in cross-generator performance for ResNet50 and Swin-T detectors on the GenImage dataset, achieving state-of-the-art results. We provide the dataset and source codes of this paper on the anonymous website: https://www.unbiased-genimage.org

Related papers

RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors [57.81012948133832]
We present RAID (Robust evaluation of AI-generated image Detectors), a dataset of 72k diverse and highly transferable adversarial examples.<n>Our methodology generates adversarial images that transfer with a high success rate to unseen detectors.<n>Our findings indicate that current state-of-the-art AI-generated image detectors can be easily deceived by adversarial examples.
arXiv Detail & Related papers (2025-06-04T14:16:00Z)
Few-Shot Learner Generalizes Across AI-Generated Image Detection [14.069833211684715]
Few-Shot Detector (FSD) is a novel AI-generated image detector which learns a specialized metric space for effectively distinguishing unseen fake images.<n>Experiments show FSD achieves state-of-the-art performance by $+11.6%$ average accuracy on the GenImage dataset with only $10$ additional samples.
arXiv Detail & Related papers (2025-01-15T12:33:11Z)
A Bias-Free Training Paradigm for More General AI-generated Image Detection [15.421102443599773]
A well-designed forensic detector should detect generator specific artifacts rather than reflect data biases. We propose B-Free, a bias-free training paradigm, where fake images are generated from real ones. We show significant improvements in both generalization and robustness over state-of-the-art detectors.
arXiv Detail & Related papers (2024-12-23T15:54:32Z)
Is JPEG AI going to change image forensics? [50.92778618091496]
We investigate the counter-forensic effects of the new JPEG AI standard based on neural image compression. Our results demonstrate a reduction in the performance of leading forensic detectors when analyzing content processed through JPEG AI.
arXiv Detail & Related papers (2024-12-04T12:07:20Z)
Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing pose serious risks for generative models. In this paper, we investigate how detection performance varies across model backbones, types, and datasets. We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z)
Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image detectors [62.63467652611788]
We introduce SEMI-TRUTHS, featuring 27,600 real images, 223,400 masks, and 1,472,700 AI-augmented images. Each augmented image is accompanied by metadata for standardized and targeted evaluation of detector robustness. Our findings suggest that state-of-the-art detectors exhibit varying sensitivities to the types and degrees of perturbations, data distributions, and augmentation methods used.
arXiv Detail & Related papers (2024-11-12T01:17:27Z)
Zero-Shot Detection of AI-Generated Images [54.01282123570917]
We propose a zero-shot entropy-based detector (ZED) to detect AI-generated images. Inspired by recent works on machine-generated text detection, our idea is to measure how surprising the image under analysis is compared to a model of real images. ZED achieves an average improvement of more than 3% over the SoTA in terms of accuracy.
arXiv Detail & Related papers (2024-09-24T08:46:13Z)
Improving Interpretability and Robustness for the Detection of AI-Generated Images [6.116075037154215]
We analyze existing state-of-the-art AIGI detection methods based on frozen CLIP embeddings. We show how to interpret them, shedding light on how images produced by various AI generators differ from real ones.
arXiv Detail & Related papers (2024-06-21T10:33:09Z)
GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning [50.7702397913573]
The rapid advancement of photorealistic generators has reached a critical juncture where the discrepancy between authentic and manipulated images is increasingly indistinguishable. Although there have been a number of publicly available face forgery datasets, the forgery faces are mostly generated using GAN-based synthesis technology. We propose a large-scale, diverse, and fine-grained high-fidelity dataset, namely GenFace, to facilitate the advancement of deepfake detection.
arXiv Detail & Related papers (2024-02-03T03:13:50Z)
GenDet: Towards Good Generalizations for AI-Generated Image Detection [27.899521298845357]
Existing methods can effectively detect images generated by seen generators, but it is challenging to detect those generated by unseen generators. This paper addresses the unseen-generator detection problem by considering this task from the perspective of anomaly detection. Our method encourages smaller output discrepancies between the student and the teacher models for real images while aiming for larger discrepancies for fake images.
arXiv Detail & Related papers (2023-12-12T11:20:45Z)
Exposing Image Splicing Traces in Scientific Publications via Uncertainty-guided Refinement [30.698359275889363]
A surge in scientific publications suspected of image manipulation has led to numerous retractions. Image splicing detection is more challenging due to the lack of reference images and the typically small tampered areas. We propose an Uncertainty-guided Refinement Network (URN) to mitigate the impact of disruptive factors.
arXiv Detail & Related papers (2023-09-28T12:36:12Z)
GenImage: A Million-Scale Benchmark for Detecting AI-Generated Image [28.38575401686718]
We introduce the GenImage dataset, which includes over one million pairs of AI-generated fake images and collected real images. The advantages allow the detectors trained on GenImage to undergo a thorough evaluation and demonstrate strong applicability to diverse images. We conduct a comprehensive analysis of the dataset and propose two tasks for evaluating the detection method in resembling real-world scenarios.
arXiv Detail & Related papers (2023-06-14T15:21:09Z)
Revisiting Consistency Regularization for Semi-supervised Change Detection in Remote Sensing Images [60.89777029184023]
We propose a semi-supervised CD model in which we formulate an unsupervised CD loss in addition to the supervised Cross-Entropy (CE) loss. Experiments conducted on two publicly available CD datasets show that the proposed semi-supervised CD method can reach closer to the performance of supervised CD.
arXiv Detail & Related papers (2022-04-18T17:59:01Z)
Beyond the Spectrum: Detecting Deepfakes via Re-Synthesis [69.09526348527203]
Deep generative models have led to highly realistic media, known as deepfakes, that are commonly indistinguishable from real to human eyes. We propose a novel fake detection that is designed to re-synthesize testing images and extract visual cues for detection. We demonstrate the improved effectiveness, cross-GAN generalization, and robustness against perturbations of our approach in a variety of detection scenarios.
arXiv Detail & Related papers (2021-05-29T21:22:24Z)
Robust Data Hiding Using Inverse Gradient Attention [82.73143630466629]
In the data hiding task, each pixel of cover images should be treated differently since they have divergent tolerabilities. We propose a novel deep data hiding scheme with Inverse Gradient Attention (IGA), combing the ideas of adversarial learning and attention mechanism. Empirically, extensive experiments show that the proposed model outperforms the state-of-the-art methods on two prevalent datasets.
arXiv Detail & Related papers (2020-11-21T19:08:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.