CoDA: Color Distribution Probing for Efficient and Generalizable AI-Generated Image Detection
Abstract Overview
This paper studies AI-generated image detection under the combined constraints of generalization and efficiency. It introduces FakeForm, a benchmark with about 370,000 images across 62 domains, designed to evaluate both cross-model and cross-domain detection rather than only photorealistic cross-model transfer. The authors argue that synthetic images often exhibit more non-uniform color distributions than real photographs and formalize this with a Noise-Quantization Probe that measures stability under injected noise and color quantization. Based on this idea, they propose CoDA, a compact dual-branch detector that fuses probe-derived color cues with image features, and they provide a theoretical analysis linking probe responses to color-distribution irregularity.
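The probing idea can be illustrated with a small sketch. This is not the authors' implementation. It is one plausible reading of "stability under injected noise and color quantization", written in plain Python for a single flattened color channel. The function names, the uniform quantizer, the Gaussian noise model, and the defaults (`sigma`, `levels`, `seed`) are all assumptions made for illustration.

```python
import random

def quantize(value, levels=8):
    """Map an 8-bit intensity to the centre of one of `levels` uniform bins.
    (Uniform binning is an assumption; the paper may quantize differently.)"""
    step = 256 / levels
    index = min(int(value // step), levels - 1)
    return index * step + step / 2

def noise_quantization_residual(pixels, sigma=4.0, levels=8, seed=0):
    """Hypothetical probe: perturb each intensity with Gaussian noise,
    quantize both the clean and the noisy value, and return the difference.
    A zero entry means the pixel stayed in the same quantization bin."""
    rng = random.Random(seed)
    residual = []
    for v in pixels:
        # Clamp the noisy value back into the valid 8-bit range.
        noisy = min(255.0, max(0.0, v + rng.gauss(0.0, sigma)))
        residual.append(quantize(noisy, levels) - quantize(v, levels))
    return residual

def flip_rate(residual):
    """Fraction of pixels whose quantization bin changed under noise."""
    return sum(1 for r in residual if r != 0) / len(residual)

# Intensities at a bin centre are stable; intensities on a bin edge are not.
centre_pixels = [16] * 1000   # centre of the first bin when levels=8
edge_pixels = [32] * 1000     # boundary between the first and second bins
stable = flip_rate(noise_quantization_residual(centre_pixels, sigma=2.0))
unstable = flip_rate(noise_quantization_residual(edge_pixels, sigma=2.0))
```

In this toy setting the response depends on where intensities sit relative to the bin boundaries, so images with differently shaped color distributions yield different residual patterns. A learned detector would consume the residual map itself rather than a single summary rate.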
Novelty
The work is distinctive in combining two contributions: a broad new benchmark for cross-domain AI-generated image detection, and a lightweight detector built around color-distribution probing rather than relying solely on semantic or frequency cues. The authors also present their theoretical treatment of the Noise-Quantization Probe, as a mechanism for exposing color non-uniformity, as a principled explanation for why this cue can transfer across generator families.
Results
On standard benchmarks, CoDA reports accuracy/average precision (Acc/AP) of 98.2/99.6 on ForenSynths, 97.5/99.4 on the Ojha diffusion benchmark, and 95.9/99.1 on GenImage. On FakeForm, it achieves a mean Acc/AP of 91.0/93.0 in photorealistic cross-model evaluation and the best reported cross-domain mean, 77.7/88.1, across 62 domains. The detector is also compact and fast: it uses 1.48M parameters, runs at 125.2 FPS, and is reported to remain robust under common image perturbations.
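For readers less familiar with the two reported metrics, the sketch below shows how accuracy and average precision are commonly computed from binary labels (1 for generated, 0 for real) and detector scores. It is a generic illustration in plain Python, not the paper's evaluation code. The 0.5 threshold and the function names are assumptions, and the toy labels and scores are made up for the example.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label.
    The 0.5 threshold is an assumed default, not taken from the paper."""
    correct = sum(1 for y, s in zip(labels, scores) if (s >= threshold) == bool(y))
    return correct / len(labels)

def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive sample,
    with samples sorted by descending score. Tied scores are not handled
    specially; they keep their input order."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

# Toy example: four images, two of them generated (label 1).
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_acc = accuracy(toy_labels, toy_scores)          # 0.75
toy_ap = average_precision(toy_labels, toy_scores)  # (1/1 + 2/3) / 2
```

Accuracy depends on the chosen threshold, while average precision depends only on how the scores rank the samples, which is why the two numbers can diverge as they do in the results above.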
Key Points
- FakeForm expands evaluation beyond photorealistic cross-model testing to 62 diverse domains and includes over 760,000 human judgments for perceptual analysis.
- CoDA uses a Noise-Quantization Probe to convert color-distribution irregularities into structured residual signals, then combines them with standard visual features in a lightweight dual-branch network.
- The reported gains are strongest in difficult cross-domain settings, though the paper also notes weaker performance in low-color or specialized domains such as sketch-like or technical imagery.