Semantic-Aware Reconstruction Error for Detecting AI-Generated Images
- URL: http://arxiv.org/abs/2508.09487v2
- Date: Thu, 25 Sep 2025 06:29:54 GMT
- Title: Semantic-Aware Reconstruction Error for Detecting AI-Generated Images
- Authors: Ju Yeon Kang, Jaehong Park, Semin Kim, Ji Won Yoon, Nam Soo Kim
- Abstract summary: We propose a novel representation, namely Semantic-Aware Reconstruction Error (SARE), that measures the semantic difference between an image and its caption-guided reconstruction. SARE provides a robust and discriminative feature for detecting fake images across diverse generative models. We also introduce a fusion module that integrates SARE into the backbone detector via a cross-attention mechanism.
- Score: 22.83053631078616
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, AI-generated image detection has gained increasing attention, as the rapid advancement of image generation technologies has raised serious concerns about their potential misuse. While existing detection methods have achieved promising results, their performance often degrades significantly when facing fake images from unseen, out-of-distribution (OOD) generative models, since they primarily rely on model-specific artifacts and thus overfit to the models used for training. To address this limitation, we propose a novel representation, namely Semantic-Aware Reconstruction Error (SARE), that measures the semantic difference between an image and its caption-guided reconstruction. The key hypothesis behind SARE is that real images, whose captions often fail to fully capture their complex visual content, may undergo noticeable semantic shifts during the caption-guided reconstruction process. In contrast, fake images, which closely align with their captions, show minimal semantic changes. By quantifying these semantic shifts, SARE provides a robust and discriminative feature for detecting fake images across diverse generative models. Additionally, we introduce a fusion module that integrates SARE into the backbone detector via a cross-attention mechanism. Image features attend to semantic representations extracted from SARE, enabling the model to adaptively leverage semantic information. Experimental results demonstrate that the proposed method achieves strong generalization, outperforming existing baselines on benchmarks including GenImage and ForenSynths. We further validate the effectiveness of caption guidance through a detailed analysis of semantic shifts, confirming its ability to enhance detection robustness.
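The core measurement described in the abstract can be sketched in a few lines: SARE is the distance, in a semantic embedding space, between an image and its caption-guided reconstruction. The sketch below abstracts away the captioner, the reconstruction model, and the image encoder (the paper does not fix these here); the embedding vectors are hypothetical stand-ins chosen only to illustrate the hypothesis that real images shift more than fake ones.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def sare(img_emb, recon_emb):
    """Semantic-Aware Reconstruction Error: the semantic shift between an
    image and its caption-guided reconstruction, measured in embedding space."""
    return cosine_distance(img_emb, recon_emb)

# Hypothetical embeddings. For a real image, the caption misses visual detail,
# so the reconstruction drifts semantically; for a fake image, caption and
# content align, so the reconstruction stays close.
real_img   = np.array([1.0, 0.2, 0.0, 0.7])
real_recon = np.array([0.1, 1.0, 0.6, 0.0])    # large semantic shift
fake_img   = np.array([0.9, 0.1, 0.4, 0.3])
fake_recon = np.array([0.8, 0.15, 0.45, 0.3])  # small semantic shift

sare_real = sare(real_img, real_recon)
sare_fake = sare(fake_img, fake_recon)
assert sare_real > sare_fake  # real images exhibit the larger shift
```

A detector would then feed this scalar (or a richer SARE-derived representation, fused via cross-attention in the paper) alongside visual features, rather than thresholding it directly.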
Related papers
- Detecting AI-Generated Images via Distributional Deviations from Real Images [6.615773227400183]
We propose a Masking-based Pre-trained model Fine-Tuning (MPFT) strategy, which introduces a Texture-Aware Masking (TAM) mechanism to mask textured areas containing generative model-specific patterns during fine-tuning.
Our method, fine-tuned with only a minimal number of images, significantly outperforms existing approaches, achieving up to 98.2% and 94.6% average accuracy on the two datasets, respectively.
arXiv Detail & Related papers (2026-01-07T05:00:13Z)
- Self-Supervised AI-Generated Image Detection: A Camera Metadata Perspective [80.10217707456046]
We introduce a self-supervised approach for detecting AI-generated images that leverages camera metadata.
We train a feature extractor solely on camera-captured photographs by classifying categorical EXIF tags.
Our detectors deliver strong generalization to in-the-wild samples and robustness to common benign image perturbations.
arXiv Detail & Related papers (2025-12-05T11:53:18Z)
- INSIGHT: An Interpretable Neural Vision-Language Framework for Reasoning of Generative Artifacts [0.0]
Current forensic systems degrade sharply under real-world conditions.
Most detectors operate as opaque black boxes, offering little insight into why an image is flagged as synthetic.
We introduce INSIGHT, a unified framework for robust detection and transparent explanation of AI-generated images.
arXiv Detail & Related papers (2025-11-27T11:43:50Z)
- Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images [96.43608872116347]
AnomReason is a large-scale benchmark with structured quadruple annotations, accompanied by AnomAgent.
AnomReason and AnomAgent serve as a foundation for measuring and improving the semantic plausibility of AI-generated images.
arXiv Detail & Related papers (2025-10-11T14:09:24Z)
- $D^3$QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection [85.9202830503973]
Visual autoregressive (AR) models generate images through discrete token prediction.
We propose to leverage Discrete Distribution Discrepancy-aware Quantization Error ($D^3$QE) for autoregressive-generated image detection.
arXiv Detail & Related papers (2025-10-07T13:02:27Z)
- ForenX: Towards Explainable AI-Generated Image Detection with Multimodal Large Language Models [82.04858317800097]
We present ForenX, a novel method that not only identifies the authenticity of images but also provides explanations that resonate with human thoughts.
ForenX employs powerful multimodal large language models (MLLMs) to analyze and interpret forensic cues.
We introduce ForgReason, a dataset dedicated to descriptions of forgery evidence in AI-generated images.
arXiv Detail & Related papers (2025-08-02T15:21:26Z)
- NS-Net: Decoupling CLIP Semantic Information through NULL-Space for Generalizable AI-Generated Image Detection [14.7077339945096]
NS-Net is a novel framework that decouples semantic information from CLIP's visual features, followed by contrastive learning to capture intrinsic distributional differences between real and generated images.
Experiments show that NS-Net outperforms existing state-of-the-art methods, achieving a 7.4% improvement in detection accuracy.
arXiv Detail & Related papers (2025-08-02T07:58:15Z)
- LATTE: Latent Trajectory Embedding for Diffusion-Generated Image Detection [11.700935740718675]
LATTE (Latent Trajectory Embedding) is a novel approach that models the evolution of latent embeddings across several denoising timesteps.
By modeling the trajectory of such embeddings rather than single-step errors, LATTE captures subtle, discriminative patterns that distinguish real from generated images.
arXiv Detail & Related papers (2025-07-03T12:53:47Z)
- A Watermark for Auto-Regressive Image Generation Models [50.599325258178254]
We propose C-reweight, a distortion-free watermarking method explicitly designed for image generation models.
C-reweight mitigates retokenization mismatch while preserving image fidelity.
arXiv Detail & Related papers (2025-06-13T00:15:54Z)
- Explainable Synthetic Image Detection through Diffusion Timestep Ensembling [30.298198387824275]
We propose a novel synthetic image detection method that directly utilizes features of intermediately noised images by training an ensemble on multiple noised timesteps.
To enhance human comprehension, we introduce a metric-grounded explanation generation and refinement module.
Our method achieves state-of-the-art performance with 98.91% and 95.89% detection accuracy on regular and challenging samples, respectively.
arXiv Detail & Related papers (2025-03-08T13:04:20Z)
- Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing, built on generative models, pose serious risks.
In this paper, we investigate how detection performance varies across model backbones, types, and datasets.
We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z)
- Time Step Generating: A Universal Synthesized Deepfake Image Detector [0.4488895231267077]
We propose a universal synthetic image detector, Time Step Generating (TSG).
TSG does not rely on pre-trained models' reconstruction ability, specific datasets, or sampling algorithms.
We test the proposed TSG on the large-scale GenImage benchmark and it achieves significant improvements in both accuracy and generalizability.
arXiv Detail & Related papers (2024-11-17T09:39:50Z)
- Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
- Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study of detecting deepfakes generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
- Detecting Images Generated by Diffusers [12.986394431694206]
We consider images generated from captions in the MSCOCO and Wikimedia datasets using two state-of-the-art models: Stable Diffusion and GLIDE.
Our experiments show that it is possible to detect the generated images using simple Multi-Layer Perceptrons.
We find that incorporating the associated textual information with the images rarely leads to significant improvement in detection results.
arXiv Detail & Related papers (2023-03-09T14:14:29Z)
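The last entry's finding, that a simple multi-layer perceptron suffices to separate generated images from real ones given good features, can be illustrated with a minimal sketch. The features below are synthetic Gaussian stand-ins, not the image-derived features used in that paper; the network architecture and training loop are illustrative choices, not the authors' configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "image features": two Gaussian blobs, one per class
# (real vs. generated). A real pipeline would extract these from images.
X = np.vstack([
    rng.normal(loc=-1.0, scale=0.5, size=(200, 8)),  # class 0: "real"
    rng.normal(loc=+1.0, scale=0.5, size=(200, 8)),  # class 1: "generated"
])
y = np.array([0] * 200 + [1] * 200)

# One-hidden-layer MLP trained with full-batch gradient descent on BCE loss.
W1 = rng.normal(scale=0.1, size=(8, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16,));   b2 = 0.0
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(200):
    h = np.tanh(X @ W1 + b1)            # hidden activations
    p = sigmoid(h @ W2 + b2)            # predicted P(generated)
    g = (p - y) / len(y)                # dBCE/dlogit
    gh = np.outer(g, W2) * (1 - h**2)   # backprop through tanh
    W2 -= lr * (h.T @ g); b2 -= lr * g.sum()
    W1 -= lr * (X.T @ gh); b1 -= lr * gh.sum(axis=0)

p = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)
acc = float(((p > 0.5) == y).mean())
assert acc > 0.9  # easily separable stand-in features
```

The point of the sketch is the shape of the approach, a shallow probe over fixed features, rather than the numbers; the reported 98%+ accuracies depend entirely on the real feature extractors and datasets.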
This list is automatically generated from the titles and abstracts of the papers in this site.