Related papers: GAMMA: Generalizable Alignment via Multi-task and Manipulation-Augmented Training for AI-Generated Image Detection

URL: http://arxiv.org/abs/2509.10250v1
Date: Fri, 12 Sep 2025 13:46:54 GMT
Title: GAMMA: Generalizable Alignment via Multi-task and Manipulation-Augmented Training for AI-Generated Image Detection
Authors: Haozhen Yan, Yan Hong, Suning Lang, Jiahui Zhan, Yikun Ji, Yujie Gao, Jun Lan, Huijia Zhu, Weiqiang Wang, Jianfu Zhang
Abstract summary: We propose GAMMA, a novel training framework designed to reduce domain bias and enhance semantic alignment. We employ multi-task supervision with dual segmentation heads and a classification head, enabling pixel-level source attribution across diverse generative domains. Our method achieves state-of-the-art generalization performance on the GenImage benchmark, improving accuracy by 5.8%, and also maintains strong robustness on newly released generative models such as GPT-4o.
Score: 26.484706270778318
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With generative models becoming increasingly sophisticated and diverse, detecting AI-generated images has become increasingly challenging. While existing AI-generated image detectors achieve promising performance on in-distribution generated images, their generalization to unseen generative models remains limited. This limitation is largely attributed to their reliance on generation-specific artifacts, such as stylistic priors and compression patterns. To address these limitations, we propose GAMMA, a novel training framework designed to reduce domain bias and enhance semantic alignment. GAMMA introduces diverse manipulation strategies, such as inpainting-based manipulation and semantics-preserving perturbations, to ensure consistency between manipulated and authentic content. We employ multi-task supervision with dual segmentation heads and a classification head, enabling pixel-level source attribution across diverse generative domains. In addition, a reverse cross-attention mechanism is introduced to allow the segmentation heads to guide and correct biased representations in the classification branch. Our method achieves state-of-the-art generalization performance on the GenImage benchmark, improving accuracy by 5.8%, and also maintains strong robustness on newly released generative models such as GPT-4o.
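
Illustrative sketch (not from the paper): the abstract describes multi-task supervision with one classification head and two segmentation heads. The snippet below shows one common way such a combined objective can be written, as a weighted sum of an image-level binary cross-entropy and two per-pixel binary cross-entropies. The function names, the binary form of both masks, and the weights w_cls and w_seg are all assumptions made for illustration; the paper's actual losses, mask definitions, and reverse cross-attention mechanism are not reproduced here.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and a label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mask_bce(pred_mask, true_mask):
    """Mean per-pixel BCE over two same-shaped 2-D masks given as lists of rows."""
    losses = [
        bce(p, y)
        for pred_row, true_row in zip(pred_mask, true_mask)
        for p, y in zip(pred_row, true_row)
    ]
    return sum(losses) / len(losses)


def multitask_loss(cls_prob, cls_label,
                   seg_a, seg_a_true,
                   seg_b, seg_b_true,
                   w_cls=1.0, w_seg=0.5):
    """Weighted sum of one classification loss and two segmentation losses.

    The weights and the meaning of the two masks are hypothetical choices,
    not values or definitions taken from the GAMMA paper.
    """
    cls_term = bce(cls_prob, cls_label)
    seg_term = mask_bce(seg_a, seg_a_true) + mask_bce(seg_b, seg_b_true)
    return w_cls * cls_term + w_seg * seg_term


if __name__ == "__main__":
    # Toy 2x2 example: a confident, correct prediction versus an uncertain one.
    true_mask = [[1, 0], [0, 0]]
    good = multitask_loss(0.9, 1, [[0.9, 0.1], [0.1, 0.1]], true_mask,
                          [[0.9, 0.1], [0.1, 0.1]], true_mask)
    poor = multitask_loss(0.5, 1, [[0.5, 0.5], [0.5, 0.5]], true_mask,
                          [[0.5, 0.5], [0.5, 0.5]], true_mask)
    print(good < poor)
```

In a sketch like this, the segmentation terms act as auxiliary pixel-level supervision alongside the image-level real/fake decision; how the heads interact in GAMMA itself is described only at the level of the abstract above.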

Related papers

Detecting AI-Generated Images via Distributional Deviations from Real Images [6.615773227400183]
We propose a Masking-based Pre-trained model Fine-Tuning (MPFT) strategy, which introduces a Texture-Aware Masking (TAM) mechanism to mask textured areas containing generative model-specific patterns during fine-tuning. Our method, fine-tuned with only a minimal number of images, significantly outperforms existing approaches, achieving up to 98.2% and 94.6% average accuracy on the two datasets, respectively.
arXiv Detail & Related papers (2026-01-07T05:00:13Z)
Scaling Up AI-Generated Image Detection via Generator-Aware Prototypes [15.99138549265524]
Generator-Aware Prototype Learning (GAPL) is a framework that constrains representation with a structured learning paradigm. GAPL achieves state-of-the-art performance, showing superior detection accuracy across a wide variety of GAN and diffusion-based generators.
arXiv Detail & Related papers (2025-12-15T04:58:08Z)
Task-Model Alignment: A Simple Path to Generalizable AI-Generated Image Detection [57.17054616831796]
Vision Language Models (VLMs) are increasingly adopted for AI-generated image (AIGI) detection. VLMs' underperformance is attributed to task-model misalignment. In this paper, we formalize AIGI detection as two complementary tasks: semantic consistency checking and pixel-artifact detection.
arXiv Detail & Related papers (2025-12-07T09:19:00Z)
Self-Supervised AI-Generated Image Detection: A Camera Metadata Perspective [80.10217707456046]
We introduce a self-supervised approach for detecting AI-generated images that leverages camera metadata. We train a feature extractor solely on camera-captured photographs by classifying categorical EXIF tags. Our detectors deliver strong generalization to in-the-wild samples and robustness to common benign image perturbations.
arXiv Detail & Related papers (2025-12-05T11:53:18Z)
MiraGe: Multimodal Discriminative Representation Learning for Generalizable AI-Generated Image Detection [32.662682253295486]
We propose Multimodal Discriminative Learning for Generalizable AI-generated Image Detection (MiraGe). We apply multimodal prompt learning to further refine these principles into CLIP, leveraging text embeddings as semantic anchors for effective discriminative representation learning. MiraGe achieves state-of-the-art performance, maintaining robustness even against unseen generators like Sora.
arXiv Detail & Related papers (2025-08-03T00:19:18Z)
NS-Net: Decoupling CLIP Semantic Information through NULL-Space for Generalizable AI-Generated Image Detection [14.7077339945096]
NS-Net is a novel framework that decouples semantic information from CLIP's visual features, followed by contrastive learning to capture intrinsic distributional differences between real and generated images. Experiments show that NS-Net outperforms existing state-of-the-art methods, achieving a 7.4% improvement in detection accuracy.
arXiv Detail & Related papers (2025-08-02T07:58:15Z)
Bi-Level Optimization for Self-Supervised AI-Generated Face Detection [56.57881725223548]
We introduce a self-supervised method for AI-generated face detectors based on bi-level optimization. Our detectors significantly outperform existing approaches in both one-class and binary classification settings.
arXiv Detail & Related papers (2025-07-30T16:38:29Z)
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation [53.01486796503091]
We present Harmon, a unified autoregressive framework that harmonizes understanding and generation tasks with a shared MAR encoder. Harmon achieves state-of-the-art image generation results on the GenEval, MJHQ30K and WISE benchmarks.
arXiv Detail & Related papers (2025-03-27T20:50:38Z)
GenMix: Effective Data Augmentation with Generative Diffusion Model Image Editing [60.101097709212716]
This paper introduces GenMix, a generalizable prompt-guided generative data augmentation approach. Our technique leverages image editing to generate augmented images based on custom conditional prompts. Our approach mitigates unrealistic images and label ambiguity, improving the performance and adversarial robustness of the resulting models.
arXiv Detail & Related papers (2024-12-03T10:45:34Z)
Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection [58.87142367781417]
A naively trained detector tends to overfit to the limited and monotonous fake patterns, causing the feature space to become highly constrained and low-ranked. One potential remedy is incorporating the pre-trained knowledge within the vision foundation models to expand the feature space. By freezing the principal components and adapting only the remaining components, we preserve the pre-trained knowledge while learning fake patterns.
arXiv Detail & Related papers (2024-11-23T19:10:32Z)
Active Generation for Image Classification [45.93535669217115]
We propose to address the efficiency of image generation by focusing on the specific needs and characteristics of the model. With a central tenet of active learning, our method, named ActGen, takes a training-aware approach to image generation.
arXiv Detail & Related papers (2024-03-11T08:45:31Z)
Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators [12.053125079460234]
We show how modern T2I generators can be used to simulate arbitrary interventions over such environmental factors. Our empirical findings demonstrate that modern T2I generators like Stable Diffusion can indeed be used as a powerful interventional data augmentation mechanism.
arXiv Detail & Related papers (2022-12-21T18:07:39Z)
Auto-regressive Image Synthesis with Integrated Quantization [55.51231796778219]
This paper presents a versatile framework for conditional image generation. It incorporates the inductive bias of CNNs and powerful sequence modeling of auto-regression. Our method achieves superior diverse image generation performance as compared with the state-of-the-art.
arXiv Detail & Related papers (2022-07-21T22:19:17Z)
A Method for Evaluating Deep Generative Models of Images via Assessing the Reproduction of High-order Spatial Context [9.00018232117916]
Generative adversarial networks (GANs) are one kind of deep generative model (DGM) that is widely employed. In this work, we demonstrate several objective tests of images output by two popular GAN architectures. We designed several context models (SCMs) of distinct image features that can be recovered after generation by a trained GAN.
arXiv Detail & Related papers (2021-11-24T15:58:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.