Related papers: Exploring the Collaborative Advantage of Low-level Information on Generalizable AI-Generated Image Detection

URL: http://arxiv.org/abs/2504.00463v1
Date: Tue, 01 Apr 2025 06:38:08 GMT
Title: Exploring the Collaborative Advantage of Low-level Information on Generalizable AI-Generated Image Detection
Authors: Ziyin Zhou, Ke Sun, Zhongxi Chen, Xianming Lin, Yunpeng Luo, Ke Yan, Shouhong Ding, Xiaoshuai Sun
Abstract summary: Existing AI-generated image detection methods consider only a single type of low-level information. Different low-level information often exhibits generalization capabilities for different types of forgeries. We propose the Adaptive Low-level Experts Injection framework, enabling the backbone network to accept and learn knowledge from different low-level information.
Score: 46.5480496076675
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Existing state-of-the-art AI-generated image detection methods mostly extract low-level information from RGB images, such as noise patterns, to improve the generalization of AI-generated image detection. However, these methods often consider only a single type of low-level information, which may lead to suboptimal generalization. Through empirical analysis, we have discovered a key insight: different types of low-level information often generalize to different types of forgeries. Furthermore, we found that simple fusion strategies are insufficient to exploit the respective detection advantages of each type of low-level and high-level information across forgery types. Therefore, we propose the Adaptive Low-level Experts Injection (ALEI) framework. Our approach introduces LoRA Experts, enabling the backbone network, which is trained on RGB images carrying high-level semantics, to accept and learn knowledge from different types of low-level information. We use cross-attention to adaptively fuse these features at intermediate layers. To prevent the backbone network from losing its ability to model the different low-level features in the later stages of modeling, we developed a Low-level Information Adapter that interacts with the features extracted by the backbone network. Finally, we propose Dynamic Feature Selection, which dynamically selects the most suitable features for detecting the current image to maximize generalization. Extensive experiments demonstrate that our method, fine-tuned on only four categories of mainstream ProGAN data, performs excellently and achieves state-of-the-art results on multiple datasets containing unseen GAN and diffusion methods.
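The abstract names three mechanisms (LoRA experts on a frozen backbone, cross-attention fusion at intermediate layers, and per-image feature selection) without giving formulas. The following is a minimal sketch of how such pieces could fit together, in plain Python on toy 2-d vectors. It assumes standard LoRA (a frozen weight W plus a scaled low-rank update B·A), single-head scaled dot-product attention in which the backbone token queries the expert outputs, and a hard argmax over linear scores for the selection step. Every function name, shape, and value here is hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product M @ x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_expert(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float = 1.0) -> Vector:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A.

    A has shape (r x d_in) and B has shape (d_out x r); only A and B would be trained.
    """
    rank = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / rank) * d for b, d in zip(base, delta)]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query: Vector, tokens: List[Vector]) -> Tuple[Vector, Vector]:
    """Single-head scaled dot-product attention; tokens serve as both keys and values."""
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * t for q, t in zip(query, tok)) / scale for tok in tokens])
    fused = [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(len(query))]
    return fused, weights


def select_feature(candidates: List[Vector], scorer: Vector) -> int:
    """Hard per-image selection: index of the candidate with the highest linear score."""
    scores = [sum(s * c for s, c in zip(scorer, cand)) for cand in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])


# Toy usage with 2-d features and rank-1 experts (all values hypothetical).
W = [[1.0, 0.0], [0.0, 1.0]]  # shared frozen projection
rgb_token = [1.0, 0.0]  # stand-in for a backbone (RGB) token
noise_feat = lora_expert(W, A=[[1.0, 0.0]], B=[[0.0], [1.0]], x=[0.5, 0.2])
freq_feat = lora_expert(W, A=[[0.0, 1.0]], B=[[1.0], [0.0]], x=[0.1, 0.9])
fused, attn = cross_attention(rgb_token, [rgb_token, noise_feat, freq_feat])
choice = select_feature([rgb_token, noise_feat, freq_feat, fused], scorer=[0.3, 0.7])
```

Because only A and B would be trained, each low-level expert adds r·(d_in + d_out) parameters on top of the shared frozen weight, which is the usual motivation for LoRA-style adapters.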

Related papers

Rethinking the Use of Vision Transformers for AI-Generated Image Detection [30.35195934515703]
We introduce a novel adaptive method, termed MoLD, which dynamically integrates features from multiple ViT layers using a gating-based mechanism. Experiments on both GAN- and diffusion-generated images demonstrate that MoLD significantly improves detection performance, enhances generalization across diverse generative models, and exhibits robustness in real-world scenarios.
(2025-12-04T16:37:47Z)
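The summary says only that MoLD combines features from several ViT layers through a gating mechanism. As a hedged illustration, the sketch below assumes one feature vector per layer (for example, that layer's [CLS] token) and a single shared linear gate whose softmax-normalized scores weight the layers. Because the scores are computed from the features themselves, the mixture changes from image to image. The gate form and the function name `gated_layer_fusion` are our assumptions, not MoLD's published design.

```python
import math
from typing import List, Tuple


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # shift for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gated_layer_fusion(layer_feats: List[List[float]],
                       gate_w: List[float]) -> Tuple[List[float], List[float]]:
    """Mix per-layer features with input-dependent softmax gates.

    layer_feats: one feature vector per ViT layer (hypothetical: that layer's [CLS] token).
    gate_w: a shared linear gate, so each layer's weight depends on its own feature.
    """
    logits = [sum(w * f for w, f in zip(gate_w, feat)) for feat in layer_feats]
    gates = softmax(logits)
    dim = len(layer_feats[0])
    fused = [sum(g * feat[i] for g, feat in zip(gates, layer_feats)) for i in range(dim)]
    return fused, gates


# Toy usage: three "layers" with 2-d features; the first layer scores highest here.
fused, gates = gated_layer_fusion([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], gate_w=[2.0, -1.0])
```

With an all-zero gate the weights are uniform and the fusion reduces to plain averaging over layers, so the gate can only help if training pushes it toward informative layers.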
NS-Net: Decoupling CLIP Semantic Information through NULL-Space for Generalizable AI-Generated Image Detection [14.7077339945096]
NS-Net is a novel framework that decouples semantic information from CLIP's visual features, followed by contrastive learning to capture intrinsic distributional differences between real and generated images. Experiments show that NS-Net outperforms existing state-of-the-art methods, achieving a 7.4% improvement in detection accuracy.
(2025-08-02T07:58:15Z)
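A null-space projection removes every component of a feature that lies in a chosen subspace, leaving only what is orthogonal to it. The sketch below shows the generic operation, P = I - QQᵀ with Q an orthonormal basis, built by Gram-Schmidt. It assumes the "semantic" subspace is spanned by a few given direction vectors (for instance, text embeddings); how NS-Net actually constructs that subspace is not stated in the summary, so the inputs and function names are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(dirs: List[Vector], eps: float = 1e-12) -> List[Vector]:
    """Gram-Schmidt: orthonormal basis for span(dirs); near-dependent directions are dropped."""
    basis: List[Vector] = []
    for d in dirs:
        v = list(d)
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = dot(v, v) ** 0.5
        if norm > eps:
            basis.append([vi / norm for vi in v])
    return basis


def null_space_project(feature: Vector, semantic_dirs: List[Vector]) -> Vector:
    """Remove every component of feature lying in span(semantic_dirs).

    Equivalent to applying P = I - Q Q^T, where the columns of Q are the orthonormal basis.
    """
    out = list(feature)
    for b in orthonormalize(semantic_dirs):
        c = dot(out, b)
        out = [o - c * bi for o, bi in zip(out, b)]
    return out
```

For example, projecting `[1.0, 2.0, 3.0]` away from the single direction `[1.0, 0.0, 0.0]` returns `[0.0, 2.0, 3.0]`: the first coordinate, which the direction "explains", is discarded and the rest is kept for the downstream detector.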
Multimodal Conditional Information Bottleneck for Generalizable AI-Generated Image Detection [24.512663807403186]
InfoFD is a text-guided AI-generated image detection framework. We introduce two key components: the Text-Guided Conditional Information Bottleneck (TGCIB) and Dynamic Text Orthogonalization (DTO). Our model achieves exceptional generalization performance on the GenImage dataset and on the latest generative models.
(2025-05-21T07:46:26Z)
Hierarchical Information Flow for Generalized Efficient Image Restoration [108.83750852785582]
We propose a hierarchical information flow mechanism for image restoration, dubbed Hi-IR. Hi-IR constructs a hierarchical information tree representing the degraded image across three levels. Across seven common image restoration tasks, Hi-IR demonstrates its effectiveness and generalizability.
(2024-11-27T18:30:08Z)
Diffusion Model Based Visual Compensation Guidance and Visual Difference Analysis for No-Reference Image Quality Assessment [78.21609845377644]
We explore diffusion models, a novel class of state-of-the-art (SOTA) generative models capable of modeling intricate relationships. We devise a new diffusion restoration network that leverages the produced enhanced image and noise-containing images. Two visual evaluation branches are designed to comprehensively analyze the obtained high-level feature information.
(2024-02-22T09:39:46Z)
Simple Image-level Classification Improves Open-vocabulary Object Detection [27.131298903486474]
Open-Vocabulary Object Detection (OVOD) aims to detect novel objects beyond a given set of base categories on which the detection model is trained. Recent OVOD methods focus on adapting the image-level pre-trained vision-language models (VLMs), such as CLIP, to a region-level object detection task via, e.g., region-level knowledge distillation, regional prompt learning, or region-text pre-training. These methods have demonstrated remarkable performance in recognizing regional visual concepts, but they are weak in exploiting the VLMs' powerful global scene understanding ability learned from the billion-scale
(2023-12-16T13:06:15Z)
Fusing Global and Local Features for Generalized AI-Synthesized Image Detection [31.35052580048599]
We design a two-branch model to combine global spatial information from the whole image and local informative features from patches selected by a novel patch selection module. We collect a highly diverse dataset synthesized by 19 models with various objects and resolutions to evaluate our model.
(2022-03-26T01:55:37Z)
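The two-branch idea (a whole-image feature plus features from a few selected patches) can be sketched as below. The paper describes its patch selection module as novel and the summary does not say how it scores patches, so here patches are simply ranked by a given per-patch score (for example, an attention weight), the top k are mean-pooled, and the result is concatenated with the global feature. The scoring input, pooling choice, and function names are assumptions for illustration only.

```python
from typing import List

Vector = List[float]


def top_k_patches(patch_scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring patches, best first."""
    return sorted(range(len(patch_scores)), key=lambda i: patch_scores[i], reverse=True)[:k]


def mean_pool(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def global_local_feature(global_feat: Vector, patch_feats: List[Vector],
                         patch_scores: List[float], k: int = 2) -> Vector:
    """Concatenate the whole-image feature with the mean of the top-k patch features."""
    chosen = top_k_patches(patch_scores, k)
    local_feat = mean_pool([patch_feats[i] for i in chosen])
    return global_feat + local_feat  # list concatenation = feature concatenation
```

Keeping only a few high-scoring patches lets the local branch focus on small regions where generation artifacts may be concentrated, while the global branch still sees the whole image.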
Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models. Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings. We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
(2021-03-22T20:36:34Z)
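Adversarial augmentation on intermediate features means perturbing the embedding, rather than the input pixels, in the direction that most increases the training loss. The sketch below shows one FGSM-style step against a toy logistic classifier whose gradient with respect to the feature has a closed form, ∂L/∂f = -y·σ(-m)·w with margin m = y·(w·f). The paper's actual attack (e.g. number of steps) and its normalization scheme are not described in the summary; this single-step version and all names are our assumptions.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def logistic_loss(w: Vector, f: Vector, y: int) -> float:
    """log(1 + exp(-y * w.f)) for a label y in {-1, +1}, computed stably."""
    m = y * dot(w, f)
    return math.log1p(math.exp(-m)) if m > 0 else -m + math.log1p(math.exp(m))


def feature_grad(w: Vector, f: Vector, y: int) -> Vector:
    """Gradient of logistic_loss with respect to the feature f: -y * sigmoid(-m) * w."""
    m = y * dot(w, f)
    s = 1.0 / (1.0 + math.exp(m))  # sigmoid(-m)
    return [-y * s * wi for wi in w]


def adversarial_feature(w: Vector, f: Vector, y: int, eps: float) -> Vector:
    """One FGSM-style step on the feature: move by eps along the sign of the loss gradient."""
    g = feature_grad(w, f, y)
    return [fi + eps * ((gi > 0) - (gi < 0)) for fi, gi in zip(f, g)]


# Toy usage: the perturbed feature has a smaller margin, hence a larger loss.
w, f, y = [1.0, -2.0], [0.5, 0.1], 1
f_adv = adversarial_feature(w, f, y, eps=0.1)
```

In training, the classifier would then be fit on both `f` and `f_adv`, which is what makes the perturbation an augmentation rather than an attack.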
Unifying Remote Sensing Image Retrieval and Classification with Robust Fine-tuning [3.6526118822907594]
We aim at unifying remote sensing image retrieval and classification with a new large-scale training and testing dataset, SF300. We show that our framework systematically achieves a boost of retrieval and classification performance on nine different datasets compared to an ImageNet pretrained baseline.
(2021-02-26T11:01:30Z)
Learning Deep Interleaved Networks with Asymmetric Co-Attention for Image Restoration [65.11022516031463]
We present a deep interleaved network (DIN) that learns how information at different states should be combined for high-quality (HQ) image reconstruction. In this paper, we propose asymmetric co-attention (AsyCA), which is attached at each interleaved node to model the feature dependencies. Our presented DIN can be trained end-to-end and applied to various image restoration tasks.
(2020-10-29T15:32:00Z)
Attention Model Enhanced Network for Classification of Breast Cancer Image [54.83246945407568]
AMEN is formulated in a multi-branch fashion with a pixel-wise attention model and a classification submodule. To focus more on subtle detail information, the sample image is enhanced by the pixel-wise attention map generated by the former branch. Experiments conducted on three benchmark datasets demonstrate the superiority of the proposed method under various scenarios.
(2020-10-07T08:44:21Z)
Understanding Anomaly Detection with Deep Invertible Networks through Hierarchies of Distributions and Features [4.25227087152716]
Convolutional networks learn similar low-level feature distributions when trained on any natural image dataset. When the discriminative features between inliers and outliers are on a high-level, anomaly detection becomes particularly challenging. We propose two methods to remove the negative impact of model bias and domain prior on detecting high-level differences.
(2020-06-18T20:56:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.