Attention to Detail: Global-Local Attention for High-Resolution AI-Generated Image Detection
- URL: http://arxiv.org/abs/2601.00141v1
- Date: Thu, 01 Jan 2026 00:00:07 GMT
- Title: Attention to Detail: Global-Local Attention for High-Resolution AI-Generated Image Detection
- Authors: Lawrence Han
- Abstract summary: GLASS is an architecture that combines a globally resized view with multiple randomly sampled local crops. It can be integrated into vision models to leverage both global and local information in images of any size.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid development of generative AI has made AI-generated images increasingly realistic and high-resolution. Most AI-generated image detection architectures downsample images before feeding them into models, risking the loss of fine-grained details. This paper presents GLASS (Global-Local Attention with Stratified Sampling), an architecture that combines a globally resized view with multiple randomly sampled local crops. These crops are original-resolution regions efficiently selected through spatially stratified sampling and aggregated using attention-based scoring. GLASS can be integrated into vision models to leverage both global and local information in images of any size. Vision Transformer, ResNet, and ConvNeXt models are used as backbones, and experiments show that GLASS outperforms standard transfer learning, achieving higher predictive performance within feasible computational constraints.
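As a rough illustration of the pipeline the abstract describes, the sketch below combines a bilinearly resized global view with one native-resolution crop per cell of a spatial grid (the stratified sampling step) and pools the per-view backbone features with softmax attention scores before a binary real-vs-generated head. The grid size, crop size, backbone interface, and single-linear scoring head are assumptions made for illustration, not the paper's exact configuration.

```python
# Minimal sketch of the global-plus-local idea from the abstract.
# Grid layout, crop size, and the scoring/classification heads are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def stratified_crops(img, grid=3, crop=224):
    """One random native-resolution crop per cell of a grid x grid partition (stratified sampling).

    Assumes the image is at least `crop` pixels in each spatial dimension.
    """
    _, _, H, W = img.shape
    crops = []
    for gy in range(grid):
        for gx in range(grid):
            # cell start, clamped so a full crop always fits inside the image
            y0 = min(H * gy // grid, H - crop)
            x0 = min(W * gx // grid, W - crop)
            # exclusive upper bound for the crop origin within this cell
            y1 = max(y0 + 1, min(H * (gy + 1) // grid, H - crop + 1))
            x1 = max(x0 + 1, min(W * (gx + 1) // grid, W - crop + 1))
            y = torch.randint(y0, y1, (1,)).item()
            x = torch.randint(x0, x1, (1,)).item()
            crops.append(img[:, :, y:y + crop, x:x + crop])
    return torch.stack(crops, dim=1)  # (B, grid*grid, C, crop, crop)


class GlassLikeDetector(nn.Module):
    """Global resized view plus stratified local crops, pooled by attention scores."""

    def __init__(self, backbone, feat_dim, grid=3, crop=224):
        super().__init__()
        self.backbone = backbone              # any trunk mapping (N, C, crop, crop) -> (N, feat_dim)
        self.grid, self.crop = grid, crop
        self.score = nn.Linear(feat_dim, 1)   # attention score per view
        self.head = nn.Linear(feat_dim, 1)    # real-vs-generated logit

    def forward(self, img):                   # img: (B, C, H, W) at original resolution
        global_view = F.interpolate(img, size=(self.crop, self.crop),
                                    mode="bilinear", align_corners=False)
        local_views = stratified_crops(img, self.grid, self.crop)
        B, N = local_views.shape[:2]
        views = torch.cat([global_view.unsqueeze(1), local_views], dim=1)  # (B, 1+N, C, crop, crop)
        feats = self.backbone(views.flatten(0, 1)).view(B, 1 + N, -1)      # (B, 1+N, feat_dim)
        attn = torch.softmax(self.score(feats), dim=1)                     # attention over views
        pooled = (attn * feats).sum(dim=1)                                 # (B, feat_dim)
        return self.head(pooled)                                           # (B, 1) logit
```

For example, a torchvision `resnet50(weights=None)` trunk with `feat_dim=1000` would run as written, although a real detector would presumably swap the classifier layer for a dedicated feature head.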
Related papers
- Detecting AI-Generated Images via Distributional Deviations from Real Images [6.615773227400183]
We propose a Masking-based Pre-trained model Fine-Tuning (MPFT) strategy, which introduces a Texture-Aware Masking (TAM) mechanism to mask textured areas containing generative model-specific patterns during fine-tuning. Our method, fine-tuned with only a minimal number of images, significantly outperforms existing approaches, achieving up to 98.2% and 94.6% average accuracy on the two datasets, respectively.
arXiv Detail & Related papers (2026-01-07T05:00:13Z) - NS-Net: Decoupling CLIP Semantic Information through NULL-Space for Generalizable AI-Generated Image Detection [14.7077339945096]
NS-Net is a novel framework that decouples semantic information from CLIP's visual features, followed by contrastive learning to capture intrinsic distributional differences between real and generated images. Experiments show that NS-Net outperforms existing state-of-the-art methods, achieving a 7.4% improvement in detection accuracy.
arXiv Detail & Related papers (2025-08-02T07:58:15Z) - Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach [69.01456182499486]
BR-Gen is a large-scale dataset of 150,000 locally forged images with diverse scene-aware annotations. NFA-ViT is a Noise-guided Forgery Amplification Vision Transformer that enhances the detection of localized forgeries.
arXiv Detail & Related papers (2025-04-16T09:57:23Z) - Global Semantic-Guided Sub-image Feature Weight Allocation in High-Resolution Large Vision-Language Models [50.98559225639266]
Sub-images with higher semantic relevance to the entire image encapsulate richer visual information for preserving the model's visual understanding ability. The Global Semantic-guided Weight Allocator (GSWA) module allocates weights to sub-images based on their relative information density. SleighVL, a lightweight yet high-performing model, outperforms models with comparable parameters and remains competitive with larger models.
arXiv Detail & Related papers (2025-01-24T06:42:06Z) - LDR-Net: A Novel Framework for AI-generated Image Detection via Localized Discrepancy Representation [30.677834580640123]
We propose the localized discrepancy representation network (LDR-Net) for detecting AI-generated images. LDR-Net captures smoothing artifacts and texture irregularities, which are common but often overlooked. It achieves state-of-the-art performance in detecting generated images and exhibits satisfactory generalization across unseen generative models.
arXiv Detail & Related papers (2025-01-23T08:46:39Z) - Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities [88.398085358514]
Contrastive Deepfake Embeddings (CoDE) is a novel embedding space specifically designed for deepfake detection.
CoDE is trained via contrastive learning by additionally enforcing global-local similarities.
arXiv Detail & Related papers (2024-07-29T18:00:10Z) - Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution [81.74583887661794]
We build a new real-world super-resolution benchmark with both integer and non-integer scaling factors.
We propose a Dual-level Deformable Implicit Representation (DDIR) to solve real-world scale arbitrary super-resolution.
Our trained model achieves state-of-the-art performance on the RealArbiSR and RealSR benchmarks for real-world scale arbitrary super-resolution.
arXiv Detail & Related papers (2024-03-16T13:44:42Z) - Low-Resolution Self-Attention for Semantic Segmentation [93.30597515880079]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost. Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution (an illustrative sketch of this idea appears after this list). We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z) - TcGAN: Semantic-Aware and Structure-Preserved GANs with Individual
Vision Transformer for Fast Arbitrary One-Shot Image Generation [11.207512995742999]
One-shot image generation (OSG) with generative adversarial networks that learn from the internal patches of a given image has attracted worldwide attention.
We propose a novel structure-preserved method TcGAN with individual vision transformer to overcome the shortcomings of the existing one-shot image generation methods.
arXiv Detail & Related papers (2023-02-16T03:05:59Z) - GLFF: Global and Local Feature Fusion for AI-synthesized Image Detection [29.118321046339656]
We propose a framework to learn rich and discriminative representations by combining multi-scale global features from the whole image with refined local features from informative patches for AI synthesized image detection.
GLFF fuses information from two branches: the global branch to extract multi-scale semantic features and the local branch to select informative patches for detailed local artifacts extraction.
arXiv Detail & Related papers (2022-11-16T02:03:20Z) - Any-resolution Training for High-resolution Image Synthesis [55.19874755679901]
Generative models operate at fixed resolution, even though natural images come in a variety of sizes.
We argue that every pixel matters and create datasets with variable-size images, collected at their native resolutions.
We introduce continuous-scale training, a process that samples patches at random scales to train a new generator with variable output resolutions.
arXiv Detail & Related papers (2022-04-14T17:59:31Z) - Fusing Global and Local Features for Generalized AI-Synthesized Image
Detection [31.35052580048599]
We design a two-branch model to combine global spatial information from the whole image and local informative features from patches selected by a novel patch selection module.
We collect a highly diverse dataset synthesized by 19 models with various objects and resolutions to evaluate our model.
arXiv Detail & Related papers (2022-03-26T01:55:37Z)
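Returning to the Low-Resolution Self-Attention entry above, the following is a minimal sketch of one plausible reading of that summary: the feature map is pooled to a fixed spatial grid, full self-attention runs over those pooled tokens, and the attended map is upsampled and added back as global context. The pooled grid size, head count, and the residual upsampling step are illustrative assumptions rather than the LRFormer paper's exact design.

```python
# Sketch of self-attention computed on a fixed low-resolution grid,
# so the attention cost does not grow with the input resolution.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LowResSelfAttention(nn.Module):
    def __init__(self, dim, pooled=16, heads=8):
        super().__init__()
        assert dim % heads == 0
        self.pooled, self.heads = pooled, heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                                    # x: (B, C, H, W) feature map
        B, C, H, W = x.shape
        low = F.adaptive_avg_pool2d(x, self.pooled)          # fixed (pooled x pooled) grid
        tokens = low.flatten(2).transpose(1, 2)              # (B, pooled^2, C)
        q, k, v = self.qkv(tokens).chunk(3, dim=-1)

        def split(t):                                        # (B, N, C) -> (B, heads, N, C // heads)
            return t.view(B, -1, self.heads, C // self.heads).transpose(1, 2)

        out = F.scaled_dot_product_attention(split(q), split(k), split(v))
        out = out.transpose(1, 2).reshape(B, -1, C)          # (B, pooled^2, C)
        out = self.proj(out).transpose(1, 2).view(B, C, self.pooled, self.pooled)
        # upsample the globally attended map and inject it as residual context
        return x + F.interpolate(out, size=(H, W), mode="bilinear", align_corners=False)
```

With this formulation the attention cost depends only on the pooled grid size (here 16 x 16 tokens), not on H or W, which is the property the abstract summary emphasizes.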