SAGI: Semantically Aligned and Uncertainty Guided AI Image Inpainting
- URL: http://arxiv.org/abs/2502.06593v2
- Date: Thu, 22 May 2025 18:13:28 GMT
- Title: SAGI: Semantically Aligned and Uncertainty Guided AI Image Inpainting
- Authors: Paschalis Giakoumoglou, Dimitrios Karageorgiou, Symeon Papadopoulos, Panagiotis C. Petrantonakis
- Abstract summary: SAGI-D is the largest and most diverse dataset of AI-generated inpaintings. Our experiments show that semantic alignment significantly improves image quality and aesthetics. Using SAGI-D to train several image forensic approaches increases in-domain detection performance on average by 37.4%.
- Score: 11.216906046169683
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in generative AI have made text-guided image inpainting -- adding, removing, or altering image regions using textual prompts -- widely accessible. However, generating semantically correct, photorealistic imagery typically requires carefully crafted prompts and iterative refinement that evaluates the realism of the generated content - tasks commonly performed by humans. To automate the generative process, we propose Semantically Aligned and Uncertainty Guided AI Image Inpainting (SAGI), a model-agnostic pipeline that samples prompts from a distribution closely aligned with human perception, evaluates the generated content, and discards outputs that deviate from that distribution, which we approximate using pretrained Large Language Models and Vision-Language Models. By applying this pipeline to multiple state-of-the-art inpainting models, we create the SAGI Dataset (SAGI-D), currently the largest and most diverse dataset of AI-generated inpaintings, comprising over 95k inpainted images and a human-evaluated subset. Our experiments show that semantic alignment significantly improves image quality and aesthetics, while uncertainty guidance effectively identifies realistic manipulations: after applying our pipeline, human accuracy in distinguishing inpainted images from real ones drops from 74% to 35%. Moreover, using SAGI-D to train several image forensic approaches increases in-domain detection performance on average by 37.4% and out-of-domain generalization by 26.1% in terms of IoU, demonstrating its utility in countering malicious exploitation of generative AI. Code and dataset are available at https://github.com/mever-team/SAGI
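To make the two stages concrete, below is a minimal, hedged sketch of the loop the abstract describes: an LLM proposes prompts aligned with the visible image content, any off-the-shelf model inpaints, and a VLM scores realism so that outputs deviating from the real-image distribution are discarded. The wrapper objects (`llm`, `inpainter`, `vlm`) and their method names are hypothetical placeholders, not the released SAGI code.

```python
# Hedged sketch of a SAGI-style generate-and-filter loop. The model
# wrappers and their method names are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Candidate:
    prompt: str
    image: object         # e.g. a PIL.Image
    realism_score: float  # VLM-estimated probability the result looks real

def sagi_like_pipeline(image, mask, llm, inpainter, vlm,
                       n_prompts: int = 5, threshold: float = 0.5):
    """Generate inpainting candidates and keep only those the VLM judges
    realistic (the uncertainty-guided filtering step)."""
    kept = []
    for _ in range(n_prompts):
        # 1) Semantic alignment: sample a prompt consistent with the
        #    visible content surrounding the masked region.
        prompt = llm.describe_masked_region(image, mask)
        # 2) Inpaint with any off-the-shelf model (the pipeline is
        #    model-agnostic by design).
        result = inpainter.inpaint(image, mask, prompt)
        # 3) Uncertainty guidance: score realism and discard outliers.
        score = vlm.realism_probability(result)
        if score >= threshold:
            kept.append(Candidate(prompt, result, score))
    return sorted(kept, key=lambda c: c.realism_score, reverse=True)
```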
Related papers
- VERITAS: Verification and Explanation of Realness in Images for Transparency in AI Systems [0.0]
We present VERITAS, a comprehensive framework that accurately detects whether a small (32x32) image is AI-generated.
VERITAS produces human-readable explanations that describe key artifacts in synthetic images.
arXiv Detail & Related papers (2025-07-07T15:57:05Z)
- COCO-Inpaint: A Benchmark for Image Inpainting Detection and Manipulation Localization [32.26473230517668]
COCO-Inpaint is a benchmark specifically designed for inpainting detection.
It offers high-quality inpainting samples generated by six state-of-the-art inpainting models.
It provides large-scale coverage, with 258,266 inpainted images of rich semantic diversity.
arXiv Detail & Related papers (2025-04-25T14:04:36Z)
- CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI [58.35348718345307]
Current efforts to distinguish between real and AI-generated images may lack generalization.
We propose a novel framework, Co-Spy, that first enhances existing semantic features.
We also create Co-Spy-Bench, a comprehensive dataset comprising 5 real image datasets and 22 state-of-the-art generative models.
arXiv Detail & Related papers (2025-03-24T01:59:29Z)
- Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents [62.616106562146776]
We propose a Visual-Centric Selection approach via Agents Collaboration (ViSA).
Our approach consists of 1) an image information quantification method via visual agents collaboration to select images with rich visual information, and 2) a visual-centric instruction quality assessment method to select high-quality instruction data related to high-quality images.
arXiv Detail & Related papers (2025-02-27T09:37:30Z)
- DejAIvu: Identifying and Explaining AI Art on the Web in Real-Time with Saliency Maps [0.0]
We introduce DejAIvu, a Chrome Web extension that combines real-time AI-generated image detection with saliency-based explainability.
Our approach integrates efficient in-browser inference, gradient-based saliency analysis, and a seamless user experience, ensuring that AI detection is both transparent and interpretable.
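Gradient-based saliency itself is a standard technique; a minimal PyTorch sketch of the idea follows. DejAIvu runs in-browser, so this is an illustration of the method rather than its implementation, and the detector `model` is assumed to return a scalar "AI-generated" logit.

```python
# Minimal gradient-based saliency sketch; `model` is a placeholder detector
# assumed to map a (1, 3, H, W) image tensor to a scalar logit.
import torch

def saliency_map(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    """Return the absolute input gradient of the detector score."""
    image = image.clone().requires_grad_(True)
    model(image).squeeze().backward()     # d(score) / d(pixel)
    return image.grad.abs().amax(dim=1)   # channel-wise max -> (1, H, W) map
```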
arXiv Detail & Related papers (2025-02-12T22:24:49Z)
- D-Judge: How Far Are We? Evaluating the Discrepancies Between AI-synthesized Images and Natural Images through Multimodal Guidance [19.760989919485894]
Despite advanced AI generative models producing visually compelling content, significant discrepancies remain when compared to natural images.
We construct a large-scale multimodal dataset named DANI, comprising 5,000 natural images and over 440,000 AI-generated image (AIGI) samples.
We then introduce D-Judge, a benchmark designed to answer the critical question: how far are AI-generated images from truly realistic images?
arXiv Detail & Related papers (2024-12-23T15:08:08Z)
- Zero-Shot Detection of AI-Generated Images [54.01282123570917]
We propose a zero-shot entropy-based detector (ZED) to detect AI-generated images.
Inspired by recent works on machine-generated text detection, our idea is to measure how surprising the image under analysis is to a model of real images.
ZED achieves an average improvement of more than 3% over the SoTA in terms of accuracy.
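A hedged sketch of that surprise measure: score an image by its average coding cost under a density model fitted to real images, and flag images whose cost falls outside the range calibrated on real data. The `real_image_model.log_prob` interface and the two-sided decision rule are assumptions, not ZED's exact procedure.

```python
# Sketch of an entropy-style zero-shot test; `real_image_model` is a
# hypothetical density model of real images exposing log_prob (in nats).
import numpy as np

def surprise_score(image: np.ndarray, real_image_model) -> float:
    """Average coding cost of the image in bits per pixel value."""
    nll_nats = -real_image_model.log_prob(image)
    return float(nll_nats / (np.log(2) * image.size))

def looks_generated(image, real_image_model, mu_real: float, tol: float) -> bool:
    # Flag images whose coding cost deviates from the mean cost `mu_real`
    # calibrated on real images by more than `tol` bits per pixel.
    return abs(surprise_score(image, real_image_model) - mu_real) > tol
```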
arXiv Detail & Related papers (2024-09-24T08:46:13Z)
- A Sanity Check for AI-generated Image Detection [49.08585395873425]
We propose AIDE (AI-generated Image DEtector with Hybrid Features) to detect AI-generated images.
AIDE achieves +3.5% and +4.6% improvements over state-of-the-art methods.
arXiv Detail & Related papers (2024-06-27T17:59:49Z)
- RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection [60.960988614701414]
RIGID is a training-free and model-agnostic method for robust AI-generated image detection.
RIGID significantly outperforms existing training-based and training-free detectors.
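The underlying test can be sketched in a few lines: embed the image and a slightly noised copy with a frozen feature extractor and compare the two embeddings. The noise level and the reported tendency (generated images being more sensitive to perturbation) are stated here as assumptions about the method, and `embed` stands for any frozen backbone.

```python
# Training-free perturbation-sensitivity check in the spirit of RIGID.
# `embed` is any frozen feature extractor returning a 1-D vector; the
# noise level sigma is an illustrative choice.
import numpy as np

def perturbation_similarity(image: np.ndarray, embed, sigma: float = 0.02) -> float:
    """Cosine similarity between embeddings of `image` (values in [0, 1])
    and a Gaussian-noised copy."""
    noisy = np.clip(image + np.random.normal(0.0, sigma, image.shape), 0.0, 1.0)
    a, b = embed(image), embed(noisy)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# A low similarity (embedding changed a lot under tiny noise) would flag
# the image as likely AI-generated under this reading of the method.
```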
arXiv Detail & Related papers (2024-05-30T14:49:54Z)
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
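A rough sketch of the inter-class translation step using an off-the-shelf img2img diffusion pipeline follows; the checkpoint, prompt template, and strength value are illustrative assumptions, not Diff-Mix's actual implementation.

```python
# Sketch of inter-class augmentation: translate a training image toward a
# target class with a generic img2img diffusion pipeline. Checkpoint,
# prompt template, and strength are illustrative assumptions.
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5")

def inter_class_translate(image, target_class: str, strength: float = 0.6):
    """Edit `image` (a PIL.Image) toward `target_class` while keeping its
    overall layout; the result can serve as a mixed training sample."""
    prompt = f"a photo of a {target_class}"
    return pipe(prompt=prompt, image=image, strength=strength).images[0]
```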
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- Active Generation for Image Classification [45.93535669217115]
We propose to address the efficiency of image generation by focusing on the specific needs and characteristics of the model.
Following a central tenet of active learning, our method, ActGen, takes a training-aware approach to image generation.
arXiv Detail & Related papers (2024-03-11T08:45:31Z)
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that yields highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
- Detecting Generated Images by Real Images Only [64.12501227493765]
Existing generated image detection methods detect visual artifacts in generated images or learn discriminative features from both real and generated images by massive training.
This paper approaches the generated image detection problem from a new perspective: Start from real images.
The method finds the commonality of real images and maps them to a dense subspace in feature space, so that generated images, regardless of their generative model, are projected outside that subspace.
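One plausible reading of that idea, sketched with PCA as a stand-in for the learned mapping: fit a low-dimensional subspace to features of real images only, then score test images by their reconstruction error, with large errors suggesting generated content.

```python
# PCA stand-in for the "dense subspace of real images" idea; the paper's
# mapping is learned, so this is only an illustrative approximation.
import numpy as np

def fit_real_subspace(real_feats: np.ndarray, k: int = 64):
    """Fit a k-dimensional subspace to (n, d) features of real images."""
    mu = real_feats.mean(axis=0)
    _, _, vt = np.linalg.svd(real_feats - mu, full_matrices=False)
    return mu, vt[:k]  # mean and top-k principal directions

def outside_subspace_score(feat: np.ndarray, mu: np.ndarray,
                           basis: np.ndarray) -> float:
    """Distance of a feature vector from the real-image subspace;
    large values suggest a generated image."""
    centered = feat - mu
    recon = basis.T @ (basis @ centered)
    return float(np.linalg.norm(centered - recon))
```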
arXiv Detail & Related papers (2023-11-02T03:09:37Z)
- On quantifying and improving realism of images generated with diffusion [50.37578424163951]
We propose a metric, called Image Realism Score (IRS), computed from five statistical measures of a given image.
IRS is easily usable as a measure to classify a given image as real or fake.
We experimentally establish the model- and data-agnostic nature of the proposed IRS by successfully detecting fake images generated by Stable Diffusion Model (SDM), Dalle2, Midjourney and BigGAN.
Our efforts have also led to the Gen-100 dataset, which provides 1,000 samples for 100 classes generated by four high-quality models.
arXiv Detail & Related papers (2023-09-26T08:32:55Z)
- Randomize to Generalize: Domain Randomization for Runway FOD Detection [1.4249472316161877]
Tiny Object Detection is challenging due to small size, low resolution, occlusion, background clutter, lighting conditions and small object-to-image ratio.
We propose a novel two-stage methodology, Synthetic Image Augmentation (SRIA), to enhance the generalization capabilities of models encountering 2D datasets.
We report that detection accuracy improved from an initial 41% to 92% on an OOD test set.
arXiv Detail & Related papers (2023-09-23T05:02:31Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Level Up the Deepfake Detection: a Method to Effectively Discriminate Images Generated by GAN Architectures and Diffusion Models [0.0]
The deepfake detection and recognition task was investigated by collecting a dedicated dataset of pristine images and fake ones.
A hierarchical multi-level approach was introduced to solve three different deepfake detection and recognition tasks.
Experimental results demonstrated, in each case, more than 97% classification accuracy, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2023-03-01T16:01:46Z)
- A Shared Representation for Photorealistic Driving Simulators [83.5985178314263]
We propose to improve the quality of generated images by rethinking the discriminator architecture.
The focus is on the class of problems where images are generated given semantic inputs, such as scene segmentation maps or human body poses.
We aim to learn a shared latent representation that encodes enough information to jointly perform semantic segmentation and content reconstruction, along with coarse-to-fine-grained adversarial reasoning.
arXiv Detail & Related papers (2021-12-09T18:59:21Z)
- Holistic Image Manipulation Detection using Pixel Co-occurrence Matrices [16.224649756613655]
Digital image forensics aims to detect images that have been digitally manipulated.
Most detection methods in the literature focus on detecting a particular type of manipulation.
We propose a novel approach to holistically detect tampered images using a combination of pixel co-occurrence matrices and deep learning.
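The co-occurrence feature itself is straightforward to sketch: the snippet below builds one normalized horizontal co-occurrence matrix per RGB channel, the kind of tensor that would then feed a CNN classifier. The single horizontal offset and the normalization are conventional choices assumed here, not necessarily the paper's exact settings.

```python
# Pixel co-occurrence features for manipulation detection; the CNN
# classifier on top is omitted. Offset and normalization are assumptions.
import numpy as np

def cooccurrence_matrix(channel: np.ndarray, levels: int = 256) -> np.ndarray:
    """Count horizontally adjacent value pairs in one uint8 channel."""
    left = channel[:, :-1].ravel().astype(np.intp)
    right = channel[:, 1:].ravel().astype(np.intp)
    mat = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(mat, (left, right), 1.0)     # accumulate pair counts
    return mat / mat.sum()                 # normalize to a joint distribution

def cooccurrence_features(rgb: np.ndarray) -> np.ndarray:
    """Stack one matrix per channel: shape (3, levels, levels)."""
    return np.stack([cooccurrence_matrix(rgb[..., c]) for c in range(3)])
```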
arXiv Detail & Related papers (2021-04-12T17:54:42Z)
- Unifying Remote Sensing Image Retrieval and Classification with Robust Fine-tuning [3.6526118822907594]
We aim to unify remote sensing image retrieval and classification with a new large-scale training and testing dataset, SF300.
We show that our framework systematically achieves a boost of retrieval and classification performance on nine different datasets compared to an ImageNet pretrained baseline.
arXiv Detail & Related papers (2021-02-26T11:01:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.