Scalable Evaluation of the Realism of Synthetic Environmental Augmentations in Images
- URL: http://arxiv.org/abs/2603.04325v1
- Date: Wed, 04 Mar 2026 17:46:08 GMT
- Title: Scalable Evaluation of the Realism of Synthetic Environmental Augmentations in Images
- Authors: Damian J. Ruck, Paul Vautravers, Oliver Chalkley, Jake Thomas,
- Abstract summary: We present a framework for assessing the realism of synthetic image-editing methods. Using 40 clear-day images, we compare rule-based augmentation libraries with generative AI image-editing models. Generative AI methods substantially outperform rule-based approaches, with the best generative method achieving approximately 3.6 times the acceptance rate of the best rule-based method.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Evaluation of AI systems often requires synthetic test cases, particularly for rare or safety-critical conditions that are difficult to observe in operational data. Generative AI offers a promising approach for producing such data through controllable image editing, but its usefulness depends on whether the resulting images are sufficiently realistic to support meaningful evaluation. We present a scalable framework for assessing the realism of synthetic image-editing methods and apply it to the task of adding environmental conditions (fog, rain, snow, and nighttime) to car-mounted camera images. Using 40 clear-day images, we compare rule-based augmentation libraries with generative AI image-editing models. Realism is evaluated using two complementary automated metrics: a vision-language model (VLM) jury for perceptual realism assessment, and embedding-based distributional analysis to measure similarity to genuine adverse-condition imagery. Generative AI methods substantially outperform rule-based approaches, with the best generative method achieving approximately 3.6 times the acceptance rate of the best rule-based method. Performance varies across conditions: fog proves easiest to simulate, while nighttime transformations remain challenging. Notably, the VLM jury assigns imperfect acceptance even to real adverse-condition imagery, establishing practical ceilings against which synthetic methods can be judged. By this standard, leading generative methods match or exceed real-image performance for most conditions. These results suggest that modern generative image-editing models can enable scalable generation of realistic adverse-condition imagery for evaluation pipelines. Our framework therefore provides a practical approach for scalable realism evaluation, though validation against human studies remains an important direction for future work.
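The acceptance-rate comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how jury votes are aggregated, so the strict-majority rule, the function names, and the vote data below are all hypothetical.

```python
from typing import List


def jury_accepts(votes: List[bool]) -> bool:
    """Return True if a strict majority of jurors judged the image realistic.

    Assumption: strict-majority voting. The abstract does not state the
    aggregation rule the paper actually uses.
    """
    return sum(votes) * 2 > len(votes)


def acceptance_rate(jury_votes: List[List[bool]]) -> float:
    """Fraction of images accepted by the jury (one vote list per image)."""
    if not jury_votes:
        raise ValueError("no images to score")
    accepted = sum(jury_accepts(votes) for votes in jury_votes)
    return accepted / len(jury_votes)


def relative_to_ceiling(method_rate: float, real_rate: float) -> float:
    """Express a method's acceptance rate as a fraction of the rate
    obtained by real adverse-condition images (the practical ceiling)."""
    if real_rate <= 0:
        raise ValueError("real-image acceptance rate must be positive")
    return method_rate / real_rate


# Made-up votes from a hypothetical three-member jury, four images each.
synthetic_votes = [
    [True, True, False],
    [True, False, False],
    [True, True, True],
    [False, False, False],
]
real_votes = [
    [True, True, True],
    [True, True, False],
    [True, False, False],
    [True, True, True],
]

synthetic_rate = acceptance_rate(synthetic_votes)
real_rate = acceptance_rate(real_votes)
print(synthetic_rate, real_rate, relative_to_ceiling(synthetic_rate, real_rate))
```

Dividing by the real-image rate reflects the abstract's observation that even genuine imagery is not always accepted, so a synthetic method is better judged against that ceiling than against 100%.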
Related papers
- Beyond Artifacts: Real-Centric Envelope Modeling for Reliable AI-Generated Image Detection [29.26836532055224]
Existing detectors often overfit to generator-specific artifacts and remain sensitive to real-world degradations. We propose Real-centric Envelope Modeling (REM), a new paradigm that shifts detection from learning generator artifacts to modeling the robust distribution of real images. REM achieves an average improvement of 7.5% over state-of-the-art methods, and notably maintains exceptional generalization on the severely degraded RealChain benchmark.
arXiv Detail & Related papers (2025-12-24T04:41:04Z)
- Q-REAL: Towards Realism and Plausibility Evaluation for AI-Generated Content [71.46991494014382]
We introduce Q-Real, a novel dataset for fine-grained evaluation of realism and plausibility in AI-generated images. Q-Real consists of 3,088 images generated by popular text-to-image models. We construct Q-Real Bench to evaluate them on two tasks: judgment and grounding with reasoning.
arXiv Detail & Related papers (2025-11-21T02:43:17Z)
- ThinkFake: Reasoning in Multimodal Large Language Models for AI-Generated Image Detection [51.93101033997245]
Increasing realism of AI-generated images has raised serious concerns about misinformation and privacy violations. We propose ThinkFake, a novel reasoning-based and generalizable framework for AI-generated image detection. We show that ThinkFake outperforms state-of-the-art methods on the GenImage benchmark and demonstrates strong zero-shot generalization on the challenging LOKI benchmark.
arXiv Detail & Related papers (2025-09-24T07:34:09Z)
- Image Realness Assessment and Localization with Multimodal Features [3.1415249818332813]
A reliable method of quantifying the perceptual realness of AI-generated images is crucial for practical use and for improving photorealism of generative AI. This paper introduces a framework that accomplishes both objective realness assessment and local inconsistency identification of AI-generated images.
arXiv Detail & Related papers (2025-09-16T17:42:51Z)
- Edge-Enhanced Vision Transformer Framework for Accurate AI-Generated Image Detection [0.0]
We propose a hybrid detection framework that combines a fine-tuned Vision Transformer (ViT) with a novel edge-based image processing module. The proposed method is highly suitable for real-world applications in automated content verification and digital forensics.
arXiv Detail & Related papers (2025-08-25T10:30:56Z)
- Bridging Clear and Adverse Driving Conditions [0.0]
Domain Adaptation pipeline transforms clear-weather images into fog, rain, snow, and nighttime images. We leverage an existing DA GAN, extend it to support auxiliary inputs, and develop a novel training recipe that leverages both simulated and real images.
arXiv Detail & Related papers (2025-08-19T07:58:05Z)
- Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing pose serious risks for generative models. In this paper, we investigate how detection performance varies across model backbones, types, and datasets. We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z)
- X-Fake: Juggling Utility Evaluation and Explanation of Simulated SAR Images [49.546627070454456]
The distribution inconsistency between real and simulated data is the main obstacle that influences the utility of simulated SAR images.
We propose a novel trustworthy utility evaluation framework with a counterfactual explanation for simulated SAR images for the first time, denoted as X-Fake.
The proposed framework is validated on four simulated SAR image datasets obtained from electromagnetic models and generative artificial intelligence approaches.
arXiv Detail & Related papers (2024-07-28T09:27:53Z)
- RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection [60.960988614701414]
RIGID is a training-free and model-agnostic method for robust AI-generated image detection.
RIGID significantly outperforms existing training-based and training-free detectors.
arXiv Detail & Related papers (2024-05-30T14:49:54Z)
- A comparison of different atmospheric turbulence simulation methods for image restoration [64.24948495708337]
Atmospheric turbulence deteriorates the quality of images captured by long-range imaging systems.
Various deep learning-based atmospheric turbulence mitigation methods have been proposed in the literature.
We systematically evaluate the effectiveness of various turbulence simulation methods on image restoration.
arXiv Detail & Related papers (2022-04-19T16:21:36Z)
- Uncertainty-aware Generalized Adaptive CycleGAN [44.34422859532988]
Unpaired image-to-image translation refers to learning inter-image-domain mapping in an unsupervised manner.
Existing methods often learn deterministic mappings without explicitly modelling the robustness to outliers or predictive uncertainty.
We propose a novel probabilistic method called Uncertainty-aware Generalized Adaptive Cycle Consistency (UGAC).
arXiv Detail & Related papers (2021-02-23T15:22:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.