See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis
- URL: http://arxiv.org/abs/2602.20951v1
- Date: Tue, 24 Feb 2026 14:34:13 GMT
- Title: See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis
- Authors: Jaehyun Park, Minyoung Ahn, Minkyu Kim, Jonghyun Lee, Jae-Gil Lee, Dongmin Park,
- Abstract summary: ArtiAgent efficiently creates pairs of real and artifact-injected images. It comprises three agents: a perception agent that recognizes entities and subentities from real images, a synthesis agent that introduces artifacts via artifact injection tools, and a curation agent that filters the synthesized artifacts.
- Score: 17.896266572037348
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Despite recent advances in diffusion models, AI-generated images still often contain visual artifacts that compromise realism. Although more thorough pre-training and larger models might reduce artifacts, there is no assurance that they can be completely eliminated, making artifact mitigation a crucial area of study. Previous artifact-aware methodologies depend on human-labeled artifact datasets, which are costly and difficult to scale, underscoring the need for an automated approach to reliably acquire artifact-annotated datasets. In this paper, we propose ArtiAgent, which efficiently creates pairs of real and artifact-injected images. It comprises three agents: a perception agent that recognizes and grounds entities and subentities from real images, a synthesis agent that introduces artifacts via artifact-injection tools through novel patch-wise embedding manipulation within a diffusion transformer, and a curation agent that filters the synthesized artifacts and generates both local and global explanations for each instance. Using ArtiAgent, we synthesize 100K images with rich artifact annotations and demonstrate both efficacy and versatility across diverse applications. Code is available at link.
Related papers
- TranX-Adapter: Bridging Artifacts and Semantics within MLLMs for Robust AI-generated Image Detection [70.42796551833946]
Incorporating texture-level artifact features alongside semantic features into multimodal large language models (MLLMs) can enhance their AIGI detection capability. We propose a lightweight fusion adapter, TranX-Adapter, which integrates a Task-aware Optimal-Transport Fusion. Experiments on standard AIGI detection benchmarks with several advanced MLLMs show that TranX-Adapter brings consistent and significant improvements.
arXiv Detail & Related papers (2026-02-25T09:22:46Z) - Improving Artifact Robustness for CT Deep Learning Models Without Labeled Artifact Images via Domain Adaptation [2.7001982817730616]
This study evaluates domain adaptation as an approach for training models that maintain classification performance despite new artifacts. We simulate ring artifacts from detector gain error in sinogram space and evaluate domain adversarial neural networks (DANN) against baseline and augmentation-based approaches on the OrganAMNIST abdominal CT dataset. Our results demonstrate that baseline models trained only on clean images fail to generalize to images with ring artifacts, and traditional augmentation with other distortion types provides no improvement on unseen artifact domains.
arXiv Detail & Related papers (2025-10-08T02:27:09Z) - Synthesizing Artifact Dataset for Pixel-level Detection [16.31703475992344]
Artifact detectors enhance the performance of image-generative models by serving as reward models during fine-tuning. We propose an artifact corruption pipeline that automatically injects artifacts into clean, high-quality synthetic images on a predetermined region. The proposed method achieves performance improvements of 13.2% for ConvNeXt and 3.7% for Swin-T, as verified on human-labeled data.
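The entry above describes corrupting a predetermined region of a clean image to produce pixel-level supervision. A minimal sketch of that idea follows; the rectangle parameters and the value-inversion distortion are stand-ins for illustration, since the paper's actual injection operators are not specified here:

```python
# Corrupt a fixed rectangle of a grayscale image (nested lists) and emit
# the binary mask a pixel-level artifact detector would train against.
# Value inversion is a hypothetical stand-in for a real injection operator.
def corrupt_region(image, top, left, height, width, max_val=255):
    h, w = len(image), len(image[0])
    corrupted = [row[:] for row in image]      # copy; keep the clean image intact
    mask = [[0] * w for _ in range(h)]         # 1 marks an artifact pixel
    for y in range(top, min(top + height, h)):
        for x in range(left, min(left + width, w)):
            corrupted[y][x] = max_val - corrupted[y][x]  # stand-in distortion
            mask[y][x] = 1
    return corrupted, mask
```

Pairing each corrupted image with its mask is what makes the resulting dataset usable for pixel-level (segmentation-style) detection rather than image-level classification.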
arXiv Detail & Related papers (2025-09-23T21:28:33Z) - LEGION: Learning to Ground and Explain for Synthetic Image Detection [49.958951540410816]
We introduce SynthScars, a high-quality and diverse dataset consisting of 12,236 fully synthetic images with human-expert annotations. It features 4 distinct image content types, 3 categories of artifacts, and fine-grained annotations covering pixel-level segmentation, detailed textual explanations, and artifact category labels. We propose LEGION, a multimodal large language model (MLLM)-based image forgery analysis framework that integrates artifact detection, segmentation, and explanation.
arXiv Detail & Related papers (2025-03-19T14:37:21Z) - Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation [44.246370685732444]
We introduce FakeVLM, a specialized large multimodal model for both general synthetic image and DeepFake detection tasks. FakeVLM excels in distinguishing real from fake images and provides clear, natural language explanations for image artifacts. We also present FakeClue, a comprehensive dataset containing over 100,000 images across seven categories, annotated with fine-grained artifact clues in natural language.
arXiv Detail & Related papers (2025-03-19T05:14:44Z) - DiffDoctor: Diagnosing Image Diffusion Models Before Treating [57.82359018425674]
We propose DiffDoctor, a two-stage pipeline to assist image diffusion models in generating fewer artifacts. We collect a dataset of over 1M flawed synthesized images and set up an efficient human-in-the-loop annotation process. The learned artifact detector is then involved in the second stage to optimize the diffusion model by providing pixel-level feedback.
arXiv Detail & Related papers (2025-01-21T18:56:41Z) - Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image detectors [62.63467652611788]
We introduce SEMI-TRUTHS, featuring 27,600 real images, 223,400 masks, and 1,472,700 AI-augmented images.
Each augmented image is accompanied by metadata for standardized and targeted evaluation of detector robustness.
Our findings suggest that state-of-the-art detectors exhibit varying sensitivities to the types and degrees of perturbations, data distributions, and augmentation methods used.
arXiv Detail & Related papers (2024-11-12T01:17:27Z) - SynArtifact: Classifying and Alleviating Artifacts in Synthetic Images via Vision-Language Model [15.616316848126642]
We develop a comprehensive artifact taxonomy and construct a dataset of synthetic images with artifact annotations for fine-tuning a Vision-Language Model (VLM).
The fine-tuned VLM exhibits a superior ability to identify artifacts and outperforms the baseline by 25.66%.
arXiv Detail & Related papers (2024-02-28T05:54:02Z) - Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection [86.97062579515833]
We introduce the concept of Neighboring Pixel Relationships(NPR) as a means to capture and characterize the generalized structural artifacts stemming from up-sampling operations.
A comprehensive analysis is conducted on an open-world dataset, comprising samples generated by 28 distinct generative models.
This analysis culminates in the establishment of a novel state-of-the-art performance, showcasing a remarkable 11.6% improvement over existing methods.
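The NPR idea above is computing residues between neighboring pixels, which up-sampling leaves with a characteristic structure. A hedged sketch under one plausible reading of the method (2x2 patches, top-left pixel as reference) is shown below; this is an illustration of the residue computation, not a verified re-implementation of the paper:

```python
# Neighboring Pixel Relationships (sketch): within each non-overlapping
# patch, subtract a reference pixel (here the top-left one) from every
# pixel in the patch. Nearest-neighbor up-sampled content yields
# near-zero residues inside patches, exposing the up-sampling trace.
def npr(image, patch=2):
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for py in range(0, h - h % patch, patch):
        for px in range(0, w - w % patch, patch):
            ref = image[py][px]                    # patch reference pixel
            for y in range(py, py + patch):
                for x in range(px, px + patch):
                    out[y][x] = image[y][x] - ref  # neighboring-pixel residue
    return out
```

On a 2x nearest-neighbor up-sampled image every 2x2 patch is constant, so the residue map is all zeros, whereas natural textures leave nonzero residues; a detector trained on these maps targets that structural difference rather than semantic content.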
arXiv Detail & Related papers (2023-12-16T14:27:06Z) - Perceptual Artifacts Localization for Image Synthesis Tasks [59.638307505334076]
We introduce a novel dataset comprising 10,168 generated images, each annotated with per-pixel perceptual artifact labels.
A segmentation model, trained on our proposed dataset, effectively localizes artifacts across a range of tasks.
We propose an innovative zoom-in inpainting pipeline that seamlessly rectifies perceptual artifacts in the generated images.
arXiv Detail & Related papers (2023-10-09T10:22:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.