OmniAID: Decoupling Semantic and Artifacts for Universal AI-Generated Image Detection in the Wild
- URL: http://arxiv.org/abs/2511.08423v1
- Date: Wed, 12 Nov 2025 01:58:18 GMT
- Title: OmniAID: Decoupling Semantic and Artifacts for Universal AI-Generated Image Detection in the Wild
- Authors: Yuncheng Guo, Junyan Ye, Chenjue Zhang, Hengrui Kang, Haohuan Fu, Conghui He, Weijia Li,
- Abstract summary: A universal AI-generated image detector must simultaneously generalize across diverse generative models and varied semantic content.<n>We propose OmniAID, a novel framework centered on a decoupled Mixture-of-Experts architecture.<n>Our model surpasses existing monolithic detectors, establishing a new, robust standard for AIGI authentication against modern, in-the-wild threats.
- Score: 40.585896860010656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A truly universal AI-Generated Image (AIGI) detector must simultaneously generalize across diverse generative models and varied semantic content. Current state-of-the-art methods learn a single, entangled forgery representation--conflating content-dependent flaws with content-agnostic artifacts--and are further constrained by outdated benchmarks. To overcome these limitations, we propose OmniAID, a novel framework centered on a decoupled Mixture-of-Experts (MoE) architecture. The core of our method is a hybrid expert system engineered to decouple: (1) semantic flaws across distinct content domains, and (2) these content-dependent flaws from content-agnostic universal artifacts. This system employs a set of Routable Specialized Semantic Experts, each for a distinct domain (e.g., human, animal), complemented by a Fixed Universal Artifact Expert. This architecture is trained using a bespoke two-stage strategy: we first train the experts independently with domain-specific hard-sampling to ensure specialization, and subsequently train a lightweight gating network for effective input routing. By explicitly decoupling "what is generated" (content-specific flaws) from "how it is generated" (universal artifacts), OmniAID achieves robust generalization. To address outdated benchmarks and validate real-world applicability, we introduce Mirage, a new large-scale, contemporary dataset. Extensive experiments, using both traditional benchmarks and our Mirage dataset, demonstrate our model surpasses existing monolithic detectors, establishing a new, robust standard for AIGI authentication against modern, in-the-wild threats.
Related papers
- Simplicity Prevails: The Emergence of Generalizable AIGI Detection in Visual Foundation Models [15.709482146201283]
A simple linear classifier trained on the frozen features of modern Vision Foundation Models establishes a new state-of-the-art.<n>We show that this baseline matches specialized detectors on standard benchmarks but also decisively outperforms them on in-the-wild datasets.<n>We conclude by advocating for a paradigm shift in AI forensics, moving from overfitting on static benchmarks to harnessing the evolving world knowledge of foundation models for real-world reliability.
arXiv Detail & Related papers (2026-02-02T07:20:02Z) - One Language-Free Foundation Model Is Enough for Universal Vision Anomaly Detection [65.11602552904456]
Universal visual anomaly detection (AD) aims to identify anomaly images and segment anomaly regions towards open and dynamic scenarios.<n>Current methods often struggle with complex prompt engineering, elaborate adaptation modules, and challenging training strategies.<n>This paper presents an embarrassingly simple, general, and effective framework for Universal vision Anomaly Detection (UniADet)
arXiv Detail & Related papers (2026-01-09T06:05:18Z) - AdaptPrompt: Parameter-Efficient Adaptation of VLMs for Generalizable Deepfake Detection [7.76090543025328]
Recent advances in image generation have led to the widespread availability of highly realistic synthetic media, increasing the difficulty of reliable deepfake detection.<n>A key challenge is generalization, as detectors trained on a narrow class of generators often fail when confronted with unseen models.<n>We address the pressing need for generalizable detection by leveraging large vision-language models, specifically CLIP, to identify synthetic content across diverse generative techniques.
arXiv Detail & Related papers (2025-12-19T16:06:03Z) - Self-Supervised AI-Generated Image Detection: A Camera Metadata Perspective [80.10217707456046]
We introduce a self-supervised approach for detecting AI-generated images that leverages camera metadata.<n>We train a feature extractor solely on camera-captured photographs by classifying categorical EXIF tags.<n>Our detectors deliver strong generalization to in-the-wild samples and robustness to common benign image perturbations.
arXiv Detail & Related papers (2025-12-05T11:53:18Z) - Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline [56.790045049514326]
Two major forms of deception dominate: human-crafted misinformation and AI-generated content.<n>We propose Unified Multimodal Fake Content Detection (UMFDet), a framework designed to handle both forms of deception.<n>UMFDet achieves robust and consistent performance across both misinformation types, outperforming specialized baselines.
arXiv Detail & Related papers (2025-09-30T09:26:32Z) - ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation [51.297873393639456]
ArtifactsBench is a framework for automated visual code generation evaluation.<n>Our framework renders each generated artifact and captures its dynamic behavior through temporal screenshots.<n>We construct a new benchmark of 1,825 diverse tasks and evaluate over 30 leading Large Language Models.
arXiv Detail & Related papers (2025-07-07T12:53:00Z) - MemoryOut: Learning Principal Features via Multimodal Sparse Filtering Network for Semi-supervised Video Anomaly Detection [30.470777079947958]
Video Anomaly Detection (VAD) methods based on reconstruction or prediction face two critical challenges.<n>Strong generalization capability often results in accurate reconstruction or prediction of abnormal events.<n>reliance only on low-level appearance and motion cues limits their ability to identify high-level semantic in abnormal events from complex scenes.
arXiv Detail & Related papers (2025-06-03T07:14:57Z) - HFMF: Hierarchical Fusion Meets Multi-Stream Models for Deepfake Detection [4.908389661988192]
HFMF is a comprehensive two-stage deepfake detection framework.<n>It integrates vision Transformers and convolutional nets through a hierarchical feature fusion mechanism.<n>We demonstrate that our architecture achieves superior performance across diverse dataset benchmarks.
arXiv Detail & Related papers (2025-01-10T00:20:29Z) - Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection [58.87142367781417]
A naively trained detector tends to favor overfitting to the limited and monotonous fake patterns, causing the feature space to become highly constrained and low-ranked.<n>One potential remedy is incorporating the pre-trained knowledge within the vision foundation models to expand the feature space.<n>By freezing the principal components and adapting only the remained components, we preserve the pre-trained knowledge while learning fake patterns.
arXiv Detail & Related papers (2024-11-23T19:10:32Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.