Related papers: Can We Build a Monolithic Model for Fake Image Detection? SICA: Semantic-Induced Constrained Adaptation for Unified-Yet-Discriminative Artifact Feature Space Reconstruction

Can We Build a Monolithic Model for Fake Image Detection? SICA: Semantic-Induced Constrained Adaptation for Unified-Yet-Discriminative Artifact Feature Space Reconstruction

URL: http://arxiv.org/abs/2602.06676v1
Date: Fri, 06 Feb 2026 13:03:26 GMT
Title: Can We Build a Monolithic Model for Fake Image Detection? SICA: Semantic-Induced Constrained Adaptation for Unified-Yet-Discriminative Artifact Feature Space Reconstruction
Authors: Bo Du, Xiaochen Ma, Xuekang Zhu, Zhe Yang, Chaogun Niu, Jian Liu, Ji-Zhe Zhou,
Abstract summary: monolithic Fake Image Detection (FID) models consistently yield inferior performance in practice.<n>We propose Semantic-Induced Constrained Adaptation (SICA), the first monolithic FID paradigm.<n>SICA outperforms 15 state-of-the-art methods and reconstructs the target unified-yet-discriminative artifact feature space.
Score: 35.062003602486925
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Fake Image Detection (FID), aiming at unified detection across four image forensic subdomains, is critical in real-world forensic scenarios. Compared with ensemble approaches, monolithic FID models are theoretically more promising, but to date, consistently yield inferior performance in practice. In this work, by discovering the ``heterogeneous phenomenon'', which is the intrinsic distinctness of artifacts across subdomains, we diagnose the cause of this underperformance for the first time: the collapse of the artifact feature space driven by such phenomenon. The core challenge for developing a practical monolithic FID model thus boils down to the ``unified-yet-discriminative" reconstruction of the artifact feature space. To address this paradoxical challenge, we hypothesize that high-level semantics can serve as a structural prior for the reconstruction, and further propose Semantic-Induced Constrained Adaptation (SICA), the first monolithic FID paradigm. Extensive experiments on our OpenMMSec dataset demonstrate that SICA outperforms 15 state-of-the-art methods and reconstructs the target unified-yet-discriminative artifact feature space in a near-orthogonal manner, thus firmly validating our hypothesis. The code and dataset are available at:https: //github.com/scu-zjz/SICA_OpenMMSec.

Related papers

Simplicity Prevails: The Emergence of Generalizable AIGI Detection in Visual Foundation Models [15.709482146201283]
A simple linear classifier trained on the frozen features of modern Vision Foundation Models establishes a new state-of-the-art.<n>We show that this baseline matches specialized detectors on standard benchmarks but also decisively outperforms them on in-the-wild datasets.<n>We conclude by advocating for a paradigm shift in AI forensics, moving from overfitting on static benchmarks to harnessing the evolving world knowledge of foundation models for real-world reliability.
arXiv Detail & Related papers (2026-02-02T07:20:02Z)
DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation [47.409626500688866]
We present the DINO Spherical Autoencoder (DINO-SAE), a framework that bridges semantic representation and pixel-level reconstruction.<n>Our approach achieves state-of-the-art reconstruction quality, reaching 0.37 rFID and 26.2 dB PSNR, while maintaining strong semantic alignment to the pretrained VFM.
arXiv Detail & Related papers (2026-01-30T12:25:34Z)
DNA: Uncovering Universal Latent Forgery Knowledge [39.19379714306656]
forgery detection capability is already encoded within pre-trained models.<n>DNA framework employs a coarse-to-fine excavation mechanism.<n>Hifi-Gen is a high-fidelity synthetic benchmark built upon the very latest models.
arXiv Detail & Related papers (2026-01-30T03:48:30Z)
Reasoning-Driven Amodal Completion: Collaborative Agents and Perceptual Evaluation [17.405818788700234]
We present a Collaborative Multi-Agent Reasoning Framework that explicitly decouples Semantic Planning from Visual Synthesis.<n>Our method generates a structured, explicit plan before pixel generation, enabling visually and semantically coherent single-pass synthesis.<n>Addressing the limitations of traditional metrics in assessing inferred invisible content, we introduce the MAC-Score, a novel human-aligned evaluation metric.
arXiv Detail & Related papers (2025-12-24T04:39:45Z)
Knowledge-Informed Neural Network for Complex-Valued SAR Image Recognition [51.03674130115878]
We introduce the Knowledge-Informed Neural Network (KINN), a lightweight framework built upon a novel "compression-aggregation-compression" architecture.<n>KINN establishes a state-of-the-art in parameter-efficient recognition, offering exceptional generalization in data-scarce and out-of-distribution scenarios.
arXiv Detail & Related papers (2025-10-23T07:12:26Z)
OmniDFA: A Unified Framework for Open Set Synthesis Image Detection and Few-Shot Attribution [21.62979058692505]
OmniDFA is a novel framework for AIGI that assesses the authenticity of images, and determines the origins in a few-shot manner.<n>We construct OmniFake, a large class-aware synthetic image dataset that curates $1.17$ M images from $45$ distinct generative models.<n>Experiments demonstrate that OmniDFA exhibits excellent capability in open-set attribution and achieves state-of-the-art generalization performance on AIGI detection.
arXiv Detail & Related papers (2025-09-30T02:36:40Z)
Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection [58.87142367781417]
A naively trained detector tends to favor overfitting to the limited and monotonous fake patterns, causing the feature space to become highly constrained and low-ranked.<n>One potential remedy is incorporating the pre-trained knowledge within the vision foundation models to expand the feature space.<n>By freezing the principal components and adapting only the remained components, we preserve the pre-trained knowledge while learning fake patterns.
arXiv Detail & Related papers (2024-11-23T19:10:32Z)
UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization. We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z)
DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Difusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection. It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor. Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z)
Video Anomaly Detection via Spatio-Temporal Pseudo-Anomaly Generation : A Unified Approach [49.995833831087175]
This work proposes a novel method for generating generic Video-temporal PAs by inpainting a masked out region of an image. In addition, we present a simple unified framework to detect real-world anomalies under the OCC setting. Our method performs on par with other existing state-of-the-art PAs generation and reconstruction based methods under the OCC setting.
arXiv Detail & Related papers (2023-11-27T13:14:06Z)
Dynamic Prototype Mask for Occluded Person Re-Identification [88.7782299372656]
Existing methods mainly address this issue by employing body clues provided by an extra network to distinguish the visible part. We propose a novel Dynamic Prototype Mask (DPM) based on two self-evident prior knowledge. Under this condition, the occluded representation could be well aligned in a selected subspace spontaneously.
arXiv Detail & Related papers (2022-07-19T03:31:13Z)
Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model. A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations. We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
arXiv Detail & Related papers (2022-06-08T04:09:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.