All Patches Matter, More Patches Better: Enhance AI-Generated Image Detection via Panoptic Patch Learning
- URL: http://arxiv.org/abs/2504.01396v1
- Date: Wed, 02 Apr 2025 06:32:09 GMT
- Title: All Patches Matter, More Patches Better: Enhance AI-Generated Image Detection via Panoptic Patch Learning
- Authors: Zheng Yang, Ruoxin Chen, Zhiyuan Yan, Ke-Yue Zhang, Xinghe Fu, Shuang Wu, Xiujun Shu, Taiping Yao, Junchi Yan, Shouhong Ding, Xi Li,
- Abstract summary: We establish two key principles for AIGI detection through systematic analysis.<n>textbf(1) All Patches Matter: Unlike conventional image classification where discriminative features concentrate on object-centric regions, each patch in AIGIs inherently contains synthetic artifacts due to the uniform generation process.<n>textbf (2) More Patches Better: Leveraging distributed artifacts across more patches improves detection by capturing complementary forensic evidence.<n>textbfPanoptic textbfPatch textbfLearning (PPL) framework.
- Score: 76.79222779026634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The exponential growth of AI-generated images (AIGIs) underscores the urgent need for robust and generalizable detection methods. In this paper, we establish two key principles for AIGI detection through systematic analysis: \textbf{(1) All Patches Matter:} Unlike conventional image classification where discriminative features concentrate on object-centric regions, each patch in AIGIs inherently contains synthetic artifacts due to the uniform generation process, suggesting that every patch serves as an important artifact source for detection. \textbf{(2) More Patches Better}: Leveraging distributed artifacts across more patches improves detection robustness by capturing complementary forensic evidence and reducing over-reliance on specific patches, thereby enhancing robustness and generalization. However, our counterfactual analysis reveals an undesirable phenomenon: naively trained detectors often exhibit a \textbf{Few-Patch Bias}, discriminating between real and synthetic images based on minority patches. We identify \textbf{Lazy Learner} as the root cause: detectors preferentially learn conspicuous artifacts in limited patches while neglecting broader artifact distributions. To address this bias, we propose the \textbf{P}anoptic \textbf{P}atch \textbf{L}earning (PPL) framework, involving: (1) Random Patch Replacement that randomly substitutes synthetic patches with real counterparts to compel models to identify artifacts in underutilized regions, encouraging the broader use of more patches; (2) Patch-wise Contrastive Learning that enforces consistent discriminative capability across all patches, ensuring uniform utilization of all patches. Extensive experiments across two different settings on several benchmarks verify the effectiveness of our approach.
Related papers
- Crane: Context-Guided Prompt Learning and Attention Refinement for Zero-Shot Anomaly Detections [50.343419243749054]
Anomaly Detection (AD) involves identifying deviations from normal data distributions.
We propose a novel approach that conditions the prompts of the text encoder based on image context extracted from the vision encoder.
Our method achieves state-of-the-art performance, improving performance by 2% to 29% across different metrics on 14 datasets.
arXiv Detail & Related papers (2025-04-15T10:42:25Z) - SoftPatch+: Fully Unsupervised Anomaly Classification and Segmentation [84.07909405887696]
This paper is the first to consider fully unsupervised industrial anomaly detection (i.e., unsupervised AD with noisy data)<n>We propose memory-based unsupervised AD methods, SoftPatch and SoftPatch+, which efficiently denoise the data at the patch level.<n>Compared with existing methods, SoftPatch maintains a strong modeling ability of normal data and alleviates the overconfidence problem in coreset.<n> Comprehensive experiments conducted in diverse noise scenarios demonstrate that both SoftPatch and SoftPatch+ outperform the state-of-the-art AD methods on the MVTecAD, ViSA, and BTAD benchmarks.
arXiv Detail & Related papers (2024-12-30T11:16:49Z) - Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection [58.87142367781417]
A naively trained detector tends to favor overfitting to the limited and monotonous fake patterns, causing the feature space to become highly constrained and low-ranked.<n>One potential remedy is incorporating the pre-trained knowledge within the vision foundation models to expand the feature space.<n>By freezing the principal components and adapting only the remained components, we preserve the pre-trained knowledge while learning forgery-related patterns.
arXiv Detail & Related papers (2024-11-23T19:10:32Z) - Semi-supervised 3D Object Detection with PatchTeacher and PillarMix [71.4908268136439]
Current semi-supervised 3D object detection methods typically use a teacher to generate pseudo labels for a student.
We propose PatchTeacher, which focuses on partial scene 3D object detection to provide high-quality pseudo labels for the student.
We introduce three key techniques, i.e., Patch Normalizer, Quadrant Align, and Fovea Selection, to improve the performance of PatchTeacher.
arXiv Detail & Related papers (2024-07-13T06:58:49Z) - PAD: Patch-Agnostic Defense against Adversarial Patch Attacks [36.865204327754626]
Adversarial patch attacks present a significant threat to real-world object detectors.
We show two inherent characteristics of adversarial patches, semantic independence and spatial heterogeneity.
We propose PAD, a novel adversarial patch localization and removal method that does not require prior knowledge or additional training.
arXiv Detail & Related papers (2024-04-25T09:32:34Z) - UIA-ViT: Unsupervised Inconsistency-Aware Method based on Vision
Transformer for Face Forgery Detection [52.91782218300844]
We propose a novel Unsupervised Inconsistency-Aware method based on Vision Transformer, called UIA-ViT.
Due to the self-attention mechanism, the attention map among patch embeddings naturally represents the consistency relation, making the vision Transformer suitable for the consistency representation learning.
arXiv Detail & Related papers (2022-10-23T15:24:47Z) - Feasibility of Inconspicuous GAN-generated Adversarial Patches against
Object Detection [3.395452700023097]
In this work, we have evaluated the existing approaches to generate inconspicuous patches.
We have evaluated two approaches to generate naturalistic patches: by incorporating patch generation into the GAN training process and by using the pretrained GAN.
Our experiments have shown, that using a pre-trained GAN helps to gain realistic-looking patches while preserving the performance similar to conventional adversarial patches.
arXiv Detail & Related papers (2022-07-15T08:48:40Z) - PatchNet: A Simple Face Anti-Spoofing Framework via Fine-Grained Patch
Recognition [13.840830140721462]
Face anti-spoofing (FAS) plays a critical role in securing face recognition systems from presentation attacks.
We propose PatchNet which reformulates face anti-spoofing as a fine-grained patch-type recognition problem.
arXiv Detail & Related papers (2022-03-27T15:16:17Z) - PatchCensor: Patch Robustness Certification for Transformers via
Exhaustive Testing [7.88628640954152]
Vision Transformer (ViT) is known to be highly nonlinear like other classical neural networks and could be easily fooled by both natural and adversarial patch perturbations.
This limitation could pose a threat to the deployment of ViT in the real industrial environment, especially in safety-critical scenarios.
We propose PatchCensor, aiming to certify the patch robustness of ViT by applying exhaustive testing.
arXiv Detail & Related papers (2021-11-19T23:45:23Z) - Generating Adversarial yet Inconspicuous Patches with a Single Image [15.217367754000913]
We propose an approach to gen-erate adversarial yet inconspicuous patches with onesingle image.
In our approach, adversarial patches areproduced in a coarse-to-fine way with multiple scalesof generators and discriminators.
Our ap-proach shows strong attacking ability in both the white-box and black-box setting.
arXiv Detail & Related papers (2020-09-21T11:56:01Z) - Rethinking Generative Zero-Shot Learning: An Ensemble Learning
Perspective for Recognising Visual Patches [52.67723703088284]
We propose a novel framework called multi-patch generative adversarial nets (MPGAN)
MPGAN synthesises local patch features and labels unseen classes with a novel weighted voting strategy.
MPGAN has significantly greater accuracy than state-of-the-art methods.
arXiv Detail & Related papers (2020-07-27T05:49:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.