SAIDO: Generalizable Detection of AI-Generated Images via Scene-Aware and Importance-Guided Dynamic Optimization in Continual Learning
- URL: http://arxiv.org/abs/2512.00539v1
- Date: Sat, 29 Nov 2025 16:13:50 GMT
- Title: SAIDO: Generalizable Detection of AI-Generated Images via Scene-Aware and Importance-Guided Dynamic Optimization in Continual Learning
- Authors: Yongkang Hu, Yu Cheng, Yushuo Zhang, Yuan Xie, Zhaoxia Yin,
- Abstract summary: We propose a Scene-Aware and Importance-Guided Dynamic Optimization detection framework with continual learning (SAIDO)<n>For each scene, independent expert modules are dynamically allocated, enabling the framework to capture scene-specific forgery features better.<n>Our method outperforms the current SOTA method in both stability and plasticity, achieving 44.22% and 40.57% relative reductions in average detection error rate and forgetting rate, respectively.
- Score: 20.771107947846122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The widespread misuse of image generation technologies has raised security concerns, driving the development of AI-generated image detection methods. However, generalization has become a key challenge and open problem: existing approaches struggle to adapt to emerging generative methods and content types in real-world scenarios. To address this issue, we propose a Scene-Aware and Importance-Guided Dynamic Optimization detection framework with continual learning (SAIDO). Specifically, we design Scene-Awareness-Based Expert Module (SAEM) that dynamically identifies and incorporates new scenes using VLLMs. For each scene, independent expert modules are dynamically allocated, enabling the framework to capture scene-specific forgery features better and enhance cross-scene generalization. To mitigate catastrophic forgetting when learning from multiple image generative methods, we introduce Importance-Guided Dynamic Optimization Mechanism (IDOM), which optimizes each neuron through an importance-guided gradient projection strategy, thereby achieving an effective balance between model plasticity and stability. Extensive experiments on continual learning tasks demonstrate that our method outperforms the current SOTA method in both stability and plasticity, achieving 44.22\% and 40.57\% relative reductions in average detection error rate and forgetting rate, respectively. On open-world datasets, it improves the average detection accuracy by 9.47\% compared to the current SOTA method.
Related papers
- OmniVL-Guard: Towards Unified Vision-Language Forgery Detection and Grounding via Balanced RL [63.388513841293616]
Existing forgery detection methods fail to handle the interleaved text, images, and videos prevalent in real-world misinformation.<n>To bridge this gap, this paper targets to develop a unified framework for omnibus vision-language forgery detection and grounding.<n>We propose textbf OmniVL-Guard, a balanced reinforcement learning framework for omnibus vision-language forgery detection and grounding.
arXiv Detail & Related papers (2026-02-11T09:41:36Z) - AEGPO: Adaptive Entropy-Guided Policy Optimization for Diffusion Models [54.56296715999545]
Reinforcement learning from human feedback shows promise for aligning diffusion and flow models.<n>Policy optimization methods such as GRPO suffer from inefficient and static sampling strategies.<n>We propose Adaptive Entropy-Guided Policy Optimization (AEGPO), a novel dual-signal, dual-level adaptive optimization strategy.
arXiv Detail & Related papers (2026-02-06T16:09:50Z) - Generalizable and Adaptive Continual Learning Framework for AI-generated Image Detection [42.71754298609258]
malicious misuse and widespread dissemination of AI-generated images pose a significant threat to the authenticity of online information.<n>Current detection methods often struggle to generalize to unseen generative models.<n>We propose a novel three-stage domain continual learning framework designed for continuous adaptation to evolving generative models.
arXiv Detail & Related papers (2026-01-09T07:01:22Z) - Improving Deepfake Detection with Reinforcement Learning-Based Adaptive Data Augmentation [60.04281435591454]
CRDA (Curriculum Reinforcement-Learning Data Augmentation) is a novel framework guiding detectors to progressively master multi-domain forgery features.<n>Central to our approach is integrating reinforcement learning and causal inference.<n>Our method significantly improves detector generalizability, outperforming SOTA methods across multiple cross-domain datasets.
arXiv Detail & Related papers (2025-11-10T12:45:52Z) - RoHOI: Robustness Benchmark for Human-Object Interaction Detection [84.78366452133514]
Human-Object Interaction (HOI) detection is crucial for robot-human assistance, enabling context-aware support.<n>We introduce the first benchmark for HOI detection, evaluating model resilience under diverse challenges.<n>Our benchmark, RoHOI, includes 20 corruption types based on the HICO-DET and V-COCO datasets and a new robustness-focused metric.
arXiv Detail & Related papers (2025-07-12T01:58:04Z) - Make It Efficient: Dynamic Sparse Attention for Autoregressive Image Generation [8.624395048491275]
We propose a novel training-free context optimization method called Adaptive Dynamic Sparse Attention (ADSA)<n>ADSA identifies historical tokens crucial for maintaining local texture consistency and those essential for ensuring global semantic coherence, thereby efficiently streamlining attention.<n>We also introduce a dynamic KV-cache update mechanism tailored for ADSA, reducing GPU memory consumption during inference by approximately $50%$.
arXiv Detail & Related papers (2025-06-23T01:27:06Z) - Two-Stage Random Alternation Framework for One-Shot Pansharpening [12.385955231193675]
We introduce a two-stage random alternating framework (TRA-PAN) that performs instance-specific optimization for any given Multispectral(MS)/Panchromatic(PAN) pair.<n>TRA-PAN effectively integrates strong supervision constraints from reduced-resolution images with the physical characteristics of the full-resolution images.<n> Experimental results demonstrate that TRA-PAN outperforms state-of-the-art (SOTA) methods in quantitative metrics and visual quality in real-world scenarios.
arXiv Detail & Related papers (2025-05-10T09:26:22Z) - MLEP: Multi-granularity Local Entropy Patterns for Universal AI-generated Image Detection [44.40575446607237]
There is an urgent need for effective methods to detect AI-generated images (AIGI)<n>We propose Multi-granularity Local Entropy Patterns (MLEP), a set of entropy feature maps computed across shuffled small patches over multiple image scaled.<n>MLEP comprehensively captures pixel relationships across dimensions and scales while significantly disrupting image semantics, reducing potential content bias.
arXiv Detail & Related papers (2025-04-18T14:50:23Z) - Underlying Semantic Diffusion for Effective and Efficient In-Context Learning [113.4003355229632]
Underlying Semantic Diffusion (US-Diffusion) is an enhanced diffusion model that boosts underlying semantics learning, computational efficiency, and in-context learning capabilities.<n>We present a Feedback-Aided Learning (FAL) framework, which leverages feedback signals to guide the model in capturing semantic details.<n>We also propose a plug-and-play Efficient Sampling Strategy (ESS) for dense sampling at time steps with high-noise levels.
arXiv Detail & Related papers (2025-03-06T03:06:22Z) - Learning Large-scale Neural Fields via Context Pruned Meta-Learning [60.93679437452872]
We introduce an efficient optimization-based meta-learning technique for large-scale neural field training.
We show how gradient re-scaling at meta-test time allows the learning of extremely high-quality neural fields.
Our framework is model-agnostic, intuitive, straightforward to implement, and shows significant reconstruction improvements for a wide range of signals.
arXiv Detail & Related papers (2023-02-01T17:32:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.