CAMME: Adaptive Deepfake Image Detection with Multi-Modal Cross-Attention
- URL: http://arxiv.org/abs/2505.18035v1
- Date: Fri, 23 May 2025 15:39:07 GMT
- Title: CAMME: Adaptive Deepfake Image Detection with Multi-Modal Cross-Attention
- Authors: Naseem Khan, Tuan Nguyen, Amine Bermak, Issa Khalil,
- Abstract summary: We propose CAMME, a framework that integrates visual, textual, and frequency-domain features through a multi-head cross-attention mechanism.<n>Experiments demonstrate CAMME's superiority over state-of-the-art methods, yielding improvements of 12.56% on natural scenes and 13.25% on facial deepfakes.
- Score: 4.359154048799454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The proliferation of sophisticated AI-generated deepfakes poses critical challenges for digital media authentication and societal security. While existing detection methods perform well within specific generative domains, they exhibit significant performance degradation when applied to manipulations produced by unseen architectures--a fundamental limitation as generative technologies rapidly evolve. We propose CAMME (Cross-Attention Multi-Modal Embeddings), a framework that dynamically integrates visual, textual, and frequency-domain features through a multi-head cross-attention mechanism to establish robust cross-domain generalization. Extensive experiments demonstrate CAMME's superiority over state-of-the-art methods, yielding improvements of 12.56% on natural scenes and 13.25% on facial deepfakes. The framework demonstrates exceptional resilience, maintaining (over 91%) accuracy under natural image perturbations and achieving 89.01% and 96.14% accuracy against PGD and FGSM adversarial attacks, respectively. Our findings validate that integrating complementary modalities through cross-attention enables more effective decision boundary realignment for reliable deepfake detection across heterogeneous generative architectures.
Related papers
- CAST: Cross-Attentive Spatio-Temporal feature fusion for Deepfake detection [0.0]
CNNs are effective at capturing spatial artifacts, and Transformers excel at modeling temporal inconsistencies.<n>We propose a unified CAST model that leverages cross-attention to effectively fuse spatial and temporal features.<n>We evaluate the performance of our model using the FaceForensics++, Celeb-DF, and DeepfakeDetection datasets.
arXiv Detail & Related papers (2025-06-26T18:51:17Z) - Is Artificial Intelligence Generated Image Detection a Solved Problem? [10.839070838139401]
AIGIBench is a benchmark designed to rigorously evaluate the robustness and generalization capabilities of state-of-the-art AIGI detectors.<n>It includes 23 diverse fake image subsets that span both advanced and widely adopted image generation techniques.<n>Experiments on 11 advanced detectors demonstrate that, despite their high reported accuracy in controlled settings, these detectors suffer significant performance drops on real-world data.
arXiv Detail & Related papers (2025-05-18T10:00:39Z) - CapsFake: A Multimodal Capsule Network for Detecting Instruction-Guided Deepfakes [3.2194551406014886]
deepfake technology threatens the integrity of digital images by enabling subtle, context-aware manipulations.<n>We propose CapsFake, designed to detect such deepfake image edits by integrating low-level capsules from visual, textual, and frequency-domain modalities.<n>High-level capsules, predicted through a competitive routing mechanism, dynamically aggregate local features to identify manipulated regions with precision.
arXiv Detail & Related papers (2025-04-27T12:31:47Z) - Decoupled Doubly Contrastive Learning for Cross Domain Facial Action Unit Detection [66.80386429324196]
We propose a decoupled doubly contrastive adaptation (D$2$CA) approach to learn a purified AU representation.<n>D$2$CA is trained to disentangle AU and domain factors by assessing the quality of synthesized faces.<n>It consistently outperforms state-of-the-art cross-domain AU detection approaches.
arXiv Detail & Related papers (2025-03-12T00:42:17Z) - Hybrid Deepfake Image Detection: A Comprehensive Dataset-Driven Approach Integrating Convolutional and Attention Mechanisms with Frequency Domain Features [0.6700983301090583]
We propose an ensemble-based approach that employs three different neural network architectures for deepfake detection.<n>We empirically demonstrate the effectiveness of these models in grouping real and fake images into cohesive clusters.<n>Our weighted ensemble model achieves an excellent accuracy of 93.23% on the validation dataset of the SP Cup 2025 competition.
arXiv Detail & Related papers (2025-02-15T06:02:11Z) - HFMF: Hierarchical Fusion Meets Multi-Stream Models for Deepfake Detection [4.908389661988192]
HFMF is a comprehensive two-stage deepfake detection framework.<n>It integrates vision Transformers and convolutional nets through a hierarchical feature fusion mechanism.<n>We demonstrate that our architecture achieves superior performance across diverse dataset benchmarks.
arXiv Detail & Related papers (2025-01-10T00:20:29Z) - Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing pose serious risks for generative models.<n>In this paper, we investigate how detection performance varies across model backbones, types, and datasets.<n>We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z) - StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model [62.25424831998405]
StealthDiffusion is a framework that modifies AI-generated images into high-quality, imperceptible adversarial examples.
It is effective in both white-box and black-box settings, transforming AI-generated images into high-quality adversarial forgeries.
arXiv Detail & Related papers (2024-08-11T01:22:29Z) - DA-HFNet: Progressive Fine-Grained Forgery Image Detection and Localization Based on Dual Attention [12.36906630199689]
We construct a DA-HFNet forged image dataset guided by text or image-assisted GAN and Diffusion model.
Our goal is to utilize a hierarchical progressive network to capture forged artifacts at different scales for detection and localization.
arXiv Detail & Related papers (2024-06-03T16:13:33Z) - CrossDF: Improving Cross-Domain Deepfake Detection with Deep Information Decomposition [53.860796916196634]
We propose a Deep Information Decomposition (DID) framework to enhance the performance of Cross-dataset Deepfake Detection (CrossDF)
Unlike most existing deepfake detection methods, our framework prioritizes high-level semantic features over specific visual artifacts.
It adaptively decomposes facial features into deepfake-related and irrelevant information, only using the intrinsic deepfake-related information for real/fake discrimination.
arXiv Detail & Related papers (2023-09-30T12:30:25Z) - PAIF: Perception-Aware Infrared-Visible Image Fusion for Attack-Tolerant
Semantic Segmentation [50.556961575275345]
We propose a perception-aware fusion framework to promote segmentation robustness in adversarial scenes.
We show that our scheme substantially enhances the robustness, with gains of 15.3% mIOU, compared with advanced competitors.
arXiv Detail & Related papers (2023-08-08T01:55:44Z) - MMNet: Multi-Collaboration and Multi-Supervision Network for Sequential
Deepfake Detection [81.59191603867586]
Sequential deepfake detection aims to identify forged facial regions with the correct sequence for recovery.
The recovery of forged images requires knowledge of the manipulation model to implement inverse transformations.
We propose Multi-Collaboration and Multi-Supervision Network (MMNet) that handles various spatial scales and sequential permutations in forged face images.
arXiv Detail & Related papers (2023-07-06T02:32:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.