Hybrid Deepfake Image Detection: A Comprehensive Dataset-Driven Approach Integrating Convolutional and Attention Mechanisms with Frequency Domain Features
- URL: http://arxiv.org/abs/2502.10682v1
- Date: Sat, 15 Feb 2025 06:02:11 GMT
- Title: Hybrid Deepfake Image Detection: A Comprehensive Dataset-Driven Approach Integrating Convolutional and Attention Mechanisms with Frequency Domain Features
- Authors: Kafi Anan, Anindya Bhattacharjee, Ashir Intesher, Kaidul Islam, Abrar Assaeem Fuad, Utsab Saha, Hafiz Imtiaz
- Abstract summary: We propose an ensemble-based approach that employs three different neural network architectures for deepfake detection.
We empirically demonstrate the effectiveness of these models in grouping real and fake images into cohesive clusters.
Our weighted ensemble model achieves an excellent accuracy of 93.23% on the validation dataset of the SP Cup 2025 competition.
- Score: 0.6700983301090583
- License:
- Abstract: Effective deepfake detection tools have become increasingly essential in recent years due to the growing use of deepfakes in unethical practices. A diverse range of deepfake generation techniques exists, which makes it challenging to develop an accurate universal detection mechanism. The 2025 Signal Processing Cup (DFWild-Cup competition) provided a diverse dataset of deepfake images, generated by multiple deepfake image generators, for training machine learning model(s) that emphasize the generalization of deepfake detection. To this end, we propose an ensemble-based approach that employs three different neural network architectures: a ResNet-34-based architecture, a data-efficient image transformer (DeiT), and an XceptionNet with Wavelet Transform, to capture both local and global features of deepfakes. We visualize the specific regions that these models focus on for classification using Grad-CAM, and empirically demonstrate the effectiveness of these models in grouping real and fake images into cohesive clusters using t-SNE plots. Individually, the ResNet-34 architecture achieves 88.9% accuracy, whereas the Xception network and the DeiT architecture achieve 87.76% and 89.32% accuracy, respectively. With these networks, our weighted ensemble model achieves an excellent accuracy of 93.23% on the validation dataset of the SP Cup 2025 competition. Finally, the confusion matrix and an Area Under the ROC Curve (AUC) of 97.44% further confirm the stability of our proposed method.
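To make the weighted-ensemble idea concrete, the following is a minimal sketch (not the authors' code) of combining the softmax outputs of the three networks named above; the weight values, the two-class label convention, and the function name are illustrative assumptions only.

```python
import torch
import torch.nn.functional as F

def weighted_ensemble(logits_list, weights):
    """Combine per-model logits into a single real/fake prediction.

    logits_list: list of tensors of shape (batch, 2), one per model
                 (e.g., ResNet-34, DeiT, XceptionNet+wavelet outputs).
    weights:     list of floats, one per model; placeholder values here,
                 the paper's actual weights are not given in the abstract.
    """
    probs = [F.softmax(l, dim=1) for l in logits_list]
    stacked = torch.stack(probs)                    # (n_models, batch, 2)
    w = torch.tensor(weights).view(-1, 1, 1)        # broadcast over batch/classes
    fused = (w * stacked).sum(dim=0) / w.sum()      # weighted average of probabilities
    return fused.argmax(dim=1)                      # 0 = real, 1 = fake (assumed convention)

# Hypothetical usage with dummy logits for a batch of 4 images:
resnet_logits   = torch.randn(4, 2)
deit_logits     = torch.randn(4, 2)
xception_logits = torch.randn(4, 2)
preds = weighted_ensemble([resnet_logits, deit_logits, xception_logits],
                          weights=[0.3, 0.35, 0.35])  # illustrative weights only
print(preds)
```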
Related papers
- DFCon: Attention-Driven Supervised Contrastive Learning for Robust Deepfake Detection [0.3818645814949463]
This report presents our approach for the IEEE SP Cup 2025: Deepfake Face Detection in the Wild (DFWild-Cup).
Our methodology employs advanced backbone models, including MaxViT, CoAtNet, and EVA-02, fine-tuned using supervised contrastive loss to enhance feature separation (a minimal loss sketch follows this entry).
The proposed system addresses the challenges of detecting deepfakes in real-world conditions and achieves a commendable accuracy of 95.83% on the validation dataset.
arXiv Detail & Related papers (2025-01-28T04:46:50Z)
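The DFCon entry above fine-tunes its backbones with a supervised contrastive loss. The snippet below is a rough, hypothetical sketch of such a SupCon-style objective over L2-normalized embeddings, treating samples that share a real/fake label as positives; it is not the authors' implementation, and the temperature and batch values are placeholders.

```python
import torch
import torch.nn.functional as F

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of embeddings.

    embeddings: (batch, dim) feature vectors from a backbone (e.g., MaxViT).
    labels:     (batch,) integer labels (0 = real, 1 = fake).
    """
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                       # pairwise cosine similarities
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, -1e9)                    # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    pos_count = pos_mask.sum(dim=1).clamp(min=1)        # avoid division by zero
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_count
    return loss.mean()

# Hypothetical usage with stand-in backbone features:
feats = torch.randn(8, 128)
labels = torch.tensor([0, 1, 0, 1, 1, 0, 1, 0])
print(supcon_loss(feats, labels).item())
```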
- Wavelet-Driven Generalizable Framework for Deepfake Face Forgery Detection [0.0]
Wavelet-CLIP is a deepfake detection framework that integrates wavelet transforms with features derived from the ViT-L/14 architecture, pre-trained in the CLIP fashion (a toy wavelet-decomposition sketch follows this entry).
Our method showcases outstanding performance, achieving an average AUC of 0.749 for cross-data generalization and 0.893 for robustness against unseen deepfakes.
arXiv Detail & Related papers (2024-09-26T21:16:51Z)
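Both the XceptionNet-with-Wavelet branch of the proposed ensemble and the Wavelet-CLIP entry above rely on frequency-domain features from a wavelet decomposition. Below is a minimal sketch, assuming the PyWavelets package, of extracting single-level 2D DWT sub-bands from a grayscale face crop; how these sub-bands are fused with a backbone is model-specific and not shown.

```python
import numpy as np
import pywt  # PyWavelets

def dwt_subbands(image, wavelet="haar"):
    """Single-level 2D discrete wavelet transform of a grayscale image.

    image: 2D numpy array (H, W), e.g., a face crop converted to grayscale.
    Returns the approximation (LL) and detail (LH, HL, HH) sub-bands;
    the detail bands tend to highlight high-frequency artifacts that
    some deepfake generators leave behind.
    """
    LL, (LH, HL, HH) = pywt.dwt2(image, wavelet)
    return LL, LH, HL, HH

# Hypothetical usage on a random 256x256 "image":
img = np.random.rand(256, 256).astype(np.float32)
LL, LH, HL, HH = dwt_subbands(img)
print(LL.shape, LH.shape, HL.shape, HH.shape)  # each roughly (128, 128)
```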
- Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities [88.398085358514]
Contrastive Deepfake Embeddings (CoDE) is a novel embedding space specifically designed for deepfake detection.
CoDE is trained via contrastive learning by additionally enforcing global-local similarities.
arXiv Detail & Related papers (2024-07-29T18:00:10Z)
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract priors from transformers well-trained on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
- GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning [50.7702397913573]
The rapid advancement of photorealistic generators has reached a critical juncture where authentic and manipulated images are increasingly difficult to distinguish.
Although a number of face forgery datasets are publicly available, the forged faces in them are mostly generated using GAN-based synthesis technology.
We propose a large-scale, diverse, and fine-grained high-fidelity dataset, namely GenFace, to facilitate the advancement of deepfake detection.
arXiv Detail & Related papers (2024-02-03T03:13:50Z)
- Generalized Deepfakes Detection with Reconstructed-Blended Images and Multi-scale Feature Reconstruction Network [14.749857283918157]
We present a blended-image-based detection approach that generalizes robustly to unseen datasets.
Experiments demonstrated that this approach results in better performance in both cross-manipulation detection and cross-dataset detection on unseen data.
arXiv Detail & Related papers (2023-12-13T09:49:15Z)
- DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z)
- Deepfake Detection with Deep Learning: Convolutional Neural Networks versus Transformers [1.179179628317559]
We identified eight promising deep learning architectures, designed and developed our deepfake detection models, and conducted experiments on well-established deepfake datasets.
We achieved 88.74%, 99.53%, 97.68%, 99.73%, and 92.02% accuracy, and 99.95%, 100%, 99.88%, 99.99%, and 97.61% AUC, in detecting FF++ 2020, Google DFD, Celeb-DF, Deeper Forensics, and DFDC deepfakes, respectively.
arXiv Detail & Related papers (2023-04-07T15:33:09Z)
- Deep Convolutional Pooling Transformer for Deepfake Detection [54.10864860009834]
We propose a deep convolutional Transformer to incorporate decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy (a toy token-pooling sketch follows this entry).
The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
arXiv Detail & Related papers (2022-09-12T15:05:41Z)
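The Deep Convolutional Pooling Transformer entry above applies convolutional pooling to transformer features. As a loose illustration only (not the paper's architecture), the sketch below reshapes a sequence of patch tokens into a 2D grid, pools neighbouring tokens with a strided convolution, and flattens the result back into a shorter token sequence; the token count, embedding width, and stride are assumptions.

```python
import torch
import torch.nn as nn

class ConvTokenPooling(nn.Module):
    """Pool a (batch, num_tokens, dim) token sequence with a strided conv.

    Assumes num_tokens forms a square grid (e.g., 14x14 = 196 patch tokens).
    """
    def __init__(self, dim, stride=2):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, stride=stride, padding=1)

    def forward(self, tokens):
        b, n, d = tokens.shape
        side = int(n ** 0.5)                     # grid side length
        x = tokens.transpose(1, 2).reshape(b, d, side, side)
        x = self.conv(x)                         # spatially pool neighbouring tokens
        return x.flatten(2).transpose(1, 2)      # back to (batch, fewer_tokens, dim)

# Hypothetical usage with 196 tokens of width 384:
tokens = torch.randn(2, 196, 384)
pooled = ConvTokenPooling(dim=384)(tokens)
print(pooled.shape)                              # torch.Size([2, 49, 384])
```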
- M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection [74.19291916812921]
Forged images generated by Deepfake techniques pose a serious threat to the trustworthiness of digital information.
In this paper, we aim to capture the subtle manipulation artifacts at different scales for Deepfake detection.
We introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos generated by state-of-the-art face swapping and facial reenactment methods.
arXiv Detail & Related papers (2021-04-20T05:43:44Z)
- Road Segmentation for Remote Sensing Images using Adversarial Spatial Pyramid Networks [28.32775611169636]
We introduce a new model to apply structured domain adaptation for synthetic image generation and road segmentation.
A novel scale-wise architecture is introduced to learn from the multi-level feature maps and improve the semantics of the features.
Our model achieves a state-of-the-art 78.86 IoU on the Massachusetts dataset with 14.89M parameters and 86.78B FLOPs, using 4x fewer FLOPs while achieving higher accuracy (+3.47% IoU).
arXiv Detail & Related papers (2020-08-10T11:00:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.