SFANet: Spatial-Frequency Attention Network for Deepfake Detection
- URL: http://arxiv.org/abs/2510.04630v1
- Date: Mon, 06 Oct 2025 09:35:57 GMT
- Title: SFANet: Spatial-Frequency Attention Network for Deepfake Detection
- Authors: Vrushank Ahire, Aniruddh Muley, Shivam Zample, Siddharth Verma, Pranav Menon, Surbhi Madan, Abhinav Dhall,
- Abstract summary: We propose a novel ensemble framework to achieve better detection accuracy and robustness.<n>Our method introduces innovative data-splitting, sequential training, frequency splitting, patch-based attention, and face segmentation techniques.<n>Our model achieves state-of-the-art performance when tested on the DFWild-Cup dataset.
- Score: 6.387788094718588
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting manipulated media has now become a pressing issue with the recent rise of deepfakes. Most existing approaches fail to generalize across diverse datasets and generation techniques. We thus propose a novel ensemble framework, combining the strengths of transformer-based architectures, such as Swin Transformers and ViTs, and texture-based methods, to achieve better detection accuracy and robustness. Our method introduces innovative data-splitting, sequential training, frequency splitting, patch-based attention, and face segmentation techniques to handle dataset imbalances, enhance high-impact regions (e.g., eyes and mouth), and improve generalization. Our model achieves state-of-the-art performance when tested on the DFWild-Cup dataset, a diverse subset of eight deepfake datasets. The ensemble benefits from the complementarity of these approaches, with transformers excelling in global feature extraction and texturebased methods providing interpretability. This work demonstrates that hybrid models can effectively address the evolving challenges of deepfake detection, offering a robust solution for real-world applications.
Related papers
- Improving Deepfake Detection with Reinforcement Learning-Based Adaptive Data Augmentation [60.04281435591454]
CRDA (Curriculum Reinforcement-Learning Data Augmentation) is a novel framework guiding detectors to progressively master multi-domain forgery features.<n>Central to our approach is integrating reinforcement learning and causal inference.<n>Our method significantly improves detector generalizability, outperforming SOTA methods across multiple cross-domain datasets.
arXiv Detail & Related papers (2025-11-10T12:45:52Z) - Robust AI-Generated Face Detection with Imbalanced Data [10.360215701635674]
Current deepfake detection techniques have evolved from CNN-based methods focused on local artifacts to more advanced approaches using vision transformers and multimodal models like CLIP.<n>Despite recent progress, state-of-the-art deepfake detectors still face major challenges in handling distribution shifts from emerging generative models.<n>We propose a framework that combines dynamic loss reweighting and ranking-based optimization, which achieves superior generalization and performance under imbalanced dataset conditions.
arXiv Detail & Related papers (2025-05-04T17:02:10Z) - HFMF: Hierarchical Fusion Meets Multi-Stream Models for Deepfake Detection [4.908389661988192]
HFMF is a comprehensive two-stage deepfake detection framework.<n>It integrates vision Transformers and convolutional nets through a hierarchical feature fusion mechanism.<n>We demonstrate that our architecture achieves superior performance across diverse dataset benchmarks.
arXiv Detail & Related papers (2025-01-10T00:20:29Z) - Leveraging Mixture of Experts for Improved Speech Deepfake Detection [53.69740463004446]
Speech deepfakes pose a significant threat to personal security and content authenticity.
We introduce a novel approach for enhancing speech deepfake detection performance using a Mixture of Experts architecture.
arXiv Detail & Related papers (2024-09-24T13:24:03Z) - Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model [16.69101880602321]
We propose a novel side-network-based decoder for generalized video-based Deepfake detection.<n>We also introduce Facial Component Guidance (FCG) to enhance spatial learning generalizability.<n>Our approach demonstrates promising generalizability on challenging Deepfake datasets.
arXiv Detail & Related papers (2024-04-08T14:58:52Z) - Diffusion Deepfake [41.59597965760673]
Recent progress in generative AI, primarily through diffusion models, presents significant challenges for real-world deepfake detection.
The increased realism in image details, diverse content, and widespread accessibility to the general public complicates the identification of these sophisticated deepfakes.
This paper introduces two extensive deepfake datasets generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2024-04-02T02:17:50Z) - GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning [50.7702397913573]
The rapid advancement of photorealistic generators has reached a critical juncture where the discrepancy between authentic and manipulated images is increasingly indistinguishable.
Although there have been a number of publicly available face forgery datasets, the forgery faces are mostly generated using GAN-based synthesis technology.
We propose a large-scale, diverse, and fine-grained high-fidelity dataset, namely GenFace, to facilitate the advancement of deepfake detection.
arXiv Detail & Related papers (2024-02-03T03:13:50Z) - CrossDF: Improving Cross-Domain Deepfake Detection with Deep Information Decomposition [53.860796916196634]
We propose a Deep Information Decomposition (DID) framework to enhance the performance of Cross-dataset Deepfake Detection (CrossDF)
Unlike most existing deepfake detection methods, our framework prioritizes high-level semantic features over specific visual artifacts.
It adaptively decomposes facial features into deepfake-related and irrelevant information, only using the intrinsic deepfake-related information for real/fake discrimination.
arXiv Detail & Related papers (2023-09-30T12:30:25Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z) - Self-Supervised Graph Transformer for Deepfake Detection [1.8133635752982105]
Deepfake detection methods have shown promising results in recognizing forgeries within a given dataset.
Deepfake detection system must remain impartial to forgery types, appearance, and quality for guaranteed generalizable detection performance.
This study introduces a deepfake detection framework, leveraging a self-supervised pre-training model that delivers exceptional generalization ability.
arXiv Detail & Related papers (2023-07-27T17:22:41Z) - MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake
Detection [80.83725644958633]
Current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos.
We present a novel approach, termed as MD-CSDNetwork, for combining the features in the spatial and frequency domains to mine a shared discriminative representation.
arXiv Detail & Related papers (2021-09-15T14:11:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.