Related papers: HFMF: Hierarchical Fusion Meets Multi-Stream Models for Deepfake Detection

HFMF: Hierarchical Fusion Meets Multi-Stream Models for Deepfake Detection

URL: http://arxiv.org/abs/2501.05631v1
Date: Fri, 10 Jan 2025 00:20:29 GMT
Title: HFMF: Hierarchical Fusion Meets Multi-Stream Models for Deepfake Detection
Authors: Anant Mehta, Bryant McArthur, Nagarjuna Kolloju, Zhengzhong Tu,
Abstract summary: HFMF is a comprehensive two-stage deepfake detection framework.<n>It integrates vision Transformers and convolutional nets through a hierarchical feature fusion mechanism.<n>We demonstrate that our architecture achieves superior performance across diverse dataset benchmarks.
Score: 4.908389661988192
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The rapid progress in deep generative models has led to the creation of incredibly realistic synthetic images that are becoming increasingly difficult to distinguish from real-world data. The widespread use of Variational Models, Diffusion Models, and Generative Adversarial Networks has made it easier to generate convincing fake images and videos, which poses significant challenges for detecting and mitigating the spread of misinformation. As a result, developing effective methods for detecting AI-generated fakes has become a pressing concern. In our research, we propose HFMF, a comprehensive two-stage deepfake detection framework that leverages both hierarchical cross-modal feature fusion and multi-stream feature extraction to enhance detection performance against imagery produced by state-of-the-art generative AI models. The first component of our approach integrates vision Transformers and convolutional nets through a hierarchical feature fusion mechanism. The second component of our framework combines object-level information and a fine-tuned convolutional net model. We then fuse the outputs from both components via an ensemble deep neural net, enabling robust classification performances. We demonstrate that our architecture achieves superior performance across diverse dataset benchmarks while maintaining calibration and interoperability.

Related papers

AdaptPrompt: Parameter-Efficient Adaptation of VLMs for Generalizable Deepfake Detection [7.76090543025328]
Recent advances in image generation have led to the widespread availability of highly realistic synthetic media, increasing the difficulty of reliable deepfake detection.<n>A key challenge is generalization, as detectors trained on a narrow class of generators often fail when confronted with unseen models.<n>We address the pressing need for generalizable detection by leveraging large vision-language models, specifically CLIP, to identify synthetic content across diverse generative techniques.
arXiv Detail & Related papers (2025-12-19T16:06:03Z)
DFBench: Benchmarking Deepfake Image Detection Capability of Large Multimodal Models [43.86847047796023]
Current deepfake detection methods often depend on datasets with limited generation models and content diversity.<n>We present textbfDFBench, a large-scale DeepFake Benchmark featuring 540,000 images across real, AI-edited, and AI-generated content.<n>We propose textbfMoA-DF, Mixture of Agents for DeepFake detection, leveraging a combined probability strategy from multiple LMMs.
arXiv Detail & Related papers (2025-06-03T15:45:41Z)
So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection [75.79507634008631]
We introduce So-Fake-Set, a social media-oriented dataset with over 2 million high-quality images, diverse generative sources, and imagery synthesized using 35 state-of-the-art generative models.<n>We present So-Fake-R1, an advanced vision-language framework that employs reinforcement learning for highly accurate forgery detection, precise localization, and explainable inference through interpretable visual rationales.
arXiv Detail & Related papers (2025-05-24T11:53:35Z)
Unpaired Deblurring via Decoupled Diffusion Model [55.21345354747609]
We propose UID-Diff, a generative-diffusion-based model designed to enhance deblurring performance on unknown domains.<n>We employ two Q-Formers as structural features and blur patterns extractors separately. The features extracted will be used for the supervised deblurring task on synthetic data and the unsupervised blur-transfer task.<n>Experiments on real-world datasets demonstrate that UID-Diff outperforms existing state-of-the-art methods in blur removal and structural preservation.
arXiv Detail & Related papers (2025-02-03T17:00:40Z)
HyperDet: Generalizable Detection of Synthesized Images by Generating and Merging A Mixture of Hyper LoRAs [17.88153857572688]
We introduce a novel and generalizable detection framework termed HyperDet. In this work, we propose a novel objective function that balances the pixel and semantic artifacts effectively. Our work paves a new way to establish generalizable domain-specific fake image detectors based on pretrained large vision models.
arXiv Detail & Related papers (2024-10-08T13:43:01Z)
Straight Through Gumbel Softmax Estimator based Bimodal Neural Architecture Search for Audio-Visual Deepfake Detection [6.367999777464464]
multimodal deepfake detectors rely on conventional fusion methods, such as majority rule and ensemble voting. In this paper, we introduce the Straight-through Gumbel-Softmax framework, offering a comprehensive approach to search multimodal fusion model architectures. Experiments on the FakeAVCeleb and SWAN-DF datasets demonstrated an impressive AUC value 94.4% achieved with minimal model parameters.
arXiv Detail & Related papers (2024-06-19T09:26:22Z)
Robust CLIP-Based Detector for Exposing Diffusion Model-Generated Images [13.089550724738436]
Diffusion models (DMs) have revolutionized image generation, producing high-quality images with applications spanning various fields. Their ability to create hyper-realistic images poses significant challenges in distinguishing between real and synthetic content. This work introduces a robust detection framework that integrates image and text features extracted by CLIP model with a Multilayer Perceptron (MLP) classifier.
arXiv Detail & Related papers (2024-04-19T14:30:41Z)
DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets. We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability. Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning [50.7702397913573]
The rapid advancement of photorealistic generators has reached a critical juncture where the discrepancy between authentic and manipulated images is increasingly indistinguishable. Although there have been a number of publicly available face forgery datasets, the forgery faces are mostly generated using GAN-based synthesis technology. We propose a large-scale, diverse, and fine-grained high-fidelity dataset, namely GenFace, to facilitate the advancement of deepfake detection.
arXiv Detail & Related papers (2024-02-03T03:13:50Z)
A Dual Attentive Generative Adversarial Network for Remote Sensing Image Change Detection [6.906936669510404]
We propose a dual attentive generative adversarial network for achieving very high-resolution remote sensing image change detection tasks. The DAGAN framework has better performance with 85.01% mean IoU and 91.48% mean F1 score than advanced methods on the LEVIR dataset.
arXiv Detail & Related papers (2023-10-03T08:26:27Z)
Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language. We pioneer a systematic study on deepfake detection generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
GLFF: Global and Local Feature Fusion for AI-synthesized Image Detection [29.118321046339656]
We propose a framework to learn rich and discriminative representations by combining multi-scale global features from the whole image with refined local features from informative patches for AI synthesized image detection. GLFF fuses information from two branches: the global branch to extract multi-scale semantic features and the local branch to select informative patches for detailed local artifacts extraction.
arXiv Detail & Related papers (2022-11-16T02:03:20Z)
MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis [84.7287684402508]
Current deep learning approaches for multimodal fusion rely on bottom-up fusion of high and mid-level latent modality representations. Models of human perception highlight the importance of top-down fusion, where high-level representations affect the way sensory inputs are perceived. We propose a neural architecture that captures top-down cross-modal interactions, using a feedback mechanism in the forward pass during network training.
arXiv Detail & Related papers (2022-01-24T17:48:04Z)
M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection [74.19291916812921]
forged images generated by Deepfake techniques pose a serious threat to the trustworthiness of digital information. In this paper, we aim to capture the subtle manipulation artifacts at different scales for Deepfake detection. We introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos generated by state-of-the-art face swapping and facial reenactment methods.
arXiv Detail & Related papers (2021-04-20T05:43:44Z)
High-Fidelity Synthesis with Disentangled Representation [60.19657080953252]
We propose an Information-Distillation Generative Adrial Network (ID-GAN) for disentanglement learning and high-fidelity synthesis. Our method learns disentangled representation using VAE-based models, and distills the learned representation with an additional nuisance variable to the separate GAN-based generator for high-fidelity synthesis. Despite the simplicity, we show that the proposed method is highly effective, achieving comparable image generation quality to the state-of-the-art methods using the disentangled representation.
arXiv Detail & Related papers (2020-01-13T14:39:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.