Related papers: Evolving from Single-modal to Multi-modal Facial Deepfake Detection: Progress and Challenges

Evolving from Single-modal to Multi-modal Facial Deepfake Detection: Progress and Challenges

URL: http://arxiv.org/abs/2406.06965v4
Date: Thu, 03 Apr 2025 07:47:44 GMT
Title: Evolving from Single-modal to Multi-modal Facial Deepfake Detection: Progress and Challenges
Authors: Ping Liu, Qiqi Tao, Joey Tianyi Zhou,
Abstract summary: This survey traces the evolution of deepfake detection from early single-modal methods to sophisticated multi-modal approaches.<n>We present a structured taxonomy of detection techniques and analyze the transition from GAN-based to diffusion model-driven deepfakes.
Score: 40.11614155244292
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As synthetic media, including video, audio, and text, become increasingly indistinguishable from real content, the risks of misinformation, identity fraud, and social manipulation escalate. This survey traces the evolution of deepfake detection from early single-modal methods to sophisticated multi-modal approaches that integrate audio-visual and text-visual cues. We present a structured taxonomy of detection techniques and analyze the transition from GAN-based to diffusion model-driven deepfakes, which introduce new challenges due to their heightened realism and robustness against detection. Unlike prior surveys that primarily focus on single-modal detection or earlier deepfake techniques, this work provides the most comprehensive study to date, encompassing the latest advancements in multi-modal deepfake detection, generalization challenges, proactive defense mechanisms, and emerging datasets specifically designed to support new interpretability and reasoning tasks. We further explore the role of Vision-Language Models (VLMs) and Multimodal Large Language Models (MLLMs) in strengthening detection robustness against increasingly sophisticated deepfake attacks. By systematically categorizing existing methods and identifying emerging research directions, this survey serves as a foundation for future advancements in combating AI-generated facial forgeries. A curated list of all related papers can be found at \href{https://github.com/qiqitao77/Comprehensive-Advances-in-Deepfake-Detection-Spanning-Diverse-Modalitie s}{https://github.com/qiqitao77/Awesome-Comprehensive-Deepfake-Detection}.

Related papers

Anomaly Detection and Generation with Diffusion Models: A Survey [51.61574868316922]
Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing.<n>Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest.<n>This survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.
arXiv Detail & Related papers (2025-06-11T03:29:18Z)
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM [51.91311158085973]
Methods for detecting AI-generated media have evolved rapidly. General-purpose detectors based on MLLMs integrate authenticity verification, explainability, and localization capabilities. Ethical and security considerations have emerged as critical global concerns.
arXiv Detail & Related papers (2025-02-07T12:18:20Z)
Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey [1.7811840395202345]
deepfakes (DFs) have been utilized for malicious purposes, such as individual impersonation, misinformation spreading, and artists style imitation. This survey offers researchers and practitioners a comprehensive resource for understanding the current landscape, methodological approaches, and promising future directions in this rapidly evolving field.
arXiv Detail & Related papers (2024-11-26T22:04:49Z)
Understanding Audiovisual Deepfake Detection: Techniques, Challenges, Human Factors and Perceptual Insights [49.81915942821647]
Deep Learning has been successfully applied in diverse fields, and its impact on deepfake detection is no exception. Deepfakes are fake yet realistic synthetic content that can be used deceitfully for political impersonation, phishing, slandering, or spreading misinformation. This paper aims to improve the effectiveness of deepfake detection strategies and guide future research in cybersecurity and media integrity.
arXiv Detail & Related papers (2024-11-12T09:02:11Z)
DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion [94.46904504076124]
Deepfake technology has made face swapping highly realistic, raising concerns about the malicious use of fabricated facial content. Existing methods often struggle to generalize to unseen domains due to the diverse nature of facial manipulations. We introduce DiffusionFake, a novel framework that reverses the generative process of face forgeries to enhance the generalization of detection models.
arXiv Detail & Related papers (2024-10-06T06:22:43Z)
Deep Learning Technology for Face Forgery Detection: A Survey [17.519617618071003]
Deep learning has enabled the creation or manipulation of high-fidelity facial images and videos. This technology, also known as deepfake, has achieved dramatic progress and become increasingly popular in social media. To diminish the risks of deepfake, it is desirable to develop powerful forgery detection methods.
arXiv Detail & Related papers (2024-09-22T01:42:01Z)
Dynamic Analysis and Adaptive Discriminator for Fake News Detection [59.41431561403343]
We propose a Dynamic Analysis and Adaptive Discriminator (DAAD) approach for fake news detection. For knowledge-based methods, we introduce the Monte Carlo Tree Search algorithm to leverage the self-reflective capabilities of large language models. For semantic-based methods, we define four typical deceit patterns to reveal the mechanisms behind fake news creation.
arXiv Detail & Related papers (2024-08-20T14:13:54Z)
Contextual Cross-Modal Attention for Audio-Visual Deepfake Detection and Localization [3.9440964696313485]
In the digital age, the emergence of deepfakes and synthetic media presents a significant threat to societal and political integrity. Deepfakes based on multi-modal manipulation, such as audio-visual, are more realistic and pose a greater threat. We propose a novel multi-modal attention framework based on recurrent neural networks (RNNs) that leverages contextual information for audio-visual deepfake detection.
arXiv Detail & Related papers (2024-08-02T18:45:01Z)
Deepfake Media Forensics: State of the Art and Challenges Ahead [51.33414186878676]
AI-generated synthetic media, also called Deepfakes, have influenced so many domains, from entertainment to cybersecurity. Deepfake detection has become a vital area of research, focusing on identifying subtle inconsistencies and artifacts with machine learning techniques. This paper reviews the primary algorithms that address these challenges, examining their advantages, limitations, and future prospects.
arXiv Detail & Related papers (2024-08-01T08:57:47Z)
Conditioned Prompt-Optimization for Continual Deepfake Detection [11.634681724245933]
This paper introduces Prompt2Guard, a novel solution for photorealistic-free continual deepfake detection of images. We leverage a prediction ensembling technique with read-only prompts, mitigating the need for multiple forward passes. Our method exploits a text-prompt conditioning tailored to deepfake detection, which we demonstrate is beneficial in our setting.
arXiv Detail & Related papers (2024-07-31T12:22:57Z)
The Tug-of-War Between Deepfake Generation and Detection [4.62070292702111]
Multimodal generative models are rapidly evolving, leading to a surge in the generation of realistic video and audio. Deepfake videos, which can convincingly impersonate individuals, have particularly garnered attention due to their potential misuse. This survey paper examines the dual landscape of deepfake video generation and detection, emphasizing the need for effective countermeasures.
arXiv Detail & Related papers (2024-07-08T17:49:41Z)
PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset [7.952304417617302]
multimodal deepfakes, which manipulate both audio and visual modalities, have drawn increasing public concern. To address this gap, we propose a novel, multilingual, and multimodal deepfake dataset: PolyGlotFake. It includes content in seven languages, created using a variety of cutting-edge and popular Text-to-Speech, voice cloning, and lip-sync technologies.
arXiv Detail & Related papers (2024-05-14T06:40:05Z)
Deepfake Generation and Detection: A Benchmark and Survey [134.19054491600832]
Deepfake is a technology dedicated to creating highly realistic facial images and videos under specific conditions. This survey comprehensively reviews the latest developments in deepfake generation and detection. We focus on researching four representative deepfake fields: face swapping, face reenactment, talking face generation, and facial attribute editing.
arXiv Detail & Related papers (2024-03-26T17:12:34Z)
Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models [52.24001776263608]
This comprehensive survey delves into the recent strides in HS moderation. We highlight the burgeoning role of large language models (LLMs) and large multimodal models (LMMs) We identify existing gaps in research, particularly in the context of underrepresented languages and cultures.
arXiv Detail & Related papers (2024-01-30T03:51:44Z)
Deepfakes Generation and Detection: State-of-the-art, open challenges, countermeasures, and way forward [2.15242029196761]
It is possible to generate deepfakes to disseminate disinformation, revenge porn, financial frauds, hoaxes, and to disrupt government functioning. No attempt has been made to review approaches for detection and generation of both audio and video deepfakes. This paper provides a comprehensive review and detailed analysis of existing tools and machine learning (ML) based approaches for deepfake generation.
arXiv Detail & Related papers (2021-02-25T18:26:50Z)
Artificial Fingerprinting for Generative Models: Rooting Deepfake Attribution in Training Data [64.65952078807086]
Photorealistic image generation has reached a new level of quality due to the breakthroughs of generative adversarial networks (GANs) Yet, the dark side of such deepfakes, the malicious use of generated media, raises concerns about visual misinformation. We seek a proactive and sustainable solution on deepfake detection by introducing artificial fingerprints into the models.
arXiv Detail & Related papers (2020-07-16T16:49:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.