Voice-Face Homogeneity Tells Deepfake
- URL: http://arxiv.org/abs/2203.02195v1
- Date: Fri, 4 Mar 2022 09:08:50 GMT
- Title: Voice-Face Homogeneity Tells Deepfake
- Authors: Harry Cheng and Yangyang Guo and Tianyi Wang and Qi Li and Tao Ye and
Liqiang Nie
- Abstract summary: Existing detection approaches focus on specific artifacts in deepfake videos.
We propose to perform the deepfake detection from an unexplored voice-face matching view.
Our model obtains significantly improved performance as compared to other state-of-the-art competitors.
- Score: 56.334968246631725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting forged videos is in high demand due to the abuse of
deepfakes. Existing detection approaches focus on the specific artifacts in
deepfake videos and fit well on certain data. However, evolving forgery
techniques keep challenging the robustness of traditional deepfake detectors,
and progress toward generalizable detection has stalled. To address this
issue, given the empirical results that
the identities behind voices and faces are often mismatched in deepfake videos,
and the voices and faces have homogeneity to some extent, in this paper, we
propose to perform the deepfake detection from an unexplored voice-face
matching view. To this end, a voice-face matching detection model is devised to
measure the matching degree of these two on a generic audio-visual dataset.
Thereafter, this model can be smoothly transferred to deepfake datasets without
any fine-tuning, and the generalization across datasets is accordingly
enhanced. We conduct extensive experiments over two widely exploited datasets -
DFDC and FakeAVCeleb. Our model obtains significantly improved performance as
compared to other state-of-the-art competitors and maintains favorable
generalizability. The code has been released at
https://github.com/xaCheng1996/VFD.
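The core idea above — scoring how well a voice matches a face and flagging low-scoring pairs as deepfakes — can be sketched as follows. This is a minimal illustration, not the released VFD implementation: the two encoders are stubbed with hypothetical fixed linear projections, whereas in the paper they would be networks trained on a generic audio-visual dataset so that matching voice-face pairs land close together in a shared embedding space.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 128

# Hypothetical stand-ins for pretrained voice/face encoders (assumed names).
W_voice = rng.standard_normal((40, EMB_DIM))
W_face = rng.standard_normal((40, EMB_DIM))

def embed_voice(features: np.ndarray) -> np.ndarray:
    """Project voice features into the shared space and L2-normalize."""
    v = features @ W_voice
    return v / np.linalg.norm(v)

def embed_face(features: np.ndarray) -> np.ndarray:
    """Project face features into the shared space and L2-normalize."""
    v = features @ W_face
    return v / np.linalg.norm(v)

def matching_score(voice_feat: np.ndarray, face_feat: np.ndarray) -> float:
    """Cosine similarity between voice and face embeddings, in [-1, 1]."""
    return float(embed_voice(voice_feat) @ embed_face(face_feat))

def is_deepfake(voice_feat, face_feat, threshold: float = 0.5) -> bool:
    # A low voice-face matching degree suggests mismatched identities,
    # which the paper observes is common in deepfake videos.
    return matching_score(voice_feat, face_feat) < threshold
```

Because the matching model is trained only on generic audio-visual data, the same threshold-based decision can be applied to unseen deepfake datasets without fine-tuning, which is the source of the cross-dataset generalization claimed above.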
Related papers
- DF40: Toward Next-Generation Deepfake Detection [62.073997142001424]
Existing works identify top-notch detection algorithms and models by adhering to the common practice: training detectors on one specific dataset (e.g., FF++) and testing them on other prevalent deepfake datasets.
But can these stand-out "winners" be truly applied to tackle the myriad of realistic and diverse deepfakes lurking in the real world?
We construct a highly diverse and large-scale deepfake dataset called DF40, which comprises 40 distinct deepfake techniques.
We then conduct comprehensive evaluations using 4 standard evaluation protocols and 7 representative detectors, resulting in over 2,000 evaluations.
arXiv Detail & Related papers (2024-06-19T12:35:02Z)
- Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
- In Anticipation of Perfect Deepfake: Identity-anchored Artifact-agnostic Detection under Rebalanced Deepfake Detection Protocol [20.667392938528987]
We introduce the Rebalanced Deepfake Detection Protocol (RDDP) to stress-test detectors under balanced scenarios.
We present ID-Miner, a detector that identifies the puppeteer behind the disguise by focusing on motion over artifacts or appearances.
arXiv Detail & Related papers (2024-05-01T12:48:13Z)
- MIS-AVoiDD: Modality Invariant and Specific Representation for Audio-Visual Deepfake Detection [4.659427498118277]
A new kind of deepfake has emerged in which either the audio or the visual modality is manipulated.
Existing multimodal deepfake detectors are often based on the fusion of the audio and visual streams from the video.
In this paper, we tackle the problem at the representation level to aid the fusion of audio and visual streams for multimodal deepfake detection.
arXiv Detail & Related papers (2023-10-03T17:43:24Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Model Attribution of Face-swap Deepfake Videos [39.771800841412414]
We first introduce a new dataset with DeepFakes from Different Models (DFDM) based on several Autoencoder models.
Specifically, five generation models with variations in encoder, decoder, intermediate layer, input resolution, and compression ratio have been used to generate a total of 6,450 Deepfake videos.
We take Deepfakes model attribution as a multiclass classification task and propose a spatial and temporal attention based method to explore the differences among Deepfakes.
arXiv Detail & Related papers (2022-02-25T20:05:18Z)
- Beyond the Spectrum: Detecting Deepfakes via Re-Synthesis [69.09526348527203]
Deep generative models have led to highly realistic media, known as deepfakes, that are commonly indistinguishable from real to human eyes.
We propose a novel fake detection method designed to re-synthesize test images and extract visual cues for detection.
We demonstrate the improved effectiveness, cross-GAN generalization, and robustness against perturbations of our approach in a variety of detection scenarios.
arXiv Detail & Related papers (2021-05-29T21:22:24Z)
- Emotions Don't Lie: An Audio-Visual Deepfake Detection Method Using Affective Cues [75.1731999380562]
We present a learning-based method for detecting real and fake deepfake multimedia content.
We extract and analyze the similarity between the audio and visual modalities within the same video.
We compare our approach with several SOTA deepfake detection methods and report per-video AUC of 84.4% on the DFDC and 96.6% on the DF-TIMIT datasets.
arXiv Detail & Related papers (2020-03-14T22:07:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.