Exposing Lip-syncing Deepfakes from Mouth Inconsistencies
- URL: http://arxiv.org/abs/2401.10113v2
- Date: Mon, 3 Jun 2024 21:29:28 GMT
- Title: Exposing Lip-syncing Deepfakes from Mouth Inconsistencies
- Authors: Soumyya Kanti Datta, Shan Jia, Siwei Lyu
- Abstract summary: A lip-syncing deepfake is a digitally manipulated video in which a person's lip movements are created convincingly using AI models to match altered or entirely new audio.
In this paper, we describe a novel approach, LIP-syncing detection based on mouth INConsistency (LIPINC), for lip-syncing deepfake detection.
- Score: 29.81606633121959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A lip-syncing deepfake is a digitally manipulated video in which a person's lip movements are created convincingly using AI models to match altered or entirely new audio. Lip-syncing deepfakes are a dangerous type of deepfake, as the artifacts are limited to the lip region and are more difficult to discern. In this paper, we describe a novel approach, LIP-syncing detection based on mouth INConsistency (LIPINC), for lip-syncing deepfake detection by identifying temporal inconsistencies in the mouth region. These inconsistencies appear both between adjacent frames and throughout the video. Our model successfully captures these irregularities and outperforms the state-of-the-art methods on several benchmark deepfake datasets. Code is available at https://github.com/skrantidatta/LIPINC
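As a rough illustration of the temporal-inconsistency cue described above (a toy sketch, not the authors' LIPINC model; the fixed mouth box, grayscale differencing, and threshold are all simplifying assumptions), one can score how much the mouth region changes between adjacent frames:

```python
# Minimal sketch of adjacent-frame mouth-inconsistency scoring.
# Illustrates the general idea only; a real system would use a face/landmark
# detector for the mouth crop and a learned classifier instead of a threshold.
import cv2
import numpy as np

def mouth_inconsistency_score(video_path, mouth_box=(0.3, 0.6, 0.7, 0.95)):
    """Average dissimilarity between mouth crops in adjacent frames.

    mouth_box is an assumed (x0, y0, x1, y1) fraction of the frame
    containing the mouth.
    """
    cap = cv2.VideoCapture(video_path)
    prev, diffs = None, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        h, w = frame.shape[:2]
        x0, y0, x1, y1 = mouth_box
        crop = frame[int(y0 * h):int(y1 * h), int(x0 * w):int(x1 * w)]
        crop = cv2.cvtColor(cv2.resize(crop, (64, 32)), cv2.COLOR_BGR2GRAY)
        if prev is not None:
            diffs.append(np.mean(np.abs(crop.astype(float) - prev.astype(float))))
        prev = crop
    cap.release()
    return float(np.mean(diffs)) if diffs else 0.0

# Higher scores suggest a flickering, inconsistent mouth region, one of the
# cues LIPINC-style detectors learn; 12.0 is an assumed, untuned threshold.
print(mouth_inconsistency_score("clip.mp4") > 12.0)
```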
Related papers
- Deepfake detection in videos with multiple faces using geometric-fakeness features [79.16635054977068]
Deepfakes of victims or public figures can be used by fraudsters for blackmail, extortion, and financial fraud.
In our research we propose to use geometric-fakeness features (GFF) that characterize the dynamic degree of a face's presence in a video.
We employ our approach to analyze videos in which multiple faces are simultaneously present.
arXiv Detail & Related papers (2024-10-10T13:10:34Z) - Shaking the Fake: Detecting Deepfake Videos in Real Time via Active Probes [3.6308756891251392]
Real-time deepfake, a type of generative AI, is capable of "creating" non-existent content (e.g., swapping one's face with another) in a video.
It has been misused to produce deepfake videos for malicious purposes, including financial scams and political misinformation.
We propose SFake, a new real-time deepfake detection method that exploits deepfake models' inability to adapt to physical interference.
arXiv Detail & Related papers (2024-09-17T04:58:30Z) - Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-Syncing DeepFakes [9.993053682230935]
Lip-forgery videos present a formidable challenge to existing DeepFake detection methods.
We propose a novel approach dedicated to lip-forgery identification that exploits the inconsistency between lip movements and audio signals.
Our approach achieves an average accuracy of more than 95.3% in spotting lip-syncing videos; a toy illustration of this audio-visual cue follows.
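A toy version of this cue (not the paper's model; `mouth_openness` is a hypothetical per-frame signal, e.g. a lip-landmark distance from any landmark detector) correlates the audio loudness envelope with mouth motion:

```python
# Toy audio-visual consistency check: genuine speech tends to correlate
# mouth motion with loudness; poorly synced forgeries correlate less.
# Inputs and normalization choices are simplifying assumptions.
import numpy as np

def av_sync_score(mouth_openness, audio, sr, fps):
    """Normalized correlation between mouth openness and audio RMS energy."""
    hop = int(sr / fps)                        # audio samples per video frame
    n = min(len(mouth_openness), len(audio) // hop)
    energy = np.array([np.sqrt(np.mean(audio[i * hop:(i + 1) * hop] ** 2))
                       for i in range(n)])     # per-frame RMS loudness
    mouth = np.asarray(mouth_openness[:n], dtype=float)
    mouth = (mouth - mouth.mean()) / (mouth.std() + 1e-8)
    energy = (energy - energy.mean()) / (energy.std() + 1e-8)
    return float(np.mean(mouth * energy))      # low score -> suspicious
```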
arXiv Detail & Related papers (2024-01-28T14:22:11Z) - Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers [91.00397473678088]
Previous studies have explored generating accurately lip-synced talking faces for arbitrary targets given audio conditions.
We propose the Audio-Visual Context-Aware Transformer (AV-CAT) framework, which produces accurate lip-sync with photo-realistic quality.
Our model can generate high-fidelity lip-synced results for arbitrary subjects.
arXiv Detail & Related papers (2022-12-09T16:32:46Z) - Voice-Face Homogeneity Tells Deepfake [56.334968246631725]
Existing detection approaches focus on exploring the specific artifacts in deepfake videos.
We propose to perform deepfake detection from an unexplored voice-face matching perspective.
Our model obtains significantly improved performance as compared to other state-of-the-art competitors.
arXiv Detail & Related papers (2022-03-04T09:08:50Z) - Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion [82.06128362686445]
We propose a multi-modal semantic forensic approach to handle both cheapfakes and visually persuasive deepfakes.
We leverage the idea of attribution to learn person-specific biometric patterns that distinguish a given speaker from others.
Unlike existing person-specific approaches, our method is also effective against attacks that focus on lip manipulation.
arXiv Detail & Related papers (2021-12-21T01:57:04Z) - Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation [96.66010515343106]
We propose a clean yet effective framework to generate pose-controllable talking faces.
We operate on raw face images, using only a single photo as an identity reference.
Our model has multiple advanced capabilities including extreme view robustness and talking face frontalization.
arXiv Detail & Related papers (2021-04-22T15:10:26Z) - Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection [118.37239586697139]
LipForensics is a detection approach capable of both generalising across manipulation types and withstanding various distortions.
It first pretrains a spatio-temporal network to perform visual speech recognition (lipreading).
A temporal network is subsequently finetuned on fixed mouth embeddings of real and forged data to detect fake videos based on mouth movements without over-fitting to low-level, manipulation-specific artefacts. A schematic sketch of this two-stage recipe follows.
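A schematic of the two-stage recipe above, assuming PyTorch; the layer sizes and module names are illustrative guesses, not the released LipForensics code:

```python
# Stage 1: pretrain a spatio-temporal mouth encoder (e.g., on lipreading).
# Stage 2: freeze its embeddings and finetune a temporal real/fake head.
import torch
import torch.nn as nn

class MouthEncoder(nn.Module):
    """Spatio-temporal encoder producing per-frame mouth embeddings."""
    def __init__(self, emb_dim=256):
        super().__init__()
        self.conv = nn.Conv3d(1, 32, kernel_size=(5, 7, 7), padding=(2, 3, 3))
        self.pool = nn.AdaptiveAvgPool3d((None, 1, 1))  # keep the time axis
        self.proj = nn.Linear(32, emb_dim)

    def forward(self, clips):                  # (B, 1, T, H, W) mouth crops
        x = torch.relu(self.conv(clips))
        x = self.pool(x).squeeze(-1).squeeze(-1).transpose(1, 2)  # (B, T, 32)
        return self.proj(x)                    # (B, T, emb_dim)

class TemporalHead(nn.Module):
    """Temporal classifier finetuned on frozen mouth embeddings."""
    def __init__(self, emb_dim=256):
        super().__init__()
        self.gru = nn.GRU(emb_dim, 128, batch_first=True)
        self.fc = nn.Linear(128, 1)            # real-vs-fake logit

    def forward(self, emb):
        _, h = self.gru(emb)
        return self.fc(h[-1])

encoder = MouthEncoder()                       # stage 1: pretrain on lipreading
for p in encoder.parameters():                 # then freeze the embeddings
    p.requires_grad = False
head = TemporalHead()                          # stage 2: finetune on real/fake
logit = head(encoder(torch.randn(2, 1, 16, 88, 88)))
```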
arXiv Detail & Related papers (2020-12-14T15:53:56Z) - A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild [37.37319356008348]
This work tackles lip-syncing a talking face video of an arbitrary identity to match a target speech segment.
We identify key reasons why prior models fail in this unconstrained setting and resolve them by learning from a powerful lip-sync discriminator (sketched after this entry).
We propose new, rigorous evaluation benchmarks and metrics to accurately measure lip synchronization in unconstrained videos.
arXiv Detail & Related papers (2020-08-23T11:01:25Z)
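The "lip-sync expert" idea from the last entry can be sketched as a frozen audio-visual similarity network used as a training loss for the generator; the encoders, input shapes, and cosine loss below are assumptions in the spirit of SyncNet-style discriminators, not the paper's released model:

```python
# Hedged sketch of a frozen lip-sync "expert" as a generator loss (PyTorch
# assumed). The tiny linear encoders and input shapes are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SyncExpert(nn.Module):
    """Scores agreement between a mouth-crop window and an audio window."""
    def __init__(self, dim=128):
        super().__init__()
        # assumed inputs: 5 RGB mouth crops of 48x96 and an 80x16 mel chunk,
        # both flattened
        self.video_enc = nn.Linear(5 * 3 * 48 * 96, dim)
        self.audio_enc = nn.Linear(80 * 16, dim)

    def forward(self, mouths, mel):
        v = F.normalize(self.video_enc(mouths), dim=-1)
        a = F.normalize(self.audio_enc(mel), dim=-1)
        return (v * a).sum(-1)                 # cosine similarity in [-1, 1]

expert = SyncExpert().eval()                   # pretrained and frozen in practice
for p in expert.parameters():
    p.requires_grad = False

def sync_loss(generated_mouths, mel):
    """Penalizes the generator when the expert says lips and audio disagree."""
    return (1.0 - expert(generated_mouths, mel)).mean()

loss = sync_loss(torch.randn(4, 5 * 3 * 48 * 96), torch.randn(4, 80 * 16))
```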
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.