Detecting Deepfakes with Metric Learning
- URL: http://arxiv.org/abs/2003.08645v1
- Date: Thu, 19 Mar 2020 09:44:23 GMT
- Title: Detecting Deepfakes with Metric Learning
- Authors: Akash Kumar and Arnav Bhavsar
- Abstract summary: We analyze several deep learning approaches to deepfake classification in high-compression scenarios.
We demonstrate that a proposed approach based on metric learning can be very effective in performing such a classification.
Our approach is especially helpful on social media platforms where data compression is inevitable.
- Score: 9.94524884861004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the arrival of several face-swapping applications such as FaceApp,
SnapChat, MixBooth, FaceBlender and many more, the authenticity of digital
media content hangs by a thread. On social media platforms,
videos are widely circulated often at a high compression factor. In this work,
we analyze several deep learning approaches to deepfake classification in
high-compression scenarios and demonstrate that a proposed approach based on
metric learning can be very effective at this task. Using fewer frames per
video to assess its realism, the metric learning approach with a triplet
network architecture proves fruitful: it learns to enlarge the feature-space
distance between the clusters of real and fake video embedding vectors. We
validated our approaches on two
datasets to analyze the behavior in different environments. We achieved a
state-of-the-art AUC score of 99.2% on the Celeb-DF dataset and accuracy of
90.71% on a highly compressed Neural Texture dataset. Our approach is
especially helpful on social media platforms where data compression is
inevitable.
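The triplet objective described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's actual configuration: the embeddings, margin, and Euclidean distance are assumptions for the sake of the example.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss: pull the anchor embedding toward a
    same-class (positive) embedding and push it away from an
    other-class (negative) embedding by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return max(d_pos - d_neg + margin, 0.0)

# Toy frame embeddings: anchor and positive from real videos,
# negative from a fake video.
anchor   = np.array([1.0, 0.0])
positive = np.array([1.1, 0.1])
negative = np.array([0.0, 1.0])

print(triplet_loss(anchor, positive, negative))  # 0.0: negative already far
print(triplet_loss(anchor, positive, np.array([1.0, 0.05])) > 0)  # True: negative too close
```

Minimizing this loss over many such triplets is what drives the real and fake clusters apart in embedding space, which is why fewer frames per video suffice at test time.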
Related papers
- DMVC: Multi-Camera Video Compression Network aimed at Improving Deep Learning Accuracy [22.871591373774802]
We introduce a cutting-edge video compression framework tailored for the age of ubiquitous video data.
Unlike traditional compression methods that prioritize human visual perception, our innovative approach focuses on preserving semantic information critical for deep learning accuracy.
Built on purpose-designed deep learning algorithms, it separates essential information from redundancy, ensuring machine learning tasks are fed with the most relevant data.
arXiv Detail & Related papers (2024-10-24T03:29:57Z)
- Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data [102.0069667710562]
This paper presents Open-VCLIP++, a framework that adapts CLIP to a strong zero-shot video classifier.
We demonstrate that training Open-VCLIP++ is tantamount to continual learning with zero historical data.
Our approach is evaluated on three widely used action recognition datasets.
arXiv Detail & Related papers (2023-10-08T04:46:43Z)
- Unmasking Deepfakes: Masked Autoencoding Spatiotemporal Transformers for Enhanced Video Forgery Detection [19.432851794777754]
We present a novel approach for the detection of deepfake videos using a pair of vision transformers pre-trained by a self-supervised masked autoencoding setup.
Our method consists of two distinct components, one of which focuses on learning spatial information from individual RGB frames of the video, while the other learns temporal consistency information from optical flow fields generated from consecutive frames.
arXiv Detail & Related papers (2023-06-12T05:49:23Z)
- Bilevel Fast Scene Adaptation for Low-Light Image Enhancement [50.639332885989255]
Enhancing images in low-light scenes is a challenging and widely studied task in computer vision.
The main obstacle lies in modeling the distribution discrepancy across different scenes.
We introduce the bilevel paradigm to model the above latent correspondence.
A bilevel learning framework is constructed to endow the encoder with scene-irrelevant generality across diverse scenes.
arXiv Detail & Related papers (2023-06-02T08:16:21Z)
- Deep Convolutional Pooling Transformer for Deepfake Detection [54.10864860009834]
We propose a deep convolutional Transformer to incorporate decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy.
The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
arXiv Detail & Related papers (2022-09-12T15:05:41Z)
- Combining Contrastive and Supervised Learning for Video Super-Resolution Detection [0.0]
We propose a new upscaled-resolution-detection method based on learning of visual representations using contrastive and cross-entropy losses.
Our method effectively detects upscaling even in compressed videos and outperforms the state-of-the-art alternatives.
arXiv Detail & Related papers (2022-05-20T18:58:13Z)
- Hybrid Contrastive Quantization for Efficient Cross-View Video Retrieval [55.088635195893325]
We propose the first quantized representation learning method for cross-view video retrieval, namely Hybrid Contrastive Quantization (HCQ).
HCQ learns both coarse-grained and fine-grained quantizations with transformers, which provide complementary understandings for texts and videos.
Experiments on three Web video benchmark datasets demonstrate that HCQ achieves competitive performance with state-of-the-art non-compressed retrieval methods.
arXiv Detail & Related papers (2022-02-07T18:04:10Z)
- Face Forensics in the Wild [121.23154918448618]
We construct a novel large-scale dataset, called FFIW-10K, which comprises 10,000 high-quality forgery videos.
The manipulation procedure is fully automatic, controlled by a domain-adversarial quality assessment network.
In addition, we propose a novel algorithm to tackle the task of multi-person face forgery detection.
arXiv Detail & Related papers (2021-03-30T05:06:19Z)
- Two-branch Recurrent Network for Isolating Deepfakes in Videos [17.59209853264258]
We present a method for deepfake detection based on a two-branch network structure.
One branch propagates the original information, while the other branch suppresses the face content.
Our two novel components show promising results on the FaceForensics++, Celeb-DF, and Facebook's DFDC preview benchmarks.
arXiv Detail & Related papers (2020-08-08T01:38:56Z)
- Emotions Don't Lie: An Audio-Visual Deepfake Detection Method Using Affective Cues [75.1731999380562]
We present a learning-based method for distinguishing real from deepfake multimedia content.
We extract and analyze the similarity between the audio and visual modalities within the same video.
We compare our approach with several SOTA deepfake detection methods and report per-video AUC of 84.4% on the DFDC and 96.6% on the DF-TIMIT datasets.
arXiv Detail & Related papers (2020-03-14T22:07:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.