CapST: An Enhanced and Lightweight Model Attribution Approach for
Synthetic Videos
- URL: http://arxiv.org/abs/2311.03782v3
- Date: Mon, 22 Jan 2024 14:52:14 GMT
- Title: CapST: An Enhanced and Lightweight Model Attribution Approach for
Synthetic Videos
- Authors: Wasim Ahmad, Yan-Tsung Peng, Yuan-Hao Chang, Gaddisa Olani Ganfure,
Sarwar Khan, Sahibzada Adil Shahzad
- Abstract summary: This paper investigates the model attribution problem of Deepfake videos from a recently proposed dataset, Deepfakes from Different Models (DFDM).
The dataset comprises 6,450 Deepfake videos generated by five distinct models with variations in encoder, decoder, intermediate layer, input resolution, and compression ratio.
Experimental results on the deepfake benchmark dataset (DFDM) demonstrate the efficacy of our proposed method, achieving up to a 4% improvement in accurately categorizing deepfake videos.
- Score: 9.209808258321559
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deepfake videos, generated through AI face-swapping techniques, have
garnered considerable attention due to their potential for powerful
impersonation attacks. While existing research primarily focuses on binary
classification to discern between real and fake videos, determining the
specific generation model behind a fake video is crucial for forensic
investigation.
Addressing this gap, this paper investigates the model attribution problem of
Deepfake videos from a recently proposed dataset, Deepfakes from Different
Models (DFDM), derived from various Autoencoder models. The dataset comprises
6,450 Deepfake videos generated by five distinct models with variations in
encoder, decoder, intermediate layer, input resolution, and compression ratio.
This study formulates Deepfake model attribution as a multiclass
classification task, proposing a truncated VGG19 as the feature extraction
backbone, known for its effectiveness in image-related tasks, integrated with
a Capsule Network and a Spatio-Temporal attention mechanism. The Capsule module
captures intricate hierarchies among features for robust identification of
deepfake attributes. Additionally, the video-level fusion technique leverages
temporal attention mechanisms to handle concatenated feature vectors,
capitalizing on inherent temporal dependencies in deepfake videos. By
aggregating insights across frames, our model gains a comprehensive
understanding of video content, resulting in more precise predictions.
Experimental results on the deepfake benchmark dataset (DFDM) demonstrate the
efficacy of our proposed method, achieving up to a 4% improvement in accurately
categorizing deepfake videos compared to baseline models while demanding fewer
computational resources.
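As a concrete illustration of the pipeline described above, the following is a
minimal PyTorch sketch, assuming a recent torchvision; it is not the authors'
released implementation. The module names (FrameEncoder, CapsuleHead,
TemporalAttentionFusion, CapSTSketch), the VGG19 truncation point, and all
capsule/attention dimensions are hypothetical choices for illustration only.

import torch
import torch.nn as nn
from torchvision.models import vgg19  # assumes torchvision >= 0.13

NUM_MODELS = 5  # DFDM: five Autoencoder-based generation models

class FrameEncoder(nn.Module):
    """Truncated VGG19 backbone: keep only the early convolutional blocks."""
    def __init__(self, num_layers: int = 10):  # hypothetical cut point
        super().__init__()
        self.features = nn.Sequential(*list(vgg19(weights=None).features[:num_layers]))

    def forward(self, x):           # x: (B*T, 3, H, W)
        return self.features(x)     # (B*T, 128, H/4, W/4) for num_layers=10

class CapsuleHead(nn.Module):
    """Capsule-style head: group channels into capsule vectors and squash them."""
    def __init__(self, in_channels: int, num_capsules: int = 8, dim: int = 16):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, num_capsules * dim, kernel_size=3, padding=1)
        self.num_capsules, self.dim = num_capsules, dim

    def forward(self, x):
        u = self.proj(x).flatten(2).mean(-1)         # (B*T, caps*dim)
        u = u.view(-1, self.num_capsules, self.dim)  # (B*T, caps, dim)
        norm2 = (u ** 2).sum(-1, keepdim=True)
        # squash: scale each capsule vector to length ||u||^2 / (1 + ||u||^2)
        return u * norm2 / (1.0 + norm2) / norm2.clamp(min=1e-8).sqrt()

class TemporalAttentionFusion(nn.Module):
    """Weight per-frame feature vectors by learned attention and aggregate."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, f):                        # f: (B, T, D)
        w = torch.softmax(self.score(f), dim=1)  # attention over frames
        return (w * f).sum(dim=1)                # (B, D)

class CapSTSketch(nn.Module):
    def __init__(self, num_classes: int = NUM_MODELS):
        super().__init__()
        self.encoder = FrameEncoder()
        self.capsules = CapsuleHead(in_channels=128)  # matches the cut point above
        self.fusion = TemporalAttentionFusion(feat_dim=8 * 16)
        self.classifier = nn.Linear(8 * 16, num_classes)

    def forward(self, video):                    # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        feats = self.encoder(video.flatten(0, 1))
        caps = self.capsules(feats).flatten(1)   # (B*T, 128)
        clip = self.fusion(caps.view(b, t, -1))  # (B, 128)
        return self.classifier(clip)             # logits over generation models

logits = CapSTSketch()(torch.randn(2, 8, 3, 112, 112))  # e.g. 8 frames per clip
print(logits.shape)  # torch.Size([2, 5])

Here the squash nonlinearity and the single learned score per frame stand in
for the paper's capsule hierarchy and Spatio-Temporal attention; the actual
CapST design may differ in routing, attention form, and dimensions.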
Related papers
- VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs [64.60035916955837]
VANE-Bench is a benchmark designed to assess the proficiency of Video-LMMs in detecting anomalies and inconsistencies in videos.
Our dataset comprises an array of videos synthetically generated using existing state-of-the-art text-to-video generation models.
We evaluate nine existing Video-LMMs, both open-source and closed-source, on this benchmark task and find that most models struggle to identify the subtle anomalies effectively.
arXiv Detail & Related papers (2024-06-14T17:59:01Z)
- Turns Out I'm Not Real: Towards Robust Detection of AI-Generated Videos [16.34393937800271]
Advances in generative models for creating high-quality videos have raised concerns about digital integrity and privacy vulnerabilities.
Recent work on combating Deepfake videos has produced detectors that are highly accurate at identifying GAN-generated samples.
We propose a novel framework for detecting videos synthesized from multiple state-of-the-art (SOTA) generative models.
arXiv Detail & Related papers (2024-06-13T21:52:49Z)
- AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection [53.448283629898214]
The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries.
Most previous work on detecting AI-generated fake videos utilizes only the visual or the audio modality.
We propose an Audio-Visual Transformer-based Ensemble Network (AVTENet) framework that considers both acoustic manipulation and visual manipulation.
arXiv Detail & Related papers (2023-10-19T19:01:26Z)
- Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization [51.206398602941405]
We propose to disentangle an original high-dimensional feature into multiple sub-features.
On top of the disentangled sub-features, we learn an auxiliary feature to enhance the sub-features.
Our method achieves 90.1% TOP-100 mAP on the large-scale SVD dataset and also sets the new state-of-the-art on the VCSL benchmark dataset.
arXiv Detail & Related papers (2023-09-13T10:53:12Z)
- Deepfake Video Detection Using Generative Convolutional Vision Transformer [3.8297637120486496]
We propose a Generative Convolutional Vision Transformer (GenConViT) for deepfake video detection.
Our model combines ConvNeXt and Swin Transformer models for feature extraction.
By learning from the visual artifacts and latent data distribution, GenConViT achieves improved performance in detecting a wide range of deepfake videos.
arXiv Detail & Related papers (2023-07-13T19:27:40Z)
- A Hybrid CNN-LSTM model for Video Deepfake Detection by Leveraging Optical Flow Features [0.0]
Deepfakes are synthesized digital media created to produce ultra-realistic fake videos that deceive viewers.
In this paper, we leveraged an optical flow based feature extraction approach to extract the temporal features, which are then fed to a hybrid model for classification.
The hybrid model performs effectively on open-source datasets such as DFDC, FF++, and Celeb-DF.
arXiv Detail & Related papers (2022-07-28T09:38:09Z)
- Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
- The Effectiveness of Temporal Dependency in Deepfake Video Detection [0.0]
This paper investigates whether temporal information can improve the deepfake detection performance of deep learning models.
We find that temporal dependency produces a statistically significant increase in the model's performance when classifying real images.
arXiv Detail & Related papers (2022-05-13T14:39:25Z)
- Model Attribution of Face-swap Deepfake Videos [39.771800841412414]
We first introduce a new dataset with DeepFakes from Different Models (DFDM) based on several Autoencoder models.
Specifically, five generation models with variations in encoder, decoder, intermediate layer, input resolution, and compression ratio have been used to generate a total of 6,450 Deepfake videos.
We take Deepfakes model attribution as a multiclass classification task and propose a spatial and temporal attention based method to explore the differences among Deepfakes.
arXiv Detail & Related papers (2022-02-25T20:05:18Z)
- ViViT: A Video Vision Transformer [75.74690759089529]
We present pure-transformer based models for video classification.
Our model extracts spatio-temporal tokens from the input video, which are then encoded by a series of transformer layers.
We show how we can effectively regularise the model during training and leverage pretrained image models to be able to train on comparatively small datasets.
arXiv Detail & Related papers (2021-03-29T15:27:17Z)
- Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition [86.31412529187243]
Few-shot video recognition aims at learning new actions with only very few labeled samples.
We propose a depth-guided Adaptive Meta-Fusion Network for few-shot video recognition, termed AMeFu-Net.
arXiv Detail & Related papers (2020-10-20T03:06:20Z)