Turns Out I'm Not Real: Towards Robust Detection of AI-Generated Videos
- URL: http://arxiv.org/abs/2406.09601v1
- Date: Thu, 13 Jun 2024 21:52:49 GMT
- Title: Turns Out I'm Not Real: Towards Robust Detection of AI-Generated Videos
- Authors: Qingyuan Liu, Pengyuan Shi, Yun-Yun Tsai, Chengzhi Mao, Junfeng Yang
- Abstract summary: The impressive achievements of generative models in creating high-quality videos have raised concerns about digital integrity and privacy vulnerabilities.
Recent works to combat Deepfake videos have developed detectors that are highly accurate at identifying GAN-generated samples.
We propose a novel framework for detecting videos synthesized by multiple state-of-the-art (SOTA) generative models.
- Score: 16.34393937800271
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The impressive achievements of generative models in creating high-quality videos have raised concerns about digital integrity and privacy vulnerabilities. Recent works to combat Deepfake videos have developed detectors that are highly accurate at identifying GAN-generated samples. However, the robustness of these detectors on diffusion-generated videos produced by video creation tools (e.g., Sora by OpenAI, Runway Gen-2, and Pika) remains unexplored. In this paper, we propose a novel framework for detecting videos synthesized by multiple state-of-the-art (SOTA) generative models, such as Stable Video Diffusion. We find that the SOTA methods for detecting diffusion-generated images lack robustness in identifying diffusion-generated videos. Our analysis reveals that the effectiveness of these detectors diminishes when applied to out-of-domain videos, primarily because they struggle to track the temporal features and dynamic variations between frames. To address this challenge, we collect a new benchmark video dataset of diffusion-generated videos using SOTA video creation tools. We extract representations from the diffusion model's explicit knowledge for video frames and train our detector with a CNN + LSTM architecture. The evaluation shows that our framework captures the temporal features between frames well, achieves 93.7% detection accuracy on in-domain videos, and improves accuracy on out-of-domain videos by up to 16 points.
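The abstract specifies the detector only at a high level: per-frame representations drawn from a diffusion model, fed to a CNN + LSTM. A minimal PyTorch sketch of that pattern follows; the small CNN stands in for the diffusion-derived representations, and every layer size is an illustrative assumption, not the paper's configuration.
```python
# Minimal sketch of a CNN + LSTM video detector. The CNN stands in for
# the per-frame representations the authors extract from a diffusion
# model; all layer sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class CnnLstmVideoDetector(nn.Module):
    def __init__(self, feat_dim: int = 128, hidden_dim: int = 64):
        super().__init__()
        # Per-frame encoder (placeholder for diffusion-derived features).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # LSTM tracks temporal dynamics across the frame features.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)  # real vs. AI-generated

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, channels, height, width)
        b, t, c, h, w = frames.shape
        feats = self.cnn(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (h_n, _) = self.lstm(feats)
        return self.head(h_n[-1])  # logits over {real, generated}

# Usage: a batch of 4 clips, 16 frames each, 64x64 RGB.
logits = CnnLstmVideoDetector()(torch.randn(4, 16, 3, 64, 64))
print(logits.shape)  # torch.Size([4, 2])
```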
Related papers
- What Matters in Detecting AI-Generated Videos like Sora? [51.05034165599385]
The gap between synthetic and real-world videos remains under-explored.
In this study, we compare real-world videos with those generated by a state-of-the-art AI model, Stable Video Diffusion.
Our model is capable of detecting videos generated by Sora with high accuracy, even without exposure to any Sora videos during training.
arXiv Detail & Related papers (2024-06-27T23:03:58Z)
- DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark [38.604684882464944]
We introduce the first AI-generated video detection dataset, GenVideo.
It features a large volume of videos: over one million collected AI-generated and real videos.
We introduce a plug-and-play module, named Detail Mamba, to enhance detectors in identifying AI-generated videos; a sketch of that plug-in pattern follows this entry.
arXiv Detail & Related papers (2024-05-30T05:36:12Z)
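The entry names Detail Mamba but not its internals; the sketch below shows only the plug-and-play pattern it describes, with a temporal 1-D convolution standing in for the actual Mamba block on top of a frozen per-frame detector. All modules and sizes are assumptions.
```python
# Plug-and-play pattern: bolt a small trainable temporal module onto a
# frozen per-frame detector. A Conv1d stands in for the Mamba block;
# every dimension below is an assumption for illustration.
import torch
import torch.nn as nn

class PlugInTemporalModule(nn.Module):
    def __init__(self, frame_detector: nn.Module, feat_dim: int = 128):
        super().__init__()
        self.backbone = frame_detector
        for p in self.backbone.parameters():
            p.requires_grad = False  # keep the base detector frozen
        self.temporal = nn.Conv1d(feat_dim, feat_dim, kernel_size=3, padding=1)
        self.head = nn.Linear(feat_dim, 2)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = frames.shape
        feats = self.backbone(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
        mixed = self.temporal(feats.transpose(1, 2)).transpose(1, 2)
        return self.head(mixed.mean(dim=1))  # clip-level logits

# Usage with a dummy frozen backbone producing 128-d frame features.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
logits = PlugInTemporalModule(backbone)(torch.randn(2, 8, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 2])
```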
- Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method [31.763312726582217]
Generative models have made significant advancements in the creation of realistic videos, which raises security issues.
In this paper, we first construct a video dataset using advanced diffusion-based video generation algorithms with various semantic contents.
By analyzing the local and global temporal defects of current AI-generated videos, we construct a novel detection framework to expose fake videos.
arXiv Detail & Related papers (2024-05-07T09:00:09Z)
- CapST: An Enhanced and Lightweight Model Attribution Approach for Synthetic Videos [9.209808258321559]
This paper investigates the model attribution problem for Deepfake videos using a recently proposed dataset, Deepfakes from Different Models (DFDM).
The dataset comprises 6,450 Deepfake videos generated by five distinct models with variations in encoder, decoder, intermediate layer, input resolution, and compression ratio.
Experimental results on the deepfake benchmark dataset (DFDM) demonstrate the efficacy of our proposed method, achieving up to a 4% improvement in accurately categorizing deepfake videos.
arXiv Detail & Related papers (2023-11-07T08:05:09Z)
- AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection [53.448283629898214]
The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries.
Most previous work on detecting AI-generated fake videos utilizes only the visual or the audio modality.
We propose an Audio-Visual Transformer-based Ensemble Network (AVTENet) framework that considers both acoustic and visual manipulation; a minimal fusion sketch follows this entry.
arXiv Detail & Related papers (2023-10-19T19:01:26Z)
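AVTENet's transformer experts are not detailed here; assuming simple linear experts and made-up feature sizes, this sketch shows only the ensemble idea: separate audio and visual experts whose logits are fused, so either modality can flag a manipulated clip.
```python
# Audio-visual ensemble via late fusion. The linear experts and feature
# sizes are assumptions; the paper's actual experts are transformer-based.
import torch
import torch.nn as nn

class AudioVisualEnsemble(nn.Module):
    def __init__(self, a_dim: int = 40, v_dim: int = 512):
        super().__init__()
        self.audio_expert = nn.Linear(a_dim, 2)   # scores acoustic cues
        self.visual_expert = nn.Linear(v_dim, 2)  # scores visual cues

    def forward(self, audio_feat, visual_feat):
        # Average the per-modality logits so a manipulated audio track
        # or manipulated frames can each flag the clip.
        return 0.5 * (self.audio_expert(audio_feat) + self.visual_expert(visual_feat))

fused = AudioVisualEnsemble()(torch.randn(4, 40), torch.randn(4, 512))
print(fused.shape)  # torch.Size([4, 2])
```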
- Anomaly detection in surveillance videos using transformer based attention model [3.2968779106235586]
This research suggests using a weakly supervised strategy to avoid annotating anomalous segments in training videos; a sketch of one common such strategy follows this entry.
The proposed framework is validated on a real-world dataset, the ShanghaiTech Campus dataset.
arXiv Detail & Related papers (2022-06-03T12:19:39Z)
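The entry does not spell out the weakly supervised strategy; one common realization (an assumption here, not necessarily this paper's exact loss) is multiple-instance ranking with only video-level labels:
```python
# Multiple-instance ranking for weakly supervised anomaly detection:
# only the most suspicious segment of each video is compared, so
# segment-level anomaly annotations are never needed.
import torch
import torch.nn.functional as F

def mil_ranking_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor,
                     margin: float = 1.0) -> torch.Tensor:
    """pos_scores/neg_scores: (segments,) anomaly scores for one
    anomalous (video-level label only) and one normal video."""
    # Push the top anomalous-video segment above the top normal one.
    return F.relu(margin - pos_scores.max() + neg_scores.max())

loss = mil_ranking_loss(torch.rand(32), torch.rand(32))
```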
- Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks [68.93429034530077]
We propose a dynamics-aware implicit generative adversarial network (DIGAN) for video generation; an illustration of the underlying implicit video representation follows this entry.
We show that DIGAN can be trained on 128-frame videos of 128x128 resolution, 80 frames longer than the 48 frames of the previous state-of-the-art method.
arXiv Detail & Related papers (2022-02-21T23:24:01Z)
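As a minimal illustration of the implicit video representation DIGAN builds on (not its generator or dynamics-aware discriminator), an MLP maps a space-time coordinate (x, y, t) to an RGB value, so clip length and resolution are decoupled from parameter count:
```python
# Implicit video representation: a network maps (x, y, t) to RGB.
# This shows only the implicit-representation idea, not DIGAN's
# GAN training; all sizes are assumptions.
import torch
import torch.nn as nn

class ImplicitVideo(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # RGB
        )

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        # coords: (n, 3) rows of (x, y, t) in [0, 1]; returns (n, 3) RGB.
        return self.mlp(coords)

# Render one 128x128 frame at t = 0.5 by querying a coordinate grid.
ys, xs = torch.meshgrid(torch.linspace(0, 1, 128),
                        torch.linspace(0, 1, 128), indexing="ij")
coords = torch.stack([xs.flatten(), ys.flatten(),
                      torch.full((128 * 128,), 0.5)], dim=1)
frame = ImplicitVideo()(coords).reshape(128, 128, 3)
```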
- Improving the Efficiency and Robustness of Deepfakes Detection through Precise Geometric Features [13.033517345182728]
Deepfakes are a class of malicious techniques that transplant a target face onto the original one in videos.
Previous efforts at detecting Deepfake videos mainly focused on appearance features, which risk being bypassed by sophisticated manipulation.
We propose an efficient and robust framework named LRNet for detecting Deepfake videos through temporal modeling of precise geometric features; a sketch of that pattern follows this entry.
arXiv Detail & Related papers (2021-04-09T16:57:55Z)
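LRNet's architecture is not given in the summary; the sketch below shows the general pattern the entry names, i.e. an RNN over per-frame geometric features such as facial landmarks, so detection keys on motion geometry rather than appearance. Landmark count and sizes are assumptions, not LRNet's configuration.
```python
# Temporal modeling over geometric features: a GRU consumes a sequence
# of per-frame landmark coordinates instead of raw pixels.
import torch
import torch.nn as nn

class GeometricSequenceDetector(nn.Module):
    def __init__(self, n_landmarks: int = 68, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_landmarks * 2, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, landmarks: torch.Tensor) -> torch.Tensor:
        # landmarks: (batch, time, n_landmarks, 2) -> flatten coordinates.
        b, t, n, d = landmarks.shape
        _, h_n = self.gru(landmarks.reshape(b, t, n * d))
        return self.head(h_n[-1])

logits = GeometricSequenceDetector()(torch.randn(2, 30, 68, 2))
```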
- Robust Unsupervised Video Anomaly Detection by Multi-Path Frame Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method based on frame prediction with proper design; a sketch of the standard prediction-error scoring follows this entry.
Our proposed method obtains a frame-level AUROC of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z)
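The scoring rule is not stated in the summary; a standard choice for prediction-based detectors (an assumption here, not necessarily this paper's exact formula) is to turn next-frame prediction error into an anomaly score via PSNR, since poorly predicted frames are likely anomalous:
```python
# PSNR-based anomaly scoring for frame-prediction methods: frames the
# model predicts poorly receive a higher anomaly score.
import torch

def psnr_anomaly_score(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """pred/target: frames with values in [0, 1]."""
    mse = torch.mean((pred - target) ** 2)
    psnr = -10.0 * torch.log10(mse + 1e-8)
    return -psnr  # negate so larger means more anomalous

score = psnr_anomaly_score(torch.rand(3, 64, 64), torch.rand(3, 64, 64))
```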
- Single Shot Video Object Detector [215.06904478667337]
Single Shot Video Object Detector (SSVD) is a new architecture that integrates feature aggregation into a one-stage detector for object detection in videos.
For $448 \times 448$ input, SSVD achieves 79.2% mAP on the ImageNet VID dataset.
arXiv Detail & Related papers (2020-07-07T15:36:26Z)
- Non-Adversarial Video Synthesis with Learned Priors [53.26777815740381]
We focus on the problem of generating videos from latent noise vectors, without any reference input frames.
We develop a novel approach that jointly optimizes the input latent space, the weights of a recurrent neural network, and a generator through non-adversarial learning; a miniature of this joint optimization follows this entry.
Our approach generates videos of superior quality compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2020-03-21T02:57:33Z)
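In miniature, the joint non-adversarial optimization that last entry describes looks like the following: per-video latent codes are trainable parameters updated together with the generator weights by a plain reconstruction loss, with no discriminator. The recurrent network is omitted, and all shapes and the toy generator are assumptions.
```python
# Jointly optimize a learned latent space and a generator with a plain
# reconstruction loss (non-adversarial); no discriminator is involved.
import torch
import torch.nn as nn

n_videos, latent_dim, frame_dim = 8, 16, 3 * 32 * 32
videos = torch.rand(n_videos, frame_dim)                    # stand-in targets
latents = nn.Parameter(torch.randn(n_videos, latent_dim))   # learned latent space
generator = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                          nn.Linear(256, frame_dim), nn.Sigmoid())

opt = torch.optim.Adam([latents, *generator.parameters()], lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(generator(latents), videos)
    loss.backward()  # gradients flow into both the codes and the weights
    opt.step()
```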