Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features
- URL: http://arxiv.org/abs/2405.15343v1
- Date: Fri, 24 May 2024 08:26:04 GMT
- Title: Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features
- Authors: Lichuan Ji, Yingqi Lin, Zhenhua Huang, Yan Han, Xiaogang Xu, Jiafei Wu, Chong Wang, Zhe Liu
- Abstract summary: We introduce an extensive video dataset designed specifically for AI-Generated Video Detection (GenVidDet).
We also present the Dual-Branch 3D Transformer (DuB3D), an innovative and effective method for distinguishing between real and generated videos.
DuB3D can distinguish between real and generated video content with 96.77% accuracy and exhibits strong generalization capability even for unseen types.
- Score: 21.583246378475856
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The development of AI-Generated Content (AIGC) has empowered the creation of remarkably realistic AI-generated videos, such as those produced by Sora. However, the widespread adoption of these models raises concerns regarding potential misuse, including face video scams and copyright disputes. Addressing these concerns requires the development of robust tools capable of accurately determining video authenticity. The main challenges lie in the training dataset and the neural classifier. Current datasets lack a varied and comprehensive repository of real and generated content for effective discrimination. In this paper, we first introduce an extensive video dataset designed specifically for AI-Generated Video Detection (GenVidDet). It includes over 2.66 million instances of both real and generated videos, varying in category, frame rate, resolution, and length. The comprehensiveness of GenVidDet enables the training of a generalizable video detector. We also present the Dual-Branch 3D Transformer (DuB3D), an innovative and effective method for distinguishing between real and generated videos, enhanced by incorporating motion information alongside visual appearance. DuB3D utilizes a dual-branch architecture that adaptively leverages and fuses raw spatio-temporal data and optical flow. We systematically explore the critical factors affecting detection performance, arriving at the optimal configuration for DuB3D. Trained on GenVidDet, DuB3D can distinguish between real and generated video content with 96.77% accuracy and exhibits strong generalization capability even for unseen types.
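The dual-branch design described in the abstract lends itself to a compact sketch. The PyTorch code below pairs an appearance branch fed with raw RGB clips and a motion branch fed with precomputed optical flow, then concatenates the two clip-level features for a binary real/generated decision. The module structure, token sizes, mean pooling, and simple concatenation fusion are illustrative assumptions for this sketch; they are not the authors' DuB3D implementation, whose adaptive fusion is more involved.

```python
# Minimal sketch (PyTorch) of a dual-branch real-vs-generated video detector.
# All hyperparameters and module names here are hypothetical.
import torch
import torch.nn as nn


class Branch3D(nn.Module):
    """Toy spatio-temporal encoder: a 3D conv patch stem + transformer encoder."""

    def __init__(self, in_channels: int, embed_dim: int = 256) -> None:
        super().__init__()
        # Patchify the clip into spatio-temporal tokens.
        self.stem = nn.Conv3d(in_channels, embed_dim,
                              kernel_size=(2, 16, 16), stride=(2, 16, 16))
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T, H, W) -> tokens: (B, N, D) -> clip feature: (B, D)
        tokens = self.stem(x).flatten(2).transpose(1, 2)
        return self.encoder(tokens).mean(dim=1)


class DualBranchDetector(nn.Module):
    """Fuses an appearance branch (RGB) and a motion branch (optical flow)."""

    def __init__(self, embed_dim: int = 256) -> None:
        super().__init__()
        self.rgb_branch = Branch3D(in_channels=3, embed_dim=embed_dim)
        self.flow_branch = Branch3D(in_channels=2, embed_dim=embed_dim)
        self.classifier = nn.Linear(2 * embed_dim, 2)  # real vs. generated

    def forward(self, rgb: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.rgb_branch(rgb), self.flow_branch(flow)], dim=-1)
        return self.classifier(fused)


if __name__ == "__main__":
    model = DualBranchDetector()
    rgb = torch.randn(1, 3, 16, 224, 224)   # 16-frame RGB clip
    flow = torch.randn(1, 2, 16, 224, 224)  # per-frame 2-channel optical flow
    print(model(rgb, flow).shape)           # torch.Size([1, 2])
```

In practice the flow input would come from an off-the-shelf optical-flow estimator run over the same clip, and the concatenation could be replaced by the adaptive fusion the paper describes.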
Related papers
- Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos [106.5804660736763]
Video information retrieval remains a fundamental approach for accessing video content.
We build on the observation that retrieval models often favor AI-generated content in ad-hoc and image retrieval tasks.
We investigate whether similar biases emerge in the more challenging context of video retrieval.
arXiv Detail & Related papers (2025-02-11T07:43:47Z) - GenVidBench: A Challenging Benchmark for Detecting AI-Generated Video [35.05198100139731]
We introduce GenVidBench, a challenging AI-generated video detection dataset with several key advantages.
The dataset includes videos from 8 state-of-the-art AI video generators.
The videos are analyzed from multiple dimensions and classified into various semantic categories based on their content.
arXiv Detail & Related papers (2025-01-20T08:58:56Z) - LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors [107.83398512719981]
Single-image 3D reconstruction remains a fundamental challenge in computer vision.
Recent advances in Latent Video Diffusion Models offer promising 3D priors learned from large-scale video data.
We propose LiftImage3D, a framework that effectively releases LVDMs' generative priors while ensuring 3D consistency.
arXiv Detail & Related papers (2024-12-12T18:58:42Z) - T-SVG: Text-Driven Stereoscopic Video Generation [87.62286959918566]
This paper introduces the Text-driven Stereoscopic Video Generation (T-SVG) system.
It streamlines video generation by using text prompts to create reference videos.
These videos are transformed into 3D point cloud sequences, which are rendered from two perspectives with subtle parallax differences.
arXiv Detail & Related papers (2024-12-12T14:48:46Z) - Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning [41.30923253467854]
Temporal features can be complex and diverse.
Spatiotemporal models often lean heavily on one type of artifact and ignore the other.
Videos are naturally resource-intensive to process.
arXiv Detail & Related papers (2024-08-30T07:49:57Z) - What Matters in Detecting AI-Generated Videos like Sora? [51.05034165599385]
The gap between synthetic and real-world videos remains under-explored.
In this study, we compare real-world videos with those generated by a state-of-the-art AI model, Stable Video Diffusion.
Our model is capable of detecting videos generated by Sora with high accuracy, even without exposure to any Sora videos during training.
arXiv Detail & Related papers (2024-06-27T23:03:58Z) - DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark [38.604684882464944]
We introduce the first AI-generated video detection dataset, GenVideo.
It features a large volume of videos, comprising over one million collected AI-generated and real videos.
We introduce a plug-and-play module, named Detail Mamba, to enhance detectors by identifying AI-generated videos.
arXiv Detail & Related papers (2024-05-30T05:36:12Z) - Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation [35.52770785430601]
We propose a novel hybrid video diffusion model, called HVDM, which can capture intricate dependencies more effectively.
The HVDM is trained with a hybrid video autoencoder that extracts a disentangled representation of the video.
Our hybrid autoencoder provides a more comprehensive video latent, enriching the generated videos with fine structures and details.
arXiv Detail & Related papers (2024-02-21T11:46:16Z) - AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z) - Robust Pose Transfer with Dynamic Details using Neural Video Rendering [48.48929344349387]
We propose a neural video rendering framework coupled with an image-translation-based dynamic details generation network (D2G-Net).
To be specific, a novel texture representation is presented to encode both the static and pose-varying appearance characteristics.
We demonstrate that our neural human video renderer is capable of achieving both clearer dynamic details and more robust performance, even on short videos with only 2k - 4k frames.
arXiv Detail & Related papers (2021-06-27T03:40:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.