Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features
- URL: http://arxiv.org/abs/2405.15343v1
- Date: Fri, 24 May 2024 08:26:04 GMT
- Title: Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features
- Authors: Lichuan Ji, Yingqi Lin, Zhenhua Huang, Yan Han, Xiaogang Xu, Jiafei Wu, Chong Wang, Zhe Liu
- Abstract summary: We introduce an extensive video dataset designed specifically for AI-Generated Video Detection (GenVidDet).
We also present the Dual-Branch 3D Transformer (DuB3D), an innovative and effective method for distinguishing between real and generated videos.
DuB3D can distinguish between real and generated video content with 96.77% accuracy and exhibits strong generalization capability even for unseen types.
- Score: 21.583246378475856
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The development of AI-Generated Content (AIGC) has empowered the creation of remarkably realistic AI-generated videos, such as those produced by Sora. However, the widespread adoption of these models raises concerns regarding potential misuse, including face video scams and copyright disputes. Addressing these concerns requires the development of robust tools capable of accurately determining video authenticity. The main challenges lie in the training dataset and the neural classifier. Current datasets lack a varied and comprehensive repository of real and generated content for effective discrimination. In this paper, we first introduce an extensive video dataset designed specifically for AI-Generated Video Detection (GenVidDet). It includes over 2.66 million instances of both real and generated videos, varying in category, frame rate, resolution, and length. The comprehensiveness of GenVidDet enables the training of a generalizable video detector. We also present the Dual-Branch 3D Transformer (DuB3D), an innovative and effective method for distinguishing between real and generated videos, enhanced by incorporating motion information alongside visual appearance. DuB3D utilizes a dual-branch architecture that adaptively leverages and fuses raw spatio-temporal data and optical flow. We systematically explore the critical factors affecting detection performance, arriving at the optimal configuration for DuB3D. Trained on GenVidDet, DuB3D can distinguish between real and generated video content with 96.77% accuracy and exhibits strong generalization capability even for unseen types.
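The dual-branch design described in the abstract lends itself to a compact sketch. The PyTorch code below pairs an appearance branch fed with raw RGB clips and a motion branch fed with precomputed optical flow, then concatenates the two clip-level features for a binary real/generated decision. The module structure, token sizes, mean pooling, and simple concatenation fusion are illustrative assumptions for this sketch; they are not the authors' DuB3D implementation, whose adaptive fusion is more involved.

```python
# Minimal sketch (PyTorch) of a dual-branch real-vs-generated video detector.
# All hyperparameters and module names here are hypothetical.
import torch
import torch.nn as nn


class Branch3D(nn.Module):
    """Toy spatio-temporal encoder: a 3D conv patch stem + transformer encoder."""

    def __init__(self, in_channels: int, embed_dim: int = 256) -> None:
        super().__init__()
        # Patchify the clip into spatio-temporal tokens.
        self.stem = nn.Conv3d(in_channels, embed_dim,
                              kernel_size=(2, 16, 16), stride=(2, 16, 16))
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T, H, W) -> tokens: (B, N, D) -> clip feature: (B, D)
        tokens = self.stem(x).flatten(2).transpose(1, 2)
        return self.encoder(tokens).mean(dim=1)


class DualBranchDetector(nn.Module):
    """Fuses an appearance branch (RGB) and a motion branch (optical flow)."""

    def __init__(self, embed_dim: int = 256) -> None:
        super().__init__()
        self.rgb_branch = Branch3D(in_channels=3, embed_dim=embed_dim)
        self.flow_branch = Branch3D(in_channels=2, embed_dim=embed_dim)
        self.classifier = nn.Linear(2 * embed_dim, 2)  # real vs. generated

    def forward(self, rgb: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.rgb_branch(rgb), self.flow_branch(flow)], dim=-1)
        return self.classifier(fused)


if __name__ == "__main__":
    model = DualBranchDetector()
    rgb = torch.randn(1, 3, 16, 224, 224)   # 16-frame RGB clip
    flow = torch.randn(1, 2, 16, 224, 224)  # per-frame 2-channel optical flow
    print(model(rgb, flow).shape)           # torch.Size([1, 2])
```

In practice the flow input would come from an off-the-shelf optical-flow estimator run over the same clip, and the concatenation could be replaced by the adaptive fusion the paper describes.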
Related papers
- Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos [106.5804660736763]
Video information retrieval remains a fundamental approach for accessing video content.
We build on the observation that retrieval models often favor AI-generated content in ad-hoc and image retrieval tasks.
We investigate whether similar biases emerge in the more challenging context of video retrieval.
arXiv Detail & Related papers (2025-02-11T07:43:47Z) - GenVidBench: A Challenging Benchmark for Detecting AI-Generated Video [35.05198100139731]
We introduce GenVidBench, a challenging AI-generated video detection dataset with several key advantages.
The dataset includes videos from 8 state-of-the-art AI video generators.
The videos are analyzed from multiple dimensions and classified into various semantic categories based on their content.
arXiv Detail & Related papers (2025-01-20T08:58:56Z) - LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors [107.83398512719981]
Single-image 3D reconstruction remains a fundamental challenge in computer vision.
Recent advances in Latent Video Diffusion Models offer promising 3D priors learned from large-scale video data.
We propose LiftImage3D, a framework that effectively releases LVDMs' generative priors while ensuring 3D consistency.
arXiv Detail & Related papers (2024-12-12T18:58:42Z) - T-SVG: Text-Driven Stereoscopic Video Generation [87.62286959918566]
This paper introduces the Text-driven Stereoscopic Video Generation (T-SVG) system.
It streamlines video generation by using text prompts to create reference videos.
These videos are transformed into 3D point cloud sequences, which are rendered from two perspectives with subtle parallax differences.
arXiv Detail & Related papers (2024-12-12T14:48:46Z) - Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning [41.30923253467854]
Temporal features can be complex and diverse.
Spatiotemporal models often lean heavily on one type of artifact and ignore the other.
Videos are naturally resource-intensive to process.
arXiv Detail & Related papers (2024-08-30T07:49:57Z) - What Matters in Detecting AI-Generated Videos like Sora? [51.05034165599385]
The gap between synthetic and real-world videos remains under-explored.
In this study, we compare real-world videos with those generated by a state-of-the-art AI model, Stable Video Diffusion.
Our model is capable of detecting videos generated by Sora with high accuracy, even without exposure to any Sora videos during training.
arXiv Detail & Related papers (2024-06-27T23:03:58Z) - DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark [38.604684882464944]
We introduce the first AI-generated video detection dataset, GenVideo.
It features a large volume of videos, comprising over one million collected AI-generated and real videos.
We introduce a plug-and-play module, named Detail Mamba, to enhance detectors by identifying AI-generated videos.
arXiv Detail & Related papers (2024-05-30T05:36:12Z) - Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation [35.52770785430601]
We propose a novel hybrid video diffusion model, called HVDM, which can capture intricate dependencies more effectively.
The HVDM is trained with a hybrid video autoencoder that extracts a disentangled representation of the video.
Our hybrid autoencoder provides a more comprehensive video latent, enriching the generated videos with fine structures and details.
arXiv Detail & Related papers (2024-02-21T11:46:16Z) - AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z) - Robust Pose Transfer with Dynamic Details using Neural Video Rendering [48.48929344349387]
We propose a neural video rendering framework coupled with an image-translation-based dynamic details generation network (D2G-Net).
To be specific, a novel texture representation is presented to encode both the static and pose-varying appearance characteristics.
We demonstrate that our neural human video renderer is capable of achieving both clearer dynamic details and more robust performance, even on short videos with only 2k - 4k frames.
arXiv Detail & Related papers (2021-06-27T03:40:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.