Related papers: DeCoF: Generated Video Detection via Frame Consistency: The First Benchmark Dataset

Related papers

Generative Video Matting [57.186684844156595]
Video matting has traditionally been limited by the lack of high-quality ground-truth data.<n>Most existing video matting datasets provide only human-annotated imperfect alpha and foreground annotations.<n>We introduce a novel video matting approach that can effectively leverage the rich priors from pre-trained video diffusion models.
arXiv Detail & Related papers (2025-08-11T12:18:55Z)
Leveraging Pre-Trained Visual Models for AI-Generated Video Detection [54.88903878778194]
The field of video generation has advanced beyond DeepFakes, creating an urgent need for methods capable of detecting AI-generated videos with generic content.<n>We propose a novel approach that leverages pre-trained visual models to distinguish between real and generated videos.<n>Our method achieves high detection accuracy, above 90% on average, underscoring its effectiveness.
arXiv Detail & Related papers (2025-07-17T15:36:39Z)
BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos [63.03271511550633]
BrokenVideos is a benchmark dataset of 3,254 AI-generated videos with meticulously annotated, pixel-level masks highlighting regions of visual corruption.<n>Our experiments show that training state of the art artifact detection models and multi modal large language models (MLLMs) on BrokenVideos significantly improves their ability to localize corrupted regions.
arXiv Detail & Related papers (2025-06-25T03:30:04Z)
Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation [18.402668470092294]
Synthetic video generation can produce very realistic high-resolution videos that are virtually indistinguishable from real ones.<n>Several video forensic detectors have been recently proposed, but they often exhibit poor generalization.<n>We introduce a novel data augmentation strategy based on the wavelet decomposition and replace specific frequency-related bands to drive the model to exploit more relevant forensic cues.<n>Our method achieves a significant accuracy improvement over state-of-the-art detectors and obtains excellent results even on very recent generative models.
arXiv Detail & Related papers (2025-06-20T07:36:59Z)
DAVID-XR1: Detecting AI-Generated Videos with Explainable Reasoning [58.70446237944036]
DAVID-X is the first dataset to pair AI-generated videos with detailed defect-level, temporal-spatial annotations and written rationales.<n>We present DAVID-XR1, a video-language model designed to deliver an interpretable chain of visual reasoning.<n>Our results highlight the promise of explainable detection methods for trustworthy identification of AI-generated video content.
arXiv Detail & Related papers (2025-06-13T13:39:53Z)
BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation [47.46972260985436]
GenBuster-200K is a large-scale, high-quality AI-generated video dataset featuring 200K high-resolution video clips.<n>BusterX is a novel AI-generated video detection and explanation framework leveraging multimodal large language model (MLLM) and reinforcement learning.
arXiv Detail & Related papers (2025-05-19T02:06:43Z)
GenVidBench: A Challenging Benchmark for Detecting AI-Generated Video [35.05198100139731]
We introduce GenVidBench, a challenging AI-generated video detection dataset with several key advantages. The dataset includes videos from 8 state-of-the-art AI video generators. It is analyzed from multiple dimensions and classified into various semantic categories based on their content.
arXiv Detail & Related papers (2025-01-20T08:58:56Z)
Generative Video Propagation [87.15843701018099]
Our framework, GenProp, encodes the original video with a selective content encoder and propagates the changes made to the first frame using an image-to-video generation model. Experiment results demonstrate the leading performance of our model in various video tasks.
arXiv Detail & Related papers (2024-12-27T17:42:29Z)
Turns Out I'm Not Real: Towards Robust Detection of AI-Generated Videos [16.34393937800271]
generative models in creating high-quality videos have raised concerns about digital integrity and privacy vulnerabilities. Recent works to combat Deepfakes videos have developed detectors that are highly accurate at identifying GAN-generated samples. We propose a novel framework for detecting videos synthesized from multiple state-of-the-art (SOTA) generative models.
arXiv Detail & Related papers (2024-06-13T21:52:49Z)
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark [38.604684882464944]
We introduce the first AI-generated video detection dataset, GenVideo. It features a large volume of videos, including over one million AI-generated and real videos collected. We introduce a plug-and-play module, named Detail Mamba, to enhance detectors by identifying AI-generated videos.
arXiv Detail & Related papers (2024-05-30T05:36:12Z)
Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method [31.763312726582217]
generative model has made significant advancements in the creation of realistic videos, which causes security issues. In this paper, we first construct a video dataset using advanced diffusion-based video generation algorithms with various semantic contents. By analyzing local and global temporal defects of current AI-generated videos, a novel detection framework is constructed to expose fake videos.
arXiv Detail & Related papers (2024-05-07T09:00:09Z)
AICL: Action In-Context Learning for Video Diffusion Model [124.39948693332552]
We propose AICL, which empowers the generative model with the ability to understand action information in reference videos. Extensive experiments demonstrate that AICL effectively captures the action and achieves state-of-the-art generation performance.
arXiv Detail & Related papers (2024-03-18T07:41:19Z)
VGMShield: Mitigating Misuse of Video Generative Models [7.963591895964269]
We introduce VGMShield: a set of three straightforward but pioneering mitigations through the lifecycle of fake video generation. We first try to understand whether there is uniqueness in generated videos and whether we can differentiate them from real videos. Then, we investigate the textittracing problem, which maps a fake video back to a model that generates it.
arXiv Detail & Related papers (2024-02-20T16:39:23Z)
Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a Large Foundational Video Understanding Model [0.0]
This work explores the performance of a large video understanding foundation model on the downstream task of human fall detection on untrimmed video. A method for temporal action localization that relies on a simple cutup of untrimmed videos is demonstrated. The results are promising for real-time application, and the falls are detected on video level with a state-of-the-art 0.96 F1 score on the HQFSD dataset.
arXiv Detail & Related papers (2024-01-29T16:37:00Z)
AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection [53.448283629898214]
The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries. Most previous work on detecting AI-generated fake videos only utilize visual modality or audio modality. We propose an Audio-Visual Transformer-based Ensemble Network (AVTENet) framework that considers both acoustic manipulation and visual manipulation.
arXiv Detail & Related papers (2023-10-19T19:01:26Z)
Unmasking Deepfakes: Masked Autoencoding Spatiotemporal Transformers for Enhanced Video Forgery Detection [19.432851794777754]
We present a novel approach for the detection of deepfake videos using a pair of vision transformers pre-trained by a self-supervised masked autoencoding setup. Our method consists of two distinct components, one of which focuses on learning spatial information from individual RGB frames of the video, while the other learns temporal consistency information from optical flow fields generated from consecutive frames.
arXiv Detail & Related papers (2023-06-12T05:49:23Z)
Multi-object Video Generation from Single Frame Layouts [84.55806837855846]
We propose a video generative framework capable of synthesizing global scenes with local objects. Our framework is a non-trivial adaptation from image generation methods, and is new to this field. Our model has been evaluated on two widely-used video recognition benchmarks.
arXiv Detail & Related papers (2023-05-06T09:07:01Z)
Fast Non-Rigid Radiance Fields from Monocularized Data [66.74229489512683]
This paper proposes a new method for full 360deg inward-facing novel view synthesis of non-rigidly deforming scenes. At the core of our method are 1) An efficient deformation module that decouples the processing of spatial and temporal information for accelerated training and inference; and 2) A static module representing the canonical scene as a fast hash-encoded neural radiance field. In both cases, our method is significantly faster than previous methods, converging in less than 7 minutes and achieving real-time framerates at 1K resolution, while obtaining a higher visual accuracy for generated novel views.
arXiv Detail & Related papers (2022-12-02T18:51:10Z)
Voice-Face Homogeneity Tells Deepfake [56.334968246631725]
Existing detection approaches contribute to exploring the specific artifacts in deepfake videos. We propose to perform the deepfake detection from an unexplored voice-face matching view. Our model obtains significantly improved performance as compared to other state-of-the-art competitors.
arXiv Detail & Related papers (2022-03-04T09:08:50Z)
Efficient training for future video generation based on hierarchical disentangled representation of latent variables [66.94698064734372]
We propose a novel method for generating future prediction videos with less memory usage than the conventional methods. We achieve high-efficiency by training our method in two stages: (1) image reconstruction to encode video frames into latent variables, and (2) latent variable prediction to generate the future sequence. Our experiments show that the proposed method can efficiently generate future prediction videos, even for complex datasets that cannot be handled by previous methods.
arXiv Detail & Related papers (2021-06-07T10:43:23Z)
Few-Shot Learning for Video Object Detection in a Transfer-Learning Scheme [70.45901040613015]
We study the new problem of few-shot learning for video object detection. We employ a transfer-learning framework to effectively train the video object detector on a large number of base-class objects and a few video clips of novel-class objects.
arXiv Detail & Related papers (2021-03-26T20:37:55Z)
Non-Adversarial Video Synthesis with Learned Priors [53.26777815740381]
We focus on the problem of generating videos from latent noise vectors, without any reference input frames. We develop a novel approach that jointly optimize the input latent space, the weights of a recurrent neural network and a generator through non-adversarial learning. Our approach generates superior quality videos compared to the existing state-of-the-art methods.
arXiv Detail & Related papers (2020-03-21T02:57:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.