Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning
- URL: http://arxiv.org/abs/2512.15693v1
- Date: Wed, 17 Dec 2025 18:48:26 GMT
- Title: Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning
- Authors: Yifei Li, Wenzhao Zheng, Yanran Zhang, Runze Sun, Yu Zheng, Lei Chen, Jie Zhou, Jiwen Lu,
- Abstract summary: We present Skyra, a specialized multimodal large language model (MLLM) that identifies human-perceivable visual artifacts in AI-generated videos. To support this objective, we construct ViF-CoT-4K for Supervised Fine-Tuning (SFT), which represents the first large-scale AI-generated video artifact dataset with fine-grained human annotations. We then develop a two-stage training strategy that systematically enhances our model's spatio-temporal artifact perception, explanation capability, and detection accuracy.
- Score: 66.51617619673587
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The misuse of AI-driven video generation technologies has raised serious social concerns, highlighting the urgent need for reliable AI-generated video detectors. However, most existing methods are limited to binary classification and lack the necessary explanations for human interpretation. In this paper, we present Skyra, a specialized multimodal large language model (MLLM) that identifies human-perceivable visual artifacts in AI-generated videos and leverages them as grounded evidence for both detection and explanation. To support this objective, we construct ViF-CoT-4K for Supervised Fine-Tuning (SFT), which represents the first large-scale AI-generated video artifact dataset with fine-grained human annotations. We then develop a two-stage training strategy that systematically enhances our model's spatio-temporal artifact perception, explanation capability, and detection accuracy. To comprehensively evaluate Skyra, we introduce ViF-Bench, a benchmark comprising 3K high-quality samples generated by over ten state-of-the-art video generators. Extensive experiments demonstrate that Skyra surpasses existing methods across multiple benchmarks, while our evaluation yields valuable insights for advancing explainable AI-generated video detection.
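The abstract describes human-perceivable artifacts serving as grounded evidence for both the detection verdict and its explanation. The sketch below shows one way such evidence could be represented. The schema (`ArtifactEvidence`, `DetectionReport`, the field names, normalized boxes, and the rule that any grounded artifact implies "AI-generated") is a hypothetical illustration, not Skyra's actual output format or decision rule.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ArtifactEvidence:
    """One grounded artifact: what looks wrong, when, and where (hypothetical schema)."""
    category: str                            # e.g. "physics violation"
    start_s: float                           # start of the temporal span, in seconds
    end_s: float                             # end of the temporal span, in seconds
    box: Tuple[float, float, float, float]   # normalized (x0, y0, x1, y1)
    explanation: str                         # human-readable rationale

    def __post_init__(self) -> None:
        # Reject spans that end before they start.
        if not (0.0 <= self.start_s <= self.end_s):
            raise ValueError("invalid temporal span")
        # Require a non-empty box inside the unit square.
        x0, y0, x1, y1 = self.box
        if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
            raise ValueError("box must be normalized with x0 < x1 and y0 < y1")


@dataclass
class DetectionReport:
    evidence: List[ArtifactEvidence] = field(default_factory=list)

    @property
    def verdict(self) -> str:
        # Simplifying assumption: any grounded artifact means "AI-generated".
        return "AI-generated" if self.evidence else "real"

    def explain(self) -> str:
        # The explanation is built only from the grounded evidence items.
        if not self.evidence:
            return "No human-perceivable artifacts found."
        return "\n".join(
            f"[{e.start_s:.1f}s-{e.end_s:.1f}s] {e.category} at {e.box}: {e.explanation}"
            for e in self.evidence
        )


if __name__ == "__main__":
    report = DetectionReport([
        ArtifactEvidence("physics violation", 2.0, 3.5, (0.1, 0.2, 0.4, 0.6),
                         "A cup passes through the table."),
    ])
    print(report.verdict)  # AI-generated
    print(report.explain())
```

Tying the verdict and the explanation to the same evidence list is the design point: the explanation cannot cite an artifact that did not contribute to the decision.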
Related papers
- Learning Human-Perceived Fakeness in AI-Generated Videos via Multimodal LLMs [92.02382309654263] (2025-09-26T17:59:54Z)
  We introduce DeeptraceReward, a benchmark that annotates human-perceived fake traces for video generation reward. The dataset comprises 4.3K detailed annotations across 3.3K high-quality generated videos. We consolidate these annotations into 9 major categories of deepfake traces that lead humans to identify a video as AI-generated.
- D3: Training-Free AI-Generated Video Detection Using Second-Order Features [17.253600093886277] (2025-08-01T15:17:51Z)
  Detection by Difference of Differences (D3) is a novel training-free detection method for synthetic videos. We validate the superiority of D3 on 4 open-source datasets.
- Leveraging Pre-Trained Visual Models for AI-Generated Video Detection [54.88903878778194] (2025-07-17T15:36:39Z)
  The field of video generation has advanced beyond DeepFakes, creating an urgent need for methods capable of detecting AI-generated videos with generic content. We propose a novel approach that leverages pre-trained visual models to distinguish between real and generated videos. Our method achieves high detection accuracy, above 90% on average, underscoring its effectiveness.
- DAVID-XR1: Detecting AI-Generated Videos with Explainable Reasoning [58.70446237944036] (2025-06-13T13:39:53Z)
  DAVID-X is the first dataset to pair AI-generated videos with detailed defect-level, temporal-spatial annotations and written rationales. We present DAVID-XR1, a video-language model designed to deliver an interpretable chain of visual reasoning. Our results highlight the promise of explainable detection methods for trustworthy identification of AI-generated video content.
- GenWorld: Towards Detecting AI-generated Real-world Simulation Videos [79.98542193919957] (2025-06-12T17:59:33Z)
  GenWorld is a large-scale, high-quality, real-world simulation dataset for AI-generated video detection. We propose a model, SpannDetector, that leverages multi-view consistency as a strong criterion for real-world AI-generated video detection.
- BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation [77.55074597806035] (2025-05-19T02:06:43Z)
  GenBuster-200K is a large-scale, high-quality AI-generated video dataset featuring 200K high-resolution video clips. BusterX is a novel AI-generated video detection and explanation framework leveraging a multimodal large language model (MLLM) and reinforcement learning.
- AI-Generated Video Detection via Spatio-Temporal Anomaly Learning [2.1210527985139227] (2024-03-25T11:26:18Z)
  Users can easily create non-existent videos to spread false information. A large-scale generated video dataset (GVD) is constructed as a benchmark for model training and evaluation.
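D3 is summarized only as a training-free detector built on second-order features ("difference of differences"). As a generic illustration of that notion, and not D3's actual pipeline, the sketch below computes the discrete second-order temporal difference of per-frame feature vectors, Δ²f_t = f_{t−1} − 2f_t + f_{t+1}, and summarizes its magnitude. How the per-frame features are extracted and which statistic drives the real/fake decision are assumptions left open here; `second_order_stats` is a hypothetical helper.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one per-frame feature vector (assumed to be given)


def second_order_diffs(features: Sequence[Frame]) -> List[List[float]]:
    """Element-wise f[t-1] - 2*f[t] + f[t+1] for every interior frame t."""
    if len(features) < 3:
        raise ValueError("need at least 3 frames for a second-order difference")
    return [
        [prev - 2 * cur + nxt
         for prev, cur, nxt in zip(features[t - 1], features[t], features[t + 1])]
        for t in range(1, len(features) - 1)
    ]


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean length of a vector."""
    return sum(x * x for x in v) ** 0.5


def second_order_stats(features: Sequence[Frame]) -> Tuple[float, float]:
    """Mean and population std of the L2 norms of the second-order differences."""
    norms = [l2_norm(d) for d in second_order_diffs(features)]
    return fmean(norms), pstdev(norms)


if __name__ == "__main__":
    smooth = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # constant velocity
    jitter = [[0.0], [1.0], [0.0], [1.0]]                      # alternating values
    print(second_order_stats(smooth))  # (0.0, 0.0)
    print(second_order_stats(jitter))  # (2.0, 0.0)
```

A constant-velocity feature trajectory has zero second-order difference, so these statistics respond only to changes in the rate of change between frames; no detection threshold is implied.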