VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL
- URL: http://arxiv.org/abs/2510.02282v2
- Date: Mon, 06 Oct 2025 17:39:06 GMT
- Title: VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL
- Authors: Kyoungjun Park, Yifan Yang, Juheon Yi, Shicheng Zheng, Yifei Shen, Dongqi Han, Caihua Shan, Muhammad Muaz, Lili Qiu
- Abstract summary: VidGuard-R1 is the first video authenticity detector that fine-tunes a multi-modal large language model. Our model delivers both highly accurate judgments and insightful reasoning.
- Score: 30.581247383974482
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid advancement of AI-generated videos, there is an urgent need for effective detection tools to mitigate societal risks such as misinformation and reputational harm. In addition to accurate classification, it is essential that detection models provide interpretable explanations to ensure transparency for regulators and end users. To address these challenges, we introduce VidGuard-R1, the first video authenticity detector that fine-tunes a multi-modal large language model (MLLM) using group relative policy optimization (GRPO). Our model delivers both highly accurate judgments and insightful reasoning. We curate a challenging dataset of 140k real and AI-generated videos produced by state-of-the-art generation models, carefully designing the generation process to maximize discrimination difficulty. We then fine-tune Qwen-VL using GRPO with two specialized reward models that target temporal artifacts and generation complexity. Extensive experiments demonstrate that VidGuard-R1 achieves state-of-the-art zero-shot performance on existing benchmarks, with additional training pushing accuracy above 95%. Case studies further show that VidGuard-R1 produces precise and interpretable rationales behind its predictions. The code is publicly available at https://VidGuard-R1.github.io.
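The abstract names group relative policy optimization (GRPO) as the training method. As a rough illustration of the group-relative idea only, the sketch below normalizes rewards within one group of responses sampled for the same video. It assumes a plain binary correctness reward; `correctness_reward`, the label strings, and the `eps` value are hypothetical and are not the paper's two specialized reward models.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    Population std is used here; that choice is an assumption of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def correctness_reward(predicted_label, true_label):
    # Hypothetical binary reward: 1.0 for a correct real/fake verdict, else 0.0.
    return 1.0 if predicted_label == true_label else 0.0


# One group: four sampled verdicts for a single video whose true label is "fake".
sampled = ["fake", "real", "fake", "fake"]
rewards = [correctness_reward(p, "fake") for p in sampled]
advantages = group_relative_advantages(rewards)
```

Responses that beat the group average receive a positive advantage and the rest a negative one, so no separate value model is needed to provide a baseline. When every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.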
Related papers
- LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding [106.23494088118571]
LongVideo-R1 is a multimodal large language model (MLLM) agent for efficient video context navigation. It infers the most informative video clip for subsequent processing. The LongVideo-R1 agent is fine-tuned upon the Qwen-3-8B model through a two-stage paradigm.
arXiv Detail & Related papers (2026-02-24T13:49:47Z)
- VideoVeritas: AI-Generated Video Detection via Perception Pretext Reinforcement Learning [42.22791607763693]
VideoVeritas is a framework for fine-grained perception and fact-based reasoning. Joint Perception Preference and Perception Pretext Reinforcement Learning is used.
arXiv Detail & Related papers (2026-02-09T16:00:01Z)
- SAGA: Source Attribution of Generative AI Videos [23.217701516122048]
We introduce SAGA (Source Attribution of Generative AI videos), the first comprehensive framework to address the need for AI-generated video source attribution at a large scale. It provides multi-granular attribution across five levels: authenticity, generation task (e.g., T2V/I2V), model version, development team, and the precise generator, offering far richer forensic insights.
arXiv Detail & Related papers (2025-11-16T23:39:54Z)
- VideoTG-R1: Boosting Video Temporal Grounding via Curriculum Reinforcement Learning on Reflected Boundary Annotations [59.40631942092535]
Video temporal grounding (VTG) aims to locate precise segments in videos based on language queries. Recent Multimodal Large Language Models (MLLMs) have shown promise in tackling VTG through reinforcement learning (RL). We propose VideoTG-R1, a novel curriculum RL framework with reflected boundary annotations, enabling data-efficient training.
arXiv Detail & Related papers (2025-10-27T14:55:38Z)
- Leveraging Pre-Trained Visual Models for AI-Generated Video Detection [54.88903878778194]
The field of video generation has advanced beyond DeepFakes, creating an urgent need for methods capable of detecting AI-generated videos with generic content. We propose a novel approach that leverages pre-trained visual models to distinguish between real and generated videos. Our method achieves high detection accuracy, above 90% on average, underscoring its effectiveness.
arXiv Detail & Related papers (2025-07-17T15:36:39Z)
- DAVID-XR1: Detecting AI-Generated Videos with Explainable Reasoning [58.70446237944036]
DAVID-X is the first dataset to pair AI-generated videos with detailed defect-level, temporal-spatial annotations and written rationales. We present DAVID-XR1, a video-language model designed to deliver an interpretable chain of visual reasoning. Our results highlight the promise of explainable detection methods for trustworthy identification of AI-generated video content.
arXiv Detail & Related papers (2025-06-13T13:39:53Z)
- BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation [77.55074597806035]
GenBuster-200K is a large-scale, high-quality AI-generated video dataset featuring 200K high-resolution video clips. BusterX is a novel AI-generated video detection and explanation framework leveraging multimodal large language model (MLLM) and reinforcement learning.
arXiv Detail & Related papers (2025-05-19T02:06:43Z)
- Video-R1: Reinforcing Video Reasoning in MLLMs [30.13366332687375]
Video-R1 is the first attempt to systematically explore the R1 paradigm for incentivizing video reasoning. We first propose the T-GRPO algorithm, which encourages models to utilize temporal information in videos for reasoning. We have constructed two datasets: Video-R1-CoT-165k for SFT cold start and Video-R1-260k for RL training, both comprising image and video data.
arXiv Detail & Related papers (2025-03-27T17:59:51Z)
- AI-Generated Video Detection via Spatio-Temporal Anomaly Learning [2.1210527985139227]
Users can easily create non-existent videos to spread false information. A large-scale generated video dataset (GVD) is constructed as a benchmark for model training and evaluation.
arXiv Detail & Related papers (2024-03-25T11:26:18Z)