Related papers: Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection

URL: http://arxiv.org/abs/2510.08073v1
Date: Thu, 09 Oct 2025 11:00:35 GMT
Title: Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection
Authors: Shuhai Zhang, ZiHao Lian, Jiahao Yang, Daiyuan Li, Guoxuan Pang, Feng Liu, Bo Han, Shutao Li, Mingkui Tan
Abstract summary: We propose a physics-driven AI-generated video detection paradigm based on probability flow conservation principles. We develop an NSG-based video detection method (NSG-VD) that computes the Maximum Mean Discrepancy (MMD) between NSG features of the test and real videos as a detection metric.
Score: 73.51855469884195
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: AI-generated videos have achieved near-perfect visual realism (e.g., Sora), urgently necessitating reliable detection mechanisms. However, detecting such videos faces significant challenges in modeling high-dimensional spatiotemporal dynamics and identifying subtle anomalies that violate physical laws. In this paper, we propose a physics-driven AI-generated video detection paradigm based on probability flow conservation principles. Specifically, we propose a statistic called Normalized Spatiotemporal Gradient (NSG), which quantifies the ratio of spatial probability gradients to temporal density changes, explicitly capturing deviations from natural video dynamics. Leveraging pre-trained diffusion models, we develop an NSG estimator through spatial gradients approximation and motion-aware temporal modeling without complex motion decomposition while preserving physical constraints. Building on this, we propose an NSG-based video detection method (NSG-VD) that computes the Maximum Mean Discrepancy (MMD) between NSG features of the test and real videos as a detection metric. Last, we derive an upper bound of NSG feature distances between real and generated videos, proving that generated videos exhibit amplified discrepancies due to distributional shifts. Extensive experiments confirm that NSG-VD outperforms state-of-the-art baselines by 16.00% in Recall and 10.75% in F1-Score, validating the superior performance of NSG-VD. The source code is available at https://github.com/ZSHsh98/NSG-VD.
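The abstract uses the Maximum Mean Discrepancy (MMD) between feature sets as its detection metric. The following is a minimal sketch of a generic biased MMD² estimate with a Gaussian kernel, written in pure Python. It is not the authors' NSG-VD implementation (see their repository for that): the function names, the `bandwidth` parameter, and the toy 2-D feature vectors are all hypothetical stand-ins for NSG features, which this sketch does not compute.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples xs and ys."""

    def mean_kernel(a, b):
        # Average kernel value over all pairs drawn from a and b.
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Toy feature vectors (assumed for illustration only; not real NSG features).
reference_real = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05]]
test_close = [[0.02, 0.1], [0.1, 0.03], [0.04, 0.06]]
test_far = [[2.0, 2.1], [2.1, 2.0], [2.05, 2.05]]

score_close = mmd_squared(reference_real, test_close)
score_far = mmd_squared(reference_real, test_far)
print(score_close < score_far)
```

In a detector of this kind, a test sample whose score against the real reference set exceeds a chosen threshold would be flagged as generated. The threshold and kernel bandwidth are tuning choices that this sketch leaves open.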

Related papers

Learning Spatio-Temporal Feature Representations for Video-Based Gaze Estimation [50.05866669110754]
Video-based gaze estimation methods aim to capture the inherently temporal dynamics of human eye gaze from multiple image frames. We propose the Spatio-Temporal Gaze Network (ST-Gaze), a model that combines a CNN backbone with dedicated channel attention and self-attention modules. We show that ST-Gaze achieves state-of-the-art performance both with and without person-specific adaptation.
arXiv Detail & Related papers (2025-12-19T15:15:58Z)
Temporal Realism Evaluation of Generated Videos Using Compressed-Domain Motion Vectors [8.077437139445603]
We introduce a scalable, model-agnostic framework that assesses temporal behavior using motion vectors (MVs) extracted directly from compressed video streams. We quantify realism by computing Kullback-Leibler, Jensen-Shannon, and Wasserstein divergences between MV statistics of real and generated videos.
arXiv Detail & Related papers (2025-11-17T20:47:06Z)
Anticipatory Fall Detection in Humans with Hybrid Directed Graph Neural Networks and Long Short-Term Memory [12.677218248209494]
We propose a hybrid model combining Dynamic Graph Neural Networks (DGNN) with Long Short-Term Memory (LSTM) networks to anticipate falls. Our approach employs real-time skeletal features extracted from video sequences as input for the proposed model. The LSTM-based network then predicts human movement in subsequent time steps, enabling early detection of falls.
arXiv Detail & Related papers (2025-09-01T12:56:31Z)
VDEGaussian: Video Diffusion Enhanced 4D Gaussian Splatting for Dynamic Urban Scenes Modeling [68.65587507038539]
We present a novel video diffusion-enhanced 4D Gaussian Splatting framework for dynamic urban scene modeling. Our key insight is to distill robust, temporally consistent priors from a test-time adapted video diffusion model. Our method significantly enhances dynamic modeling, especially for fast-moving objects, achieving an approximate PSNR gain of 2 dB.
arXiv Detail & Related papers (2025-08-04T07:24:05Z)
Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation [54.42523027597904]
We introduce a novel framework that integrates symbolic regression and trajectory-guided image-to-video (I2V) models for physics-grounded video forecasting. Our approach extracts motion trajectories from input videos, uses a retrieval-based pre-training mechanism to enhance symbolic regression, and discovers equations of motion to forecast physically accurate future trajectories.
arXiv Detail & Related papers (2025-07-09T13:28:42Z)
AI-Generated Video Detection via Perceptual Straightening [9.008575690370895]
We propose ReStraV, a novel approach to distinguish natural from AI-generated videos. Inspired by the "perceptual straightening" hypothesis, we quantify the temporal curvature and stepwise distance in the model's representation domain. Our analysis shows that AI-generated videos exhibit significantly different curvature and distance patterns compared to real videos.
arXiv Detail & Related papers (2025-07-01T09:04:21Z)
Learning Physics From Video: Unsupervised Physical Parameter Estimation for Continuous Dynamical Systems [49.11170948406405]
We propose an unsupervised method to estimate the physical parameters of known, continuous governing equations from single videos. We take the field closer to reality by recording Delfys75: our own real-world dataset of 75 videos for five different types of dynamical systems.
arXiv Detail & Related papers (2024-10-02T09:44:54Z)
SEGNO: Generalizing Equivariant Graph Neural Networks with Physical Inductive Biases [66.61789780666727]
We show how the second-order continuity can be incorporated into GNNs while maintaining the equivariant property. We also offer theoretical insights into SEGNO, highlighting that it can learn a unique trajectory between adjacent states. Our model yields a significant improvement over the state-of-the-art baselines.
arXiv Detail & Related papers (2023-08-25T07:15:58Z)
NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer [45.47667026025716]
We propose a novel, robust and accelerated iteration that relies on two key elements. The convergence and stability of the obtained method, referred to as NAG-GS, are first studied extensively. We show that NAG-GS is competitive with state-of-the-art methods such as momentum SGD with weight decay and AdamW for the training of machine learning models.
arXiv Detail & Related papers (2022-09-29T16:54:53Z)
The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion [29.489737359897312]
We study the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD). We show that the key ingredient driving these dynamics is not the original training loss, but rather the combination of a modified loss, which implicitly regularizes the velocity, and probability currents, which cause oscillations in phase space.
arXiv Detail & Related papers (2021-07-19T20:18:57Z)
FOVQA: Blind Foveated Video Quality Assessment [1.4127304025810108]
We develop a no-reference (NR) foveated video quality assessment model, called FOVQA. It is based on new models of space-variant natural scene statistics (NSS) and natural video statistics (NVS). FOVQA achieves state-of-the-art (SOTA) performance on the new 2D LIVE-FBT-FCVR database.
arXiv Detail & Related papers (2021-06-24T21:38:22Z)
Robust Unsupervised Video Anomaly Detection by Multi-Path Frame Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method based on frame prediction with a proper design. Our proposed method obtains a frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.