Direct Motion Models for Assessing Generated Videos
- URL: http://arxiv.org/abs/2505.00209v1
- Date: Wed, 30 Apr 2025 22:34:52 GMT
- Title: Direct Motion Models for Assessing Generated Videos
- Authors: Kelsey Allen, Carl Doersch, Guangyao Zhou, Mohammed Suhail, Danny Driess, Ignacio Rocco, Yulia Rubanova, Thomas Kipf, Mehdi S. M. Sajjadi, Kevin Murphy, Joao Carreira, Sjoerd van Steenkiste,
- Abstract summary: A current limitation of video generative video models is that they generate plausible looking frames, but poor motion.<n>Here we go beyond FVD by developing a metric which better measures plausible object interactions and motion.<n>We show that using point tracks instead of pixel reconstruction or action recognition features results in a metric which is markedly more sensitive to temporal distortions in synthetic data.
- Score: 38.04485796547767
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A current limitation of video generative video models is that they generate plausible looking frames, but poor motion -- an issue that is not well captured by FVD and other popular methods for evaluating generated videos. Here we go beyond FVD by developing a metric which better measures plausible object interactions and motion. Our novel approach is based on auto-encoding point tracks and yields motion features that can be used to not only compare distributions of videos (as few as one generated and one ground truth, or as many as two datasets), but also for evaluating motion of single videos. We show that using point tracks instead of pixel reconstruction or action recognition features results in a metric which is markedly more sensitive to temporal distortions in synthetic data, and can predict human evaluations of temporal consistency and realism in generated videos obtained from open-source models better than a wide range of alternatives. We also show that by using a point track representation, we can spatiotemporally localize generative video inconsistencies, providing extra interpretability of generated video errors relative to prior work. An overview of the results and link to the code can be found on the project page: http://trajan-paper.github.io.
Related papers
- SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation [22.693060144042196]
Methods for image-to-video generation have achieved impressive, photo-realistic quality.
adjusting specific elements in generated videos, such as object motion or camera movement, is often a tedious process of trial and error.
Recent techniques address this issue by fine-tuning a pre-trained model.
We introduce a framework for controllable image-to-video generation that is self-guided.
arXiv Detail & Related papers (2024-11-07T18:56:11Z) - Fine-gained Zero-shot Video Sampling [21.42513407755273]
We propose a novel Zero-Shot video sampling algorithm, denoted as $mathcalZS2$.
$mathcalZS2$ is capable of directly sampling high-quality video clips without any training or optimization.
It achieves state-of-the-art performance in zero-shot video generation, occasionally outperforming recent supervised methods.
arXiv Detail & Related papers (2024-07-31T09:36:58Z) - Fréchet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos [13.368981834953981]
We propose Fr'echet Video Motion Distance metric, which focuses on evaluating motion consistency in video generation.
Specifically, we design explicit motion features based on key point tracking, and then measure the similarity between these features via the Fr'echet distance.
We carry out a large-scale human study, demonstrating that our metric effectively detects temporal noise and aligns better with human perceptions of generated video quality than existing metrics.
arXiv Detail & Related papers (2024-07-23T02:10:50Z) - VGMShield: Mitigating Misuse of Video Generative Models [7.963591895964269]
We introduce VGMShield: a set of three straightforward but pioneering mitigations through the lifecycle of fake video generation.
We first try to understand whether there is uniqueness in generated videos and whether we can differentiate them from real videos.
Then, we investigate the textittracing problem, which maps a fake video back to a model that generates it.
arXiv Detail & Related papers (2024-02-20T16:39:23Z) - STREAM: Spatio-TempoRal Evaluation and Analysis Metric for Video Generative Models [6.855409699832414]
Video generative models struggle to generate even short video clips.
Current video evaluation metrics are simple adaptations of image metrics by switching the embeddings with video embedding networks.
We propose STREAM, a new video evaluation metric uniquely designed to independently evaluate spatial and temporal aspects.
arXiv Detail & Related papers (2024-01-30T08:18:20Z) - VMC: Video Motion Customization using Temporal Attention Adaption for
Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference.
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z) - SEINE: Short-to-Long Video Diffusion Model for Generative Transition and
Prediction [93.26613503521664]
This paper presents a short-to-long video diffusion model, SEINE, that focuses on generative transition and prediction.
We propose a random-mask video diffusion model to automatically generate transitions based on textual descriptions.
Our model generates transition videos that ensure coherence and visual quality.
arXiv Detail & Related papers (2023-10-31T17:58:17Z) - JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion
Retargeting [53.28477676794658]
unsupervised motion in videos has seen substantial advancements through the use of deep neural networks.
We introduce JOKR - a JOint Keypoint Representation that handles both the source and target videos, without requiring any object prior or data collection.
We evaluate our method both qualitatively and quantitatively, and demonstrate that our method handles various cross-domain scenarios, such as different animals, different flowers, and humans.
arXiv Detail & Related papers (2021-06-17T17:32:32Z) - A Good Image Generator Is What You Need for High-Resolution Video
Synthesis [73.82857768949651]
We present a framework that leverages contemporary image generators to render high-resolution videos.
We frame the video synthesis problem as discovering a trajectory in the latent space of a pre-trained and fixed image generator.
We introduce a motion generator that discovers the desired trajectory, in which content and motion are disentangled.
arXiv Detail & Related papers (2021-04-30T15:38:41Z) - TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting [107.39743751292028]
TransMoMo is capable of transferring motion of a person in a source video realistically to another video of a target person.
We exploit invariance properties of three factors of variation including motion, structure, and view-angle.
We demonstrate the effectiveness of our method over the state-of-the-art methods.
arXiv Detail & Related papers (2020-03-31T17:49:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.