Related papers: Latent Temporal Discrepancy as Motion Prior: A Loss-Weighting Strategy for Dynamic Fidelity in T2V

Latent Temporal Discrepancy as Motion Prior: A Loss-Weighting Strategy for Dynamic Fidelity in T2V

URL: http://arxiv.org/abs/2601.20504v1
Date: Wed, 28 Jan 2026 11:27:23 GMT
Title: Latent Temporal Discrepancy as Motion Prior: A Loss-Weighting Strategy for Dynamic Fidelity in T2V
Authors: Meiqi Wu, Bingze Song, Ruimin Lin, Chen Zhu, Xiaokun Feng, Jiahong Wu, Xiangxiang Chu, Kaiqi Huang,
Abstract summary: We introduce Latent Temporal Discrepancy (LTD) as a motion prior to guide loss weighting.<n> LTD measures frame-to-frame variation in the latent space, assigning larger penalties to regions with higher discrepancy while maintaining regular optimization for stable regions.<n>Experiments on the general benchmark VBench and the motion-focused VMBench show consistent gains, with our method outperforming strong baselines by 3.31% on VBench and 3.58% on VMBench.
Score: 36.37746089367601
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Video generation models have achieved notable progress in static scenarios, yet their performance in motion video generation remains limited, with quality degrading under drastic dynamic changes. This is due to noise disrupting temporal coherence and increasing the difficulty of learning dynamic regions. {Unfortunately, existing diffusion models rely on static loss for all scenarios, constraining their ability to capture complex dynamics.} To address this issue, we introduce Latent Temporal Discrepancy (LTD) as a motion prior to guide loss weighting. LTD measures frame-to-frame variation in the latent space, assigning larger penalties to regions with higher discrepancy while maintaining regular optimization for stable regions. This motion-aware strategy stabilizes training and enables the model to better reconstruct high-frequency dynamics. Extensive experiments on the general benchmark VBench and the motion-focused VMBench show consistent gains, with our method outperforming strong baselines by 3.31% on VBench and 3.58% on VMBench, achieving significant improvements in motion quality.

Related papers

SharpTimeGS: Sharp and Stable Dynamic Gaussian Splatting via Lifespan Modulation [44.0897839648633]
We present a lifespan-aware 4D Gaussian framework that achieves temporally adaptive modeling of both static and dynamic regions.<n>Our method achieves state-of-the-art performance while supporting real-time rendering up to 4K resolution.
arXiv Detail & Related papers (2026-02-03T01:50:59Z)
Scalable Spatio-Temporal SE(3) Diffusion for Long-Horizon Protein Dynamics [51.85385061275941]
Molecular dynamics (MD) simulations remain the gold standard for studying protein dynamics.<n>Recent generative models have shown promise in accelerating simulations, yet they struggle with long-horizon generation.<n>We present STAR-MD, a scalable diffusion model that generates physically plausible protein trajectories over micro-scale timescales.
arXiv Detail & Related papers (2026-02-02T14:13:28Z)
4D3R: Motion-Aware Neural Reconstruction and Rendering of Dynamic Scenes from Monocular Videos [52.89084603734664]
We present 4D3R, a pose-free dynamic neural rendering framework that decouples static and dynamic components through a two-stage approach.<n>Our approach achieves up to 1.8dB PSNR improvement over state-of-the-art methods.
arXiv Detail & Related papers (2025-11-07T13:25:50Z)
Towards Robust and Generalizable Continuous Space-Time Video Super-Resolution with Events [71.2439653098351]
Continuous space-time video super-STVSR has garnered increasing interest for its capability to reconstruct high-resolution and high-frame-rate videos at arbitrary temporal scales.<n>We present EvEnhancer, a novel approach that marries unique properties of high temporal and high dynamic range encapsulated in event streams.<n>Our method achieves state-of-the-art performance on both synthetic and real-world datasets, while maintaining generalizability at OOD scales.
arXiv Detail & Related papers (2025-10-04T15:23:07Z)
From Tokens to Nodes: Semantic-Guided Motion Control for Dynamic 3D Gaussian Splatting [26.57713792657793]
We propose a motion-adaptive framework that aligns control density with motion complexity.<n>We show significant improvements in reconstruction quality and efficiency over existing state-of-the-art methods.
arXiv Detail & Related papers (2025-10-03T05:33:58Z)
PMR: Physical Model-Driven Multi-Stage Restoration of Turbulent Dynamic Videos [9.48544376032391]
We introduce a Dynamic Efficiency Index ($DEI$), which combines turbulence intensity, optical flow, and proportions of dynamic regions to accurately quantify video dynamic intensity under varying turbulence conditions.<n>We also propose a Physical Model-Driven Multi-Stage Video Restoration ($PMR$) framework that consists of three stages: textbfde-tilting for geometric stabilization, textbfmotion segmentation enhancement, and textbfde-blurring for quality restoration.<n>$PMR$ employs lightweight backbones and stage-wise joint training to ensure both efficiency and high restoration quality.
arXiv Detail & Related papers (2025-08-01T08:06:41Z)
HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene [24.789092424634536]
We propose HAIF-GS, a unified framework that enables structured and consistent dynamic modeling through sparse anchor-driven deformation.<n>We show that HAIF-GS significantly outperforms prior dynamic 3DGS methods in rendering quality, temporal coherence, and reconstruction efficiency.
arXiv Detail & Related papers (2025-06-11T08:45:08Z)
Event-boosted Deformable 3D Gaussians for Dynamic Scene Reconstruction [50.873820265165975]
We introduce the first approach combining event cameras, which capture high-temporal-resolution, continuous motion data, with deformable 3D-GS for dynamic scene reconstruction.<n>We propose a GS-Threshold Joint Modeling strategy, creating a mutually reinforcing process that greatly improves both 3D reconstruction and threshold modeling.<n>We contribute the first event-inclusive 4D benchmark with synthetic and real-world dynamic scenes, on which our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-11-25T08:23:38Z)
Periodic Vibration Gaussian: Dynamic Urban Scene Reconstruction and Real-time Rendering [49.36767999382054]
We present a unified representation model, called Periodic Vibration Gaussian (PVG)<n>PVG builds upon the efficient 3D Gaussian splatting technique, originally designed for static scene representation.<n>PVG exhibits 900-fold acceleration in rendering over the best alternative.
arXiv Detail & Related papers (2023-11-30T13:53:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.