Physics-Guided Motion Loss for Video Generation Model
- URL: http://arxiv.org/abs/2506.02244v2
- Date: Thu, 25 Sep 2025 20:44:47 GMT
- Title: Physics-Guided Motion Loss for Video Generation Model
- Authors: Bowen Xue, Giuseppe Claudio Guarnera, Shuang Zhao, Zahra Montazeri
- Abstract summary: Current video diffusion models generate visually compelling content but often violate basic laws of physics. We introduce a frequency-domain physics prior that improves motion plausibility without modifying model architectures.
- Score: 8.083315267770255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current video diffusion models generate visually compelling content but often violate basic laws of physics, producing subtle artifacts like rubber-sheet deformations and inconsistent object motion. We introduce a frequency-domain physics prior that improves motion plausibility without modifying model architectures. Our method decomposes common rigid motions (translation, rotation, scaling) into lightweight spectral losses, requiring only 2.7% of frequency coefficients while preserving 97%+ of spectral energy. Applied to Open-Sora, MVDIT, and Hunyuan, our approach improves both motion accuracy and action recognition on OpenVID-1M by ~11% on average (relative), while maintaining visual quality. User studies show 74--83% preference for our physics-enhanced videos. It also reduces warping error by 22--37% (depending on the backbone) and improves temporal consistency scores. These results indicate that simple, global spectral cues are an effective drop-in regularizer for physically plausible motion in video diffusion.
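The abstract describes the mechanism concretely enough to sketch. Below is a minimal PyTorch sketch of a low-frequency spectral loss on motion fields, assuming optical flow between frames as the motion representation; the function name, tensor layout, and cropping scheme are illustrative guesses, not the authors' released code.

```python
import torch

def spectral_motion_loss(flow_gen, flow_ref, keep_frac=0.027):
    """Low-frequency spectral loss on motion fields (illustrative sketch).

    flow_gen, flow_ref: (B, T, 2, H, W) optical flow computed from
    generated and reference frames. Only a small centered block of FFT
    coefficients is compared, following the paper's observation that
    ~2.7% of coefficients retain 97%+ of the spectral energy.
    """
    H, W = flow_gen.shape[-2:]
    # Spatial 2D FFT of each flow field, shifted so DC sits at the center.
    F_gen = torch.fft.fftshift(torch.fft.fft2(flow_gen), dim=(-2, -1))
    F_ref = torch.fft.fftshift(torch.fft.fft2(flow_ref), dim=(-2, -1))
    # Half side lengths of the retained block (~keep_frac of coefficients).
    kh = max(1, int(H * keep_frac ** 0.5 / 2))
    kw = max(1, int(W * keep_frac ** 0.5 / 2))
    F_gen = F_gen[..., H//2 - kh:H//2 + kh, W//2 - kw:W//2 + kw]
    F_ref = F_ref[..., H//2 - kh:H//2 + kh, W//2 - kw:W//2 + kw]
    # L1 distance between the retained complex coefficients.
    return (F_gen - F_ref).abs().mean()
```

In training, such a term would typically be added to the standard diffusion objective with a small weight, which is what makes it a drop-in regularizer.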
Related papers
- Motion Attribution for Video Generation [97.2515042185441]
We present Motive, a motion-centric, gradient-based data attribution framework. We use it to study which fine-tuning clips improve or degrade temporal dynamics. To our knowledge, this is the first framework to attribute motion rather than visual appearance in video generative models.
arXiv Detail & Related papers (2026-01-13T18:59:09Z)
- Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation [76.04880323498598]
We introduce an algorithm to distill structure-preserving motion priors from an autoregressive video tracking model (SAM2) into a bidirectional video diffusion model (CogVideoX). Experiments on VBench and in human studies show that SAM2VideoX delivers consistent gains.
arXiv Detail & Related papers (2025-12-12T18:56:35Z)
- MoGAN: Improving Motion Quality in Video Diffusion via Few-Step Motion Adversarial Post-Training [46.09617860476419]
Video diffusion models achieve strong frame-level fidelity but struggle with motion coherence, dynamics, and realism. We propose MoGAN, a motion-centric post-training framework that improves motion realism without reward models or human preference data.
arXiv Detail & Related papers (2025-11-26T17:09:03Z)
- Real-Time Motion-Controllable Autoregressive Video Diffusion [79.32730467857535]
We propose AR-Drag, the first RL-enhanced few-step autoregressive (AR) video diffusion model for real-time image-to-video generation with diverse motion control. We first fine-tune a base I2V model to support basic motion control, then further improve it via reinforcement learning with a trajectory-based reward model. Our design preserves the Markov property through a Self-Rollout learning mechanism and accelerates training by selectively reducing denoising steps.
arXiv Detail & Related papers (2025-10-09T12:17:11Z)
- Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection [73.51855469884195]
We propose an AI-generated video detection paradigm based on probability-flow conservation principles. We develop an NSG-based video detection method (NSG-VD) that computes the Maximum Mean Discrepancy (MMD) between NSG features of test and real videos as a detection metric.
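MMD is a standard two-sample statistic, so the detection metric itself can be sketched directly; the NSG feature extractor is the paper's contribution and is not reproduced here. The kernel choice, bandwidth, and feature shapes below are illustrative assumptions.

```python
import torch

def rbf_mmd(feats_test, feats_real, sigma=1.0):
    """Biased squared-MMD estimate with an RBF kernel (illustrative).

    feats_test: (N, D) and feats_real: (M, D) feature matrices, standing
    in for the paper's NSG features. A large value suggests the test
    video's features deviate from the real-video distribution.
    """
    def gram(a, b):
        # Pairwise squared distances mapped through a Gaussian kernel.
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))

    return (gram(feats_test, feats_test).mean()
            + gram(feats_real, feats_real).mean()
            - 2 * gram(feats_test, feats_real).mean())
```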
arXiv Detail & Related papers (2025-10-09T11:00:35Z)
- Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation [54.42523027597904]
We introduce a novel framework that integrates symbolic regression and trajectory-guided image-to-video (I2V) models for physics-grounded video forecasting. Our approach extracts motion trajectories from input videos, uses a retrieval-based pre-training mechanism to enhance symbolic regression, and discovers equations of motion to forecast physically accurate future trajectories.
arXiv Detail & Related papers (2025-07-09T13:28:42Z)
- SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation [56.90807453045657]
SynMotion is a motion-customized video generation model that jointly leverages semantic guidance and visual adaptation. At the semantic level, we introduce a dual-embedding semantic comprehension mechanism which disentangles subject and motion representations. At the visual level, we integrate efficient motion adapters into a pre-trained video generation model to enhance motion fidelity and temporal coherence.
arXiv Detail & Related papers (2025-06-30T10:09:32Z)
- MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM [14.522189177415724]
MAGIC is a training-free framework for single-image physical property inference and dynamic generation. Our framework generates motion-rich videos from a static image and closes the visual-to-physical gap through a confidence-driven feedback loop. Experiments show that MAGIC outperforms existing physics-aware generative methods in inference accuracy and achieves greater temporal coherence.
arXiv Detail & Related papers (2025-05-22T09:40:34Z)
- RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism [73.38167494118746]
We propose a framework to improve the realism of motion in generated videos. We advocate for the incorporation of a retrieval mechanism during the generation phase. Our pipeline is designed to apply to any text-to-video diffusion model.
arXiv Detail & Related papers (2025-04-09T08:14:05Z)
- VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models [71.9811050853964]
VideoJAM is a novel framework that instills an effective motion prior into video generators. VideoJAM achieves state-of-the-art performance in motion coherence. These findings emphasize that appearance and motion can be complementary and, when effectively integrated, enhance both the visual quality and the coherence of video generation.
arXiv Detail & Related papers (2025-02-04T17:07:10Z)
- PhysMotion: Physics-Grounded Dynamics From a Single Image [24.096925413047217]
We introduce PhysMotion, a novel framework that leverages principled physics-based simulations to guide intermediate 3D representations generated from a single image and input conditions. Our approach addresses the limitations of traditional data-driven generative models and results in more consistent, physically plausible motion.
arXiv Detail & Related papers (2024-11-26T07:59:11Z)
- Optimal-state Dynamics Estimation for Physics-based Human Motion Capture from Videos [6.093379844890164]
We propose a novel method to selectively incorporate physics models with kinematic observations in an online setting. A recurrent neural network is introduced to realize a Kalman filter that attentively balances the kinematic input and simulated motion. The proposed approach excels in the physics-based human pose estimation task and demonstrates the physical plausibility of the predictive dynamics.
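As a rough illustration of the idea, a recurrent gate can blend kinematic observations with simulated motion. The sketch below is a toy stand-in for the learned Kalman filter described above; all module names and dimensions are invented for the example.

```python
import torch
import torch.nn as nn

class AttentiveFuser(nn.Module):
    """Toy recurrent gate blending kinematic and simulated poses.

    A GRU emits a per-dimension weight that trades off the kinematic
    observation against the physics-simulated motion, loosely in the
    spirit of the learned Kalman filter; not the paper's architecture.
    """
    def __init__(self, pose_dim=72, hidden=128):
        super().__init__()
        self.gru = nn.GRU(2 * pose_dim, hidden, batch_first=True)
        self.gate = nn.Linear(hidden, pose_dim)

    def forward(self, kin_pose, sim_pose):
        # kin_pose, sim_pose: (B, T, pose_dim) pose sequences.
        h, _ = self.gru(torch.cat([kin_pose, sim_pose], dim=-1))
        w = torch.sigmoid(self.gate(h))           # confidence in [0, 1]
        return w * kin_pose + (1 - w) * sim_pose  # fused estimate
```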
arXiv Detail & Related papers (2024-10-10T10:24:59Z)
- Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms. SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics. Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
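A minimal version of frequency-domain motion regularization can be sketched with a temporal FFT over per-frame motion vectors; SMA's actual transforms and weighting differ, and everything below is an illustrative simplification.

```python
import torch

def temporal_frequency_loss(motion_gen, motion_ref):
    """Aligns motion along the temporal frequency axis (illustrative).

    motion_gen, motion_ref: (B, T, D) per-frame motion vectors. Comparing
    spectral magnitudes over time matches the global motion rhythm rather
    than individual frame values.
    """
    # Real FFT over the temporal axis; magnitudes capture motion rhythm.
    A_gen = torch.fft.rfft(motion_gen, dim=1).abs()
    A_ref = torch.fft.rfft(motion_ref, dim=1).abs()
    return (A_gen - A_ref).abs().mean()
```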
arXiv Detail & Related papers (2024-03-22T14:47:18Z)
- Diffusion Priors for Dynamic View Synthesis from Monocular Videos [59.42406064983643]
Dynamic novel view synthesis aims to capture the temporal evolution of visual content within videos.
We first finetune a pretrained RGB-D diffusion model on the video frames using a customization technique.
We distill the knowledge from the finetuned model to a 4D representation encompassing both dynamic and static Neural Radiance Fields.
arXiv Detail & Related papers (2024-01-10T23:26:41Z)
- Physics-Guided Human Motion Capture with Pose Probability Modeling [35.159506668475565]
Existing solutions always adopt kinematic results as reference motions, treating physics as a post-processing module.
We employ physics as denoising guidance in the reverse diffusion process to reconstruct human motion from a modeled pose probability distribution.
Over several iterations, physics-based tracking and kinematic denoising reinforce each other to generate physically plausible human motion.
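The alternation this summary describes can be sketched as one schematic reverse-diffusion step; both callables and the blending weight below are placeholders, not the paper's formulation.

```python
def guided_reverse_step(denoiser, project_physics, x_t, t, lam=0.3):
    """One reverse-diffusion step with physics guidance (schematic).

    denoiser(x_t, t) is assumed to return the model's denoised pose
    estimate; project_physics() stands in for a physics-based tracker
    that pulls the pose toward a dynamically feasible one. Iterating
    this step lets the two components refine each other.
    """
    x0_hat = denoiser(x_t, t)           # kinematic denoised estimate
    x0_phys = project_physics(x0_hat)   # physically corrected estimate
    # Blend the two before the next denoising iteration.
    return (1 - lam) * x0_hat + lam * x0_phys
```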
arXiv Detail & Related papers (2023-08-19T05:28:03Z)
- PhysDiff: Physics-Guided Human Motion Diffusion Model [101.1823574561535]
Existing motion diffusion models largely disregard the laws of physics in the diffusion process.
PhysDiff incorporates physical constraints into the diffusion process.
Our approach achieves state-of-the-art motion quality and improves physical plausibility drastically.
arXiv Detail & Related papers (2022-12-05T18:59:52Z)
- Continuous-Time Video Generation via Learning Motion Dynamics with Neural ODE [26.13198266911874]
We propose a novel video generation approach that learns separate distributions for motion and appearance.
We employ a two-stage approach: the first stage converts a noise vector to a sequence of keypoints at arbitrary frame rates, and the second stage synthesizes videos from the given keypoint sequence and the appearance noise vector.
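The first stage can be illustrated with a learned vector field integrated over continuous time, which is what allows arbitrary frame rates; the sketch below uses plain Euler integration and invented sizes rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class KeypointODE(nn.Module):
    """Stage one of a two-stage generator: noise -> keypoint trajectory.

    A learned vector field is Euler-integrated over continuous time, so
    keypoints can be sampled at any set of timestamps; a separate stage
    (not shown) would render frames from keypoints plus an appearance
    code. All names and sizes are illustrative.
    """
    def __init__(self, kp_dim=20, hidden=64):
        super().__init__()
        self.field = nn.Sequential(
            nn.Linear(kp_dim + 1, hidden), nn.Tanh(),
            nn.Linear(hidden, kp_dim))

    def forward(self, z, timestamps):
        # z: (B, kp_dim) noise; timestamps: increasing 1D tensor of times.
        kps, state = [], z
        for t0, t1 in zip(timestamps[:-1], timestamps[1:]):
            t_in = t0.expand(state.shape[0], 1)
            # Euler step: state += dt * f(state, t).
            state = state + (t1 - t0) * self.field(
                torch.cat([state, t_in], dim=-1))
            kps.append(state)
        return torch.stack(kps, dim=1)  # (B, len(timestamps)-1, kp_dim)
```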
arXiv Detail & Related papers (2021-12-21T03:30:38Z)