Temporal Differential Fields for 4D Motion Modeling via Image-to-Video Synthesis
- URL: http://arxiv.org/abs/2505.17333v2
- Date: Tue, 24 Jun 2025 19:43:16 GMT
- Title: Temporal Differential Fields for 4D Motion Modeling via Image-to-Video Synthesis
- Authors: Xin You, Minghui Zhang, Hanxiao Zhang, Jie Yang, Nassir Navab,
- Abstract summary: Existing methods cannot simulate temporal motions unless high-dose imaging scans including starting and ending frames exist simultaneously. We pioneer simulating the regular motion process via an image-to-video framework, which animates the first frame to forecast future frames of a given length. Our approach simulates 4D videos along the intrinsic motion trajectory, rivaling other competitive methods on perceptual similarity and temporal consistency.
- Score: 43.47331808314336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal modeling of regular respiration-induced motions is crucial to image-guided clinical applications. Existing methods cannot simulate temporal motions unless high-dose imaging scans, including starting and ending frames, exist simultaneously. However, in the preoperative data acquisition stage, slight patient movement may result in dynamic backgrounds between the first and last frames of a respiratory period. This additional deviation can hardly be removed by image registration, thus affecting the temporal modeling. To address this limitation, we are the first to simulate the regular motion process via an image-to-video (I2V) synthesis framework, which animates the first frame to forecast future frames of a given length. In addition, to promote the temporal consistency of animated videos, we devise the Temporal Differential Diffusion Model to generate temporal differential fields, which measure the relative differential representations between adjacent frames. A prompt attention layer is devised for fine-grained differential fields, and a field augmented layer is adopted to better integrate these fields with the I2V framework, promoting more accurate temporal variation in synthesized videos. Extensive results on the ACDC cardiac and 4D Lung datasets reveal that our approach simulates 4D videos along the intrinsic motion trajectory, rivaling other competitive methods on perceptual similarity and temporal consistency. Code will be available soon.
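No code has been released yet, so purely as a hedged illustration of the core idea, the sketch below treats a temporal differential field as the signed difference between adjacent frames and shows how a video is re-animated from the first frame by integrating those fields. The function names and the pixel-space difference are assumptions for illustration; in the paper, the fields are generated by a diffusion model with prompt attention and field augmented layers, all omitted here.

```python
# Hedged sketch: pixel-space "temporal differential fields" and first-frame
# animation. The paper generates these fields with a diffusion model; here
# they are plain frame differences, purely to illustrate the data flow.
import torch

def temporal_differential_fields(video: torch.Tensor) -> torch.Tensor:
    """video: (T, C, H, W) in [0, 1] -> (T-1, C, H, W) adjacent-frame differences."""
    return video[1:] - video[:-1]

def animate_from_first_frame(first: torch.Tensor, fields: torch.Tensor) -> torch.Tensor:
    """Integrate differential fields forward from the first frame."""
    frames = [first]
    for d in fields:
        frames.append(frames[-1] + d)  # cumulative sum along time
    return torch.stack(frames)

video = torch.rand(8, 1, 64, 64)  # e.g., one respiratory cycle of 8 frames
fields = temporal_differential_fields(video)
recon = animate_from_first_frame(video[0], fields)
assert torch.allclose(recon, video, atol=1e-5)  # exact round trip in this toy setup
```

In the paper's setting only the first frame is observed and the fields are sampled from the diffusion model, so the integration loop above becomes the forecasting step.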
Related papers
- FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation [51.110607281391154]
FlowMo is a training-free guidance method for enhancing motion coherence in text-to-video models. It estimates motion coherence by measuring the patch-wise variance across the temporal dimension and guides the model to reduce this variance dynamically during sampling.
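A minimal, hedged sketch of the coherence statistic this summary describes (the patch size and mean reduction are assumptions, not FlowMo's exact recipe):

```python
import torch
import torch.nn.functional as F

def patchwise_temporal_variance(video: torch.Tensor, patch: int = 8) -> torch.Tensor:
    """video: (T, C, H, W). Pool each frame into patches, then take the
    variance of the patch statistics across time; lower ~ more coherent."""
    pooled = F.avg_pool2d(video, kernel_size=patch)  # (T, C, H/p, W/p)
    return pooled.var(dim=0).mean()

static = torch.zeros(16, 3, 64, 64)
flicker = torch.rand(16, 3, 64, 64)
assert patchwise_temporal_variance(static) < patchwise_temporal_variance(flicker)
```

During sampling one would backpropagate this scalar to the current latent and step against its gradient, which is the "guidance" part of the summary.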
arXiv Detail & Related papers (2025-06-01T19:55:33Z) - Generalizable Implicit Motion Modeling for Video Frame Interpolation [51.966062283735596]
Motion is critical in flow-based Video Frame Interpolation (VFI). We introduce Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for VFI. Our GIMM can be easily integrated with existing flow-based VFI works by supplying accurately modeled motion.
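A hedged, schematic reading of "implicit motion modeling": a coordinate network that returns a flow vector for any continuous (x, y, t), so motion can be queried at arbitrary intermediate timesteps. The toy MLP below is illustrative, not GIMM's actual architecture.

```python
import torch
import torch.nn as nn

class ImplicitMotionField(nn.Module):
    """Toy coordinate MLP: (x, y, t) -> 2D flow vector (u, v)."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        return self.net(coords)  # coords: (N, 3), normalized to [0, 1]

field = ImplicitMotionField()
xy = torch.rand(1024, 2)
t = torch.full((1024, 1), 0.37)          # any intermediate time, not just grid steps
flow = field(torch.cat([xy, t], dim=1))  # (1024, 2) flow queried at time t
```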
arXiv Detail & Related papers (2024-07-11T17:13:15Z) - CPT-Interp: Continuous sPatial and Temporal Motion Modeling for 4D Medical Image Interpolation [22.886841531680567]
Motion information from 4D medical imaging offers critical insights into dynamic changes in patient anatomy for clinical assessments and radiotherapy planning.
However, inherent physical and technical constraints of imaging hardware often necessitate a compromise between temporal resolution and image quality.
We propose a novel approach for continuously modeling patient anatomic motion using an implicit neural representation.
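To make the interpolation step concrete, here is a hedged sketch of how such a representation is used downstream: a displacement field queried at some time t (here just zeros) warps a reference volume to that phase. `warp_volume` is an illustrative helper, not the paper's code.

```python
import torch
import torch.nn.functional as F

def warp_volume(reference: torch.Tensor, displacement: torch.Tensor) -> torch.Tensor:
    """reference: (1, C, D, H, W); displacement: (1, D, H, W, 3) in
    normalized [-1, 1] coordinates, (x, y, z) order."""
    d, h, w = reference.shape[2:]
    axes = [torch.linspace(-1, 1, n) for n in (d, h, w)]
    grid = torch.stack(torch.meshgrid(*axes, indexing="ij"), dim=-1)  # (D, H, W, 3), (z, y, x)
    grid = grid.flip(-1).unsqueeze(0) + displacement                  # reorder to (x, y, z)
    return F.grid_sample(reference, grid, align_corners=True)

ref = torch.rand(1, 1, 16, 32, 32)    # one breathing-phase volume
disp = torch.zeros(1, 16, 32, 32, 3)  # e.g., the INR's output at query time t
assert torch.allclose(warp_volume(ref, disp), ref, atol=1e-5)  # zero motion = identity
```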
arXiv Detail & Related papers (2024-05-24T09:35:42Z) - TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models [94.24861019513462]
TRIP is a new recipe for the image-to-video diffusion paradigm.
It pivots on an image noise prior derived from the static image to jointly trigger inter-frame relational reasoning (sketched below).
Extensive experiments on WebVid-10M, DTDB and MSR-VTT datasets demonstrate TRIP's effectiveness.
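The summary leaves the prior's exact form open; purely as a hedged guess at the shape of the idea, the sketch below shares one noise sample derived for the static image's latent across all frames and adds a smaller per-frame residual (the split and the scale are assumptions, not TRIP's formulation):

```python
import torch

def image_prior_noise(image_latent: torch.Tensor, num_frames: int,
                      residual_scale: float = 0.2) -> torch.Tensor:
    """Shared noise for the static image's latent plus per-frame residuals,
    so all frames start correlated with the conditioning image."""
    shared = torch.randn_like(image_latent)                        # image noise prior
    residual = residual_scale * torch.randn(num_frames, *image_latent.shape)
    return shared.unsqueeze(0) + residual                          # (T, C, H, W)

latent = torch.randn(4, 32, 32)
noise = image_prior_noise(latent, num_frames=16)
print(noise.shape)  # torch.Size([16, 4, 32, 32])
```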
arXiv Detail & Related papers (2024-03-25T17:59:40Z) - Diffusion Priors for Dynamic View Synthesis from Monocular Videos [59.42406064983643]
Dynamic novel view synthesis aims to capture the temporal evolution of visual content within videos.
We first finetune a pretrained RGB-D diffusion model on the video frames using a customization technique.
We distill the knowledge from the finetuned model into a 4D representation encompassing both dynamic and static Neural Radiance Fields.
arXiv Detail & Related papers (2024-01-10T23:26:41Z) - Bidirectional Temporal Diffusion Model for Temporally Consistent Human Animation [5.78796187123888]
We introduce a method to generate temporally coherent human animation from a single image, a video, or random noise.
We claim that bidirectional temporal modeling enforces temporal coherence on a generative network by largely suppressing the motion ambiguity of human appearance.
arXiv Detail & Related papers (2023-07-02T13:57:45Z) - Conditional Image-to-Video Generation with Latent Flow Diffusion Models [18.13991670747915]
Conditional image-to-video (cI2V) generation aims to synthesize a new plausible video starting from an image and a condition.
We propose an approach for cI2V using novel latent flow diffusion models (LFDM).
LFDM synthesizes an optical flow sequence in the latent space based on the given condition to warp the given image.
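The warping step is the concrete part of that pipeline; a hedged sketch follows (flow generation itself is omitted, and `warp_latent` is an illustrative helper, not LFDM's code):

```python
import torch
import torch.nn.functional as F

def warp_latent(latent: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """latent: (1, C, H, W); flow: (1, H, W, 2) displacements in pixels.
    Warps the encoded conditioning image with one generated flow field."""
    _, _, h, w = latent.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).float().unsqueeze(0)  # (1, H, W, 2), (x, y)
    coords = base + flow
    coords[..., 0] = 2 * coords[..., 0] / (w - 1) - 1          # normalize for grid_sample
    coords[..., 1] = 2 * coords[..., 1] / (h - 1) - 1
    return F.grid_sample(latent, coords, align_corners=True)

z0 = torch.rand(1, 4, 32, 32)     # latent of the given image
flow = torch.zeros(1, 32, 32, 2)  # one step of a generated flow sequence
assert torch.allclose(warp_latent(z0, flow), z0, atol=1e-5)  # zero flow = identity
```

In LFDM's setting, the diffusion model would produce a sequence of such flows, each warping z0 to one output frame before decoding.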
arXiv Detail & Related papers (2023-03-24T01:54:26Z) - Modelling Latent Dynamics of StyleGAN using Neural ODEs [52.03496093312985]
We learn the trajectory of independently inverted latent codes from GANs.
The learned continuous trajectory allows us to perform infinite-frame interpolation and consistent video manipulation.
Our method achieves state-of-the-art performance but with much less computation.
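A hedged sketch of the latent-ODE idea: learn dz/dt over inverted latent codes and integrate it into a continuous trajectory that can be sampled at any frame density. Plain Euler integration stands in for a proper ODE solver, and the network is a toy:

```python
import torch
import torch.nn as nn

class LatentDynamics(nn.Module):
    """Toy ODE over latent codes: dz/dt = f(z)."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, 256), nn.Tanh(), nn.Linear(256, dim))

    def trajectory(self, z0: torch.Tensor, times: torch.Tensor) -> torch.Tensor:
        zs, z = [z0], z0
        for dt in times.diff():     # fixed-step Euler integration
            z = z + self.f(z) * dt
            zs.append(z)
        return torch.stack(zs)      # (len(times), dim)

ode = LatentDynamics()
z0 = torch.randn(512)                                # e.g., a GAN-inverted latent code
path = ode.trajectory(z0, torch.linspace(0, 1, 30))  # 30 "frames" at any density
print(path.shape)  # torch.Size([30, 512])
```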
arXiv Detail & Related papers (2022-08-23T21:20:38Z) - Conditional Temporal Variational AutoEncoder for Action Video Prediction [66.63038712306606]
ACT-VAE predicts pose sequences for action clips from a single input image.
When connected with a plug-and-play Pose-to-Image (P2I) network, ACT-VAE can synthesize image sequences.
arXiv Detail & Related papers (2021-08-12T10:59:23Z) - TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches.
We show an improvement of up to 5.21 dB in PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-06-14T10:33:47Z) - TSI: Temporal Saliency Integration for Video Action Recognition [32.18535820790586]
We propose a Temporal Saliency Integration (TSI) block, which mainly contains a Salient Motion Excitation (SME) module and a Cross-scale Temporal Integration (CTI) module.
SME aims to highlight the motion-sensitive area through local-global motion modeling.
CTI is designed to perform multi-scale temporal modeling through a group of separate 1D convolutions.
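As a hedged sketch of what a cross-scale temporal block of that shape could look like (channel counts, kernel sizes, and the averaging fusion are assumptions, not the paper's design):

```python
import torch
import torch.nn as nn

class CrossScaleTemporalIntegration(nn.Module):
    """Several separate 1D convolutions along time, fused by averaging."""
    def __init__(self, channels: int, kernels=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, k, padding=k // 2) for k in kernels
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); each branch sees a different temporal scale
        return torch.stack([branch(x) for branch in self.branches]).mean(dim=0)

cti = CrossScaleTemporalIntegration(channels=64)
feats = torch.randn(2, 64, 16)  # per-frame features over 16 frames
print(cti(feats).shape)         # torch.Size([2, 64, 16])
```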
arXiv Detail & Related papers (2021-06-02T11:43:49Z) - Learning a Generative Motion Model from Image Sequences based on a Latent Motion Matrix [8.774604259603302]
We learn a probabilistic motion model by simulating temporal registration in a sequence of images.
We show improved registration accuracy and temporally smoother consistency compared to three state-of-the-art registration algorithms.
We also demonstrate the model's applicability for motion analysis, simulation and super-resolution by an improved motion reconstruction from sequences with missing frames.
arXiv Detail & Related papers (2020-11-03T14:44:09Z)