Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation
- URL: http://arxiv.org/abs/2502.00500v2
- Date: Tue, 04 Feb 2025 15:19:12 GMT
- Title: Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation
- Authors: Yang Cao, Zhao Song, Chiwun Yang
- Abstract summary: This paper considers an efficient video modeling process called Video Latent Flow Matching (VLFM).
Our method relies on current strong pre-trained image generation models, modeling a certain caption-guided flow of latent patches that can be decoded to time-dependent video frames.
We conduct experiments on several text-to-video datasets to showcase the effectiveness of our method.
- Score: 11.77588746719272
- License:
- Abstract: This paper considers an efficient video modeling process called Video Latent Flow Matching (VLFM). Unlike prior works, which randomly sample latent patches for video generation, our method relies on current strong pre-trained image generation models, modeling a certain caption-guided flow of latent patches that can be decoded to time-dependent video frames. We first conjecture that the frames of a video are differentiable with respect to time in some latent space. Based on this conjecture, we introduce the HiPPO framework to approximate the optimal polynomial projection that generates the probability path. Our approach enjoys the theoretical benefits of a bounded universal approximation error and timescale robustness. Moreover, VLFM supports interpolation and extrapolation for video generation at arbitrary frame rates. We conduct experiments on several text-to-video datasets to showcase the effectiveness of our method.
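The abstract's two technical ingredients are a caption-guided flow over latent patches and a HiPPO-based optimal polynomial projection of the latent trajectory. Below is a minimal NumPy sketch of the second ingredient only, under stated assumptions: the choice of the HiPPO-LegS operator, the coefficient count, the bilinear discretization, and the toy latent trajectory are illustrative, not the paper's released implementation.

```python
# A minimal sketch (not the authors' code): the HiPPO-LegS operator compresses a
# time-dependent latent trajectory into polynomial coefficients that can be
# decoded at arbitrary, possibly non-integer, times.
import numpy as np
from numpy.polynomial import legendre


def hippo_legs_matrices(n_coeffs: int):
    """HiPPO-LegS transition matrices: A is (N, N), B is (N, 1)."""
    n = np.arange(n_coeffs)
    # A[i, k] = sqrt(2i+1) sqrt(2k+1) for i > k, i+1 on the diagonal, 0 above it.
    A = np.sqrt(2 * n[:, None] + 1) * np.sqrt(2 * n[None, :] + 1)
    A = np.tril(A, -1) + np.diag(n + 1.0)
    B = np.sqrt(2 * n + 1.0)[:, None]
    return A, B


def project_trajectory(z, n_coeffs=32, dt=1.0):
    """Online projection of a latent trajectory z (T, D) onto Legendre polynomials.

    Integrates dc/dt = -(1/t) A c + (1/t) B z(t) with a bilinear (Tustin) step,
    which keeps the recurrence numerically stable.
    """
    T, D = z.shape
    A, B = hippo_legs_matrices(n_coeffs)
    I = np.eye(n_coeffs)
    c = np.zeros((n_coeffs, D))
    for k in range(1, T + 1):
        t = k * dt
        lhs = I + (dt / (2 * t)) * A
        rhs = (I - (dt / (2 * t)) * A) @ c + (dt / t) * (B @ z[k - 1][None, :])
        c = np.linalg.solve(lhs, rhs)
    return c


def evaluate(coeffs, t_total, query_times):
    """Decode the projection at arbitrary times in [0, t_total]."""
    n_coeffs = coeffs.shape[0]
    x = 2.0 * np.asarray(query_times, dtype=float) / t_total - 1.0  # map to [-1, 1]
    scale = np.sqrt(2 * np.arange(n_coeffs) + 1.0)
    V = legendre.legvander(x, n_coeffs - 1) * scale[None, :]        # (M, N) basis
    return V @ coeffs                                               # (M, D) latents


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(16, 8))            # toy trajectory: 16 frames, 8-dim latents
    c = project_trajectory(z)
    print(evaluate(c, 16.0, [2.5, 7.25]).shape)  # latents at non-integer frame times
```

Decoding the same coefficients at non-integer (or out-of-range) query times is what, in spirit, corresponds to the interpolation and extrapolation at arbitrary frame rates claimed in the abstract; the actual model learns the caption-conditioned flow of these latents.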
Related papers
- Continuous Video Process: Modeling Videos as Continuous Multi-Dimensional Processes for Video Prediction [43.16308241800144]
We introduce a novel model class that treats video as a continuous multi-dimensional process rather than a series of discrete frames.
We establish state-of-the-art performance in video prediction, validated on benchmark datasets including KTH, BAIR, Human3.6M, and UCF101.
arXiv Detail & Related papers (2024-12-06T10:34:50Z) - Optical-Flow Guided Prompt Optimization for Coherent Video Generation [51.430833518070145]
We propose a framework called MotionPrompt that guides the video generation process via optical flow.
We optimize learnable token embeddings during reverse sampling steps by using gradients from a trained discriminator applied to random frame pairs.
This approach allows our method to generate visually coherent video sequences that closely reflect natural motion dynamics, without compromising the fidelity of the generated content.
arXiv Detail & Related papers (2024-11-23T12:26:52Z) - ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler [53.98558445900626]
Current image-to-video diffusion models, while powerful in generating videos from a single frame, need adaptation for two-frame conditioned generation.
We introduce a novel, bidirectional sampling strategy to address the resulting off-manifold issues without requiring extensive re-noising or fine-tuning.
Our method employs sequential sampling along both forward and backward paths, conditioned on the start and end frames, respectively, ensuring more coherent and on-manifold generation of intermediate frames.
arXiv Detail & Related papers (2024-10-08T03:01:54Z) - Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation [60.27691946892796]
We present a method for generating video sequences with coherent motion between a pair of input key frames.
Our experiments show that our method outperforms both existing diffusion-based methods and traditional frame interpolation techniques.
arXiv Detail & Related papers (2024-08-27T17:57:14Z) - ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation [81.90265212988844]
We propose a training-free video interpolation method for generative video models in a plug-and-play manner.
We transform a video model into a self-cascaded video diffusion model with the designed hidden state correction modules.
Our training-free method is even comparable to trained models supported by huge compute resources and large-scale datasets.
arXiv Detail & Related papers (2024-06-03T00:31:13Z) - Flexible Diffusion Modeling of Long Videos [15.220686350342385]
We introduce a generative model that can at test-time sample any arbitrary subset of video frames conditioned on any other subset.
We demonstrate improved video modeling over prior work on a number of datasets and sample temporally coherent videos over 25 minutes in length.
We additionally release a new video modeling dataset and semantically meaningful metrics based on videos generated in the CARLA self-driving car simulator.
arXiv Detail & Related papers (2022-05-23T17:51:48Z) - Deep Video Prior for Video Consistency and Propagation [58.250209011891904]
We present a novel and general approach for blind video temporal consistency.
Our method is only trained on a pair of original and processed videos directly instead of a large dataset.
We show that temporal consistency can be achieved by training a convolutional neural network on a video with Deep Video Prior.
arXiv Detail & Related papers (2022-01-27T16:38:52Z) - Efficient training for future video generation based on hierarchical disentangled representation of latent variables [66.94698064734372]
We propose a novel method for generating future prediction videos with less memory usage than conventional methods.
We achieve high efficiency by training our method in two stages: (1) image reconstruction to encode video frames into latent variables, and (2) latent variable prediction to generate the future sequence (a generic sketch of this two-stage pattern follows after this list).
Our experiments show that the proposed method can efficiently generate future prediction videos, even for complex datasets that cannot be handled by previous methods.
arXiv Detail & Related papers (2021-06-07T10:43:23Z)
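As referenced in the last entry above, here is a generic PyTorch sketch of that two-stage pattern (toy tensor shapes and module names are assumptions, not that paper's code): stage 1 trains a frame autoencoder, and stage 2 freezes the encoder and trains a predictor over the resulting latent sequence.

```python
# Generic two-stage sketch: (1) frame autoencoding, (2) latent prediction.
import torch
import torch.nn as nn

class FrameAutoencoder(nn.Module):
    """Stage 1: encode each frame to a compact latent and decode it back."""
    def __init__(self, frame_dim=1024, latent_dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(frame_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, frame_dim))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

class LatentPredictor(nn.Module):
    """Stage 2: predict the next latent from the observed latent sequence."""
    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(latent_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, latent_dim)

    def forward(self, z_seq):                  # z_seq: (batch, time, latent_dim)
        h, _ = self.rnn(z_seq)
        return self.head(h[:, -1])             # latent of the next frame

# Toy "video" tensor: (batch, time, frame_dim).
video = torch.randn(8, 10, 1024)
ae, pred = FrameAutoencoder(), LatentPredictor()

# Stage 1: frame reconstruction.
opt1 = torch.optim.Adam(ae.parameters(), lr=1e-3)
recon, _ = ae(video)
loss1 = nn.functional.mse_loss(recon, video)
opt1.zero_grad()
loss1.backward()
opt1.step()

# Stage 2: latent prediction with the encoder frozen.
with torch.no_grad():
    z = ae.enc(video)                          # (batch, time, latent_dim)
opt2 = torch.optim.Adam(pred.parameters(), lr=1e-3)
loss2 = nn.functional.mse_loss(pred(z[:, :-1]), z[:, -1])
opt2.zero_grad()
loss2.backward()
opt2.step()
print(loss1.item(), loss2.item())
```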
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.