Related papers: Video Interpolation with Diffusion Models

URL: http://arxiv.org/abs/2404.01203v1
Date: Mon, 1 Apr 2024 15:59:32 GMT
Title: Video Interpolation with Diffusion Models
Authors: Siddhant Jain, Daniel Watson, Eric Tabellion, Aleksander Hołyński, Ben Poole, Janne Kontkanen
Abstract summary: We present VIDIM, a generative model for video, which creates short videos given a start and end frame. VIDIM uses cascaded diffusion models to first generate the target video at low resolution, and then generate the high-resolution video conditioned on the low-resolution generated video.
Score: 54.06746595879689
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present VIDIM, a generative model for video interpolation, which creates short videos given a start and end frame. In order to achieve high fidelity and generate motions unseen in the input data, VIDIM uses cascaded diffusion models to first generate the target video at low resolution, and then generate the high-resolution video conditioned on the low-resolution generated video. We compare VIDIM to previous state-of-the-art methods on video interpolation, and demonstrate how such works fail in most settings where the underlying motion is complex, nonlinear, or ambiguous while VIDIM can easily handle such cases. We additionally demonstrate how classifier-free guidance on the start and end frame and conditioning the super-resolution model on the original high-resolution frames without additional parameters unlocks high-fidelity results. VIDIM is fast to sample from as it jointly denoises all the frames to be generated, requires less than a billion parameters per diffusion model to produce compelling results, and still enjoys scalability and improved quality at larger parameter counts.

Related papers

VDOT: Efficient Unified Video Creation via Optimal Transport Distillation [70.02065520468726]
We propose an efficient unified video creation model, named VDOT. We employ a novel computational optimal transport (OT) technique to optimize the discrepancy between the real and fake score distributions. To support training unified video creation models, we propose a fully automated pipeline for video data annotation and filtering.
arXiv Detail & Related papers (2025-12-07T11:31:00Z)
Video Consistency Distance: Enhancing Temporal Consistency for Image-to-Video Generation via Reward-Based Fine-Tuning [5.847416016271551]
Reward-based fine-tuning of video diffusion models is an effective approach to improve the quality of generated videos. We propose Video Consistency Distance (VCD), a novel metric designed to enhance temporal consistency.
arXiv Detail & Related papers (2025-10-22T02:59:45Z)
UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video Super-Resolution [62.10676832966289]
Cascaded video super-resolution has emerged as a promising technique for generating high-resolution videos using large foundation models. We present UniMMVSR, the first unified generative video super-resolution framework to incorporate hybrid-modal conditions, including text, images, and videos. Our experiments demonstrate that UniMMVSR significantly outperforms existing methods, producing videos with superior detail and a higher degree of conformity to multi-modal conditions.
arXiv Detail & Related papers (2025-10-09T12:25:16Z)
LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning [73.90466023069125]
We propose LOVE-R1, a model that can adaptively zoom in on a video clip. The model is first provided with densely sampled frames at a small resolution. If spatial details are needed, the model can zoom in on a clip of interest at a large frame resolution.
arXiv Detail & Related papers (2025-09-29T13:43:55Z)
Adapting Image-to-Video Diffusion Models for Large-Motion Frame Interpolation [0.0]
We present a conditional encoder designed to adapt an image-to-video model for large-motion frame interpolation. To enhance performance, we integrate a dual-branch feature extractor and propose a cross-frame attention mechanism. Our approach demonstrates superior performance on the Fréchet Video Distance metric when evaluated against other state-of-the-art approaches.
arXiv Detail & Related papers (2024-12-22T14:49:55Z)
ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler [53.98558445900626]
Current image-to-video diffusion models, while powerful in generating videos from a single frame, need adaptation for two-frame conditioned generation. We introduce a novel, bidirectional sampling strategy to address these off-manifold issues without requiring extensive re-noising or fine-tuning. Our method employs sequential sampling along both forward and backward paths, conditioned on the start and end frames, respectively, ensuring more coherent and on-manifold generation of intermediate frames.
arXiv Detail & Related papers (2024-10-08T03:01:54Z)
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach [29.753974393652356]
We propose a frame-aware video diffusion model (FVDM). Our approach allows each frame to follow an independent noise schedule, enhancing the model's capacity to capture fine-grained temporal dependencies. Our empirical evaluations show that FVDM outperforms state-of-the-art methods in video generation quality, while also excelling in extended tasks.
arXiv Detail & Related papers (2024-10-04T05:47:39Z)
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation [81.90265212988844]
We propose a training-free video method for generative video models in a plug-and-play manner. We transform a video model into a self-cascaded video diffusion model with the designed hidden state correction modules. Our training-free method is even comparable to trained models supported by huge compute resources and large-scale datasets.
arXiv Detail & Related papers (2024-06-03T00:31:13Z)
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution [65.91317390645163]
Upscale-A-Video is a text-guided latent diffusion framework for video upscaling. It ensures temporal coherence through two key mechanisms: locally, it integrates temporal layers into U-Net and VAE-Decoder, maintaining consistency within short sequences. It also offers greater flexibility by allowing text prompts to guide texture creation and adjustable noise levels to balance restoration and generation.
arXiv Detail & Related papers (2023-12-11T18:54:52Z)
Dynamic Video Frame Interpolation with integrated Difficulty Pre-Assessment [10.248729137820442]
Video frame interpolation (VFI) models still struggle to achieve a good trade-off between accuracy and efficiency. We present an integrated pipeline which combines difficulty assessment with video frame interpolation. Our proposed pipeline can improve the accuracy-efficiency trade-off for VFI.
arXiv Detail & Related papers (2023-04-25T09:11:20Z)
VIDM: Video Implicit Diffusion Models [75.90225524502759]
Diffusion models have emerged as a powerful generative method for synthesizing a high-quality and diverse set of images. We propose a video generation method based on diffusion models, where the effects of motion are modeled in an implicit condition. We improve the quality of the generated videos by proposing multiple strategies such as sampling space truncation, robustness penalty, and positional group normalization.
arXiv Detail & Related papers (2022-12-01T02:58:46Z)
Imagen Video: High Definition Video Generation with Diffusion Models [64.06483414521222]
Imagen Video is a text-conditional video generation system based on a cascade of video diffusion models. We find Imagen Video capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge.
arXiv Detail & Related papers (2022-10-05T14:41:38Z)
Video Diffusion Models [47.99413440461512]
Generating temporally coherent high fidelity video is an important milestone in generative modeling research. We propose a diffusion model for video generation that shows very promising initial results. We present the first results on a large text-conditioned video generation task, as well as state-of-the-art results on an established unconditional video generation benchmark.
arXiv Detail & Related papers (2022-04-07T14:08:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.