Disentangled Motion Modeling for Video Frame Interpolation
- URL: http://arxiv.org/abs/2406.17256v2
- Date: Thu, 19 Dec 2024 02:05:58 GMT
- Title: Disentangled Motion Modeling for Video Frame Interpolation
- Authors: Jaihyun Lew, Jooyoung Choi, Chaehun Shin, Dahuin Jung, Sungroh Yoon,
- Abstract summary: Video Frame Interpolation (VFI) aims to synthesize intermediate frames between existing frames to enhance visual smoothness and quality.
We introduce disentangled Motion Modeling (MoMo), a diffusion-based approach for VFI that enhances visual quality by focusing on intermediate motion modeling.
- Score: 40.83962594702387
- License:
- Abstract: Video Frame Interpolation (VFI) aims to synthesize intermediate frames between existing frames to enhance visual smoothness and quality. Beyond the conventional methods based on the reconstruction loss, recent works have employed generative models for improved perceptual quality. However, they require complex training and large computational costs for pixel space modeling. In this paper, we introduce disentangled Motion Modeling (MoMo), a diffusion-based approach for VFI that enhances visual quality by focusing on intermediate motion modeling. We propose a disentangled two-stage training process. In the initial stage, frame synthesis and flow models are trained to generate accurate frames and flows optimal for synthesis. In the subsequent stage, we introduce a motion diffusion model, which incorporates our novel U-Net architecture specifically designed for optical flow, to generate bi-directional flows between frames. By learning the simpler low-frequency representation of motions, MoMo achieves superior perceptual quality with reduced computational demands compared to the generative modeling methods on the pixel space. MoMo surpasses state-of-the-art methods in perceptual metrics across various benchmarks, demonstrating its efficacy and efficiency in VFI.
Related papers
- VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models [71.9811050853964]
VideoJAM is a novel framework that instills an effective motion prior to video generators.
VideoJAM achieves state-of-the-art performance in motion coherence.
These findings emphasize that appearance and motion can be complementary and, when effectively integrated, enhance both the visual quality and the coherence of video generation.
arXiv Detail & Related papers (2025-02-04T17:07:10Z) - Motion-Aware Generative Frame Interpolation [23.380470636851022]
We propose Motion-aware Generative frame (MoG) to enhance the model's motion awareness by integrating explicit motion guidance.
To demonstrate the versatility of our method, we train MoG on both real-world and animation datasets.
arXiv Detail & Related papers (2025-01-07T11:03:43Z) - Generalizable Implicit Motion Modeling for Video Frame Interpolation [51.966062283735596]
Motion is critical in flow-based Video Frame Interpolation (VFI)
We introduce General Implicit Motion Modeling (IMM), a novel and effective approach to motion modeling VFI.
Our GIMM can be easily integrated with existing flow-based VFI works by supplying accurately modeled motion.
arXiv Detail & Related papers (2024-07-11T17:13:15Z) - Motion-aware Latent Diffusion Models for Video Frame Interpolation [51.78737270917301]
Motion estimation between neighboring frames plays a crucial role in avoiding motion ambiguity.
We propose a novel diffusion framework, motion-aware latent diffusion models (MADiff)
Our method achieves state-of-the-art performance significantly outperforming existing approaches.
arXiv Detail & Related papers (2024-04-21T05:09:56Z) - Motion-Aware Video Frame Interpolation [49.49668436390514]
We introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames.
It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
arXiv Detail & Related papers (2024-02-05T11:00:14Z) - Boost Video Frame Interpolation via Motion Adaptation [73.42573856943923]
Video frame (VFI) is a challenging task that aims to generate intermediate frames between two consecutive frames in a video.
Existing learning-based VFI methods have achieved great success, but they still suffer from limited generalization ability.
We propose a novel optimization-based VFI method that can adapt to unseen motions at test time.
arXiv Detail & Related papers (2023-06-24T10:44:02Z) - IDO-VFI: Identifying Dynamics via Optical Flow Guidance for Video Frame
Interpolation with Events [14.098949778274733]
Event cameras are ideal for capturing inter-frame dynamics with their extremely high temporal resolution.
We propose an event-and-frame-based video frame method named IDO-VFI that assigns varying amounts of computation for different sub-regions.
Our proposed method maintains high-quality performance while reducing computation time and computational effort by 10% and 17% respectively on Vimeo90K datasets.
arXiv Detail & Related papers (2023-05-17T13:22:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.