Related papers: Boost Video Frame Interpolation via Motion Adaptation

URL: http://arxiv.org/abs/2306.13933v3
Date: Thu, 5 Oct 2023 16:25:41 GMT
Title: Boost Video Frame Interpolation via Motion Adaptation
Authors: Haoning Wu, Xiaoyun Zhang, Weidi Xie, Ya Zhang, Yanfeng Wang
Abstract summary: Video frame interpolation (VFI) is a challenging task that aims to generate intermediate frames between two consecutive frames in a video. Existing learning-based VFI methods have achieved great success, but they still suffer from limited generalization ability. We propose a novel optimization-based VFI method that can adapt to unseen motions at test time.
Score: 73.42573856943923
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video frame interpolation (VFI) is a challenging task that aims to generate intermediate frames between two consecutive frames in a video. Existing learning-based VFI methods have achieved great success, but they still suffer from limited generalization ability due to the limited motion distribution of training datasets. In this paper, we propose a novel optimization-based VFI method that can adapt to unseen motions at test time. Our method is based on a cycle-consistency adaptation strategy that leverages the motion characteristics among video frames. We also introduce a lightweight adapter that can be inserted into the motion estimation module of existing pre-trained VFI models to improve the efficiency of adaptation. Extensive experiments on various benchmarks demonstrate that our method can boost the performance of two-frame VFI models, outperforming the existing state-of-the-art methods, even those that use extra input.

Related papers

EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models [73.96414072072048]
Existing motion transfer methods explored the motion representations of reference videos to guide generation. We propose EfficientMT, a novel and efficient end-to-end framework for video motion transfer. Our experiments demonstrate that our EfficientMT outperforms existing methods in efficiency while maintaining flexible motion controllability.
arXiv Detail & Related papers (2025-03-25T05:51:14Z)
Event-Based Video Frame Interpolation With Cross-Modal Asymmetric Bidirectional Motion Fields [39.214857326425204]
Video Frame Interpolation (VFI) aims to generate intermediate video frames between consecutive input frames. We propose a novel event-based VFI framework with cross-modal asymmetric bidirectional motion field estimation. Our method shows significant performance improvement over the state-of-the-art VFI methods on various datasets.
arXiv Detail & Related papers (2025-02-19T13:40:43Z)
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models [89.79067761383855]
Vchitect-2.0 is a parallel transformer architecture designed to scale up video diffusion models for large-scale text-to-video generation. By introducing a novel Multimodal Diffusion Block, our approach achieves consistent alignment between text descriptions and generated video frames. To overcome memory and computational bottlenecks, we propose a Memory-efficient Training framework.
arXiv Detail & Related papers (2025-01-14T21:53:11Z)
BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions [28.455538651097562]
Existing Video Frame Interpolation (VFI) models tend to suffer from time-to-location ambiguity when trained with videos of non-uniform motions. We propose the Bidirectional Motion field (BiM) to effectively describe non-uniform motions. The BiM-VFI model surpasses recent state-of-the-art VFI methods, with improvements of 26% and 45% in LPIPS and STLPIPS, respectively.
arXiv Detail & Related papers (2024-12-16T01:37:51Z)
Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation [20.689304579898728]
Event-based Video Frame Interpolation (EVFI) uses sparse, high-temporal-resolution event measurements as motion guidance. We adapt pre-trained video diffusion models trained on internet-scale datasets to EVFI. Our method outperforms existing methods and generalizes across cameras far better than existing approaches.
arXiv Detail & Related papers (2024-12-10T18:55:30Z)
Generalizable Implicit Motion Modeling for Video Frame Interpolation [51.966062283735596]
Motion modeling is critical in flow-based Video Frame Interpolation (VFI). We introduce Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for VFI. Our GIMM can be easily integrated with existing flow-based VFI works by supplying accurately modeled motion.
arXiv Detail & Related papers (2024-07-11T17:13:15Z)
Disentangled Motion Modeling for Video Frame Interpolation [40.83962594702387]
Video frame interpolation (VFI) aims to synthesize intermediate frames between existing frames to enhance visual smoothness and quality. We introduce Disentangled Motion Modeling (MoMo), a diffusion-based approach for VFI that enhances visual quality by focusing on intermediate motion modeling.
arXiv Detail & Related papers (2024-06-25T03:50:20Z)
Motion-aware Latent Diffusion Models for Video Frame Interpolation [51.78737270917301]
Motion estimation between neighboring frames plays a crucial role in avoiding motion ambiguity. We propose a novel diffusion framework, motion-aware latent diffusion models (MADiff). Our method achieves state-of-the-art performance, significantly outperforming existing approaches.
arXiv Detail & Related papers (2024-04-21T05:09:56Z)
Motion-Aware Video Frame Interpolation [49.49668436390514]
We introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames. It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
arXiv Detail & Related papers (2024-02-05T11:00:14Z)
A Multi-In-Single-Out Network for Video Frame Interpolation without Optical Flow [14.877766449009119]
Deep learning-based video frame interpolation (VFI) methods have predominantly focused on estimating motion between two input frames. We propose a multi-in-single-out (MISO) based VFI method that does not rely on motion vector estimation. We introduce a novel motion perceptual loss that enables MISO-VFI to better capture the spatio-temporal correlations within the video frames.
arXiv Detail & Related papers (2023-11-20T08:29:55Z)
EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content. First, when extracting local cues, we generate the spatial-temporal kernels of dynamic-scale to adaptively fit the diverse events. Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer.
arXiv Detail & Related papers (2021-07-22T15:57:18Z)
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation [97.99012124785177]
FLAVR is a flexible and efficient architecture that uses 3D space-time convolutions to enable end-to-end learning and inference for video frame interpolation. We demonstrate that FLAVR can serve as a useful self-supervised pretext task for action recognition, optical flow estimation, and motion magnification.
arXiv Detail & Related papers (2020-12-15T18:59:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.