Related papers: AceVFI: A Comprehensive Survey of Advances in Video Frame Interpolation

URL: http://arxiv.org/abs/2506.01061v1
Date: Sun, 01 Jun 2025 16:01:24 GMT
Title: AceVFI: A Comprehensive Survey of Advances in Video Frame Interpolation
Authors: Dahyeon Kye, Changhyun Roh, Sukhun Ko, Chanho Eom, Jihyong Oh
Abstract summary: Video Frame Interpolation (VFI) is a fundamental Low-Level Vision (LLV) task that synthesizes intermediate frames between existing ones. We introduce AceVFI, the most comprehensive survey on VFI to date, covering over 250+ papers across these approaches. We categorize the learning paradigm of VFI methods namely, Center-Time Frame Interpolation (CTFI) and Arbitrary-Time Frame Interpolation (ATFI).
Score: 8.563354084119062
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video Frame Interpolation (VFI) is a fundamental Low-Level Vision (LLV) task that synthesizes intermediate frames between existing ones while maintaining spatial and temporal coherence. VFI techniques have evolved from classical motion compensation-based approach to deep learning-based approach, including kernel-, flow-, hybrid-, phase-, GAN-, Transformer-, Mamba-, and more recently diffusion model-based approach. We introduce AceVFI, the most comprehensive survey on VFI to date, covering over 250+ papers across these approaches. We systematically organize and describe VFI methodologies, detailing the core principles, design assumptions, and technical characteristics of each approach. We categorize the learning paradigm of VFI methods namely, Center-Time Frame Interpolation (CTFI) and Arbitrary-Time Frame Interpolation (ATFI). We analyze key challenges of VFI such as large motion, occlusion, lighting variation, and non-linear motion. In addition, we review standard datasets, loss functions, evaluation metrics. We examine applications of VFI including event-based, cartoon, medical image VFI and joint VFI with other LLV tasks. We conclude by outlining promising future research directions to support continued progress in the field. This survey aims to serve as a unified reference for both newcomers and experts seeking a deep understanding of modern VFI landscapes.

Related papers

Modeling Cross-vision Synergy for Unified Large Vision Model [130.37489011094036]
PolyV is a unified large vision model that achieves cross-vision synergy at both the architectural and training levels. PolyV consistently outperforms existing models, achieving over 10% average improvement over its backbone.
arXiv Detail & Related papers (2026-03-03T22:44:43Z)
FoundationSLAM: Unleashing the Power of Depth Foundation Models for End-to-End Dense Visual SLAM [50.9765003472032]
FoundationSLAM is a learning-based monocular dense SLAM system for accurate and robust tracking and mapping. Our core idea is to bridge flow estimation with reasoning by leveraging the guidance from foundation depth models.
arXiv Detail & Related papers (2025-12-31T17:57:45Z)
Prompt-based Adaptation in Large-scale Vision Models: A Survey [62.09307869247613]
Visual Prompting (VP) and Visual Prompt Tuning (VPT) have emerged as lightweight alternatives to full fine-tuning for adapting large-scale vision models. We provide a taxonomy that categorizes existing methods into learnable, generative, and non-learnable prompts. We examine PA's integrations across diverse domains, including medical imaging, 3D point clouds, and vision-language tasks.
arXiv Detail & Related papers (2025-10-15T07:14:50Z)
SSVIF: Self-Supervised Segmentation-Oriented Visible and Infrared Image Fusion [8.61849023109742]
We propose a self-supervised training framework for segmentation-oriented VIF methods (SSVIF). We introduce a novel self-supervised task, cross-segmentation consistency, that enables the fusion model to learn high-level semantic features without the supervision of segmentation labels. Our proposed SSVIF outperforms traditional VIF methods and rivals supervised segmentation-oriented ones.
arXiv Detail & Related papers (2025-09-26T15:05:33Z)
Pure Vision Language Action (VLA) Models: A Comprehensive Survey [16.014856048038272]
The emergence of Vision Language Action (VLA) models marks a paradigm shift from traditional policy-based control to generalized robotics. This survey delves into advanced VLA methods, aiming to provide a clear taxonomy and a systematic, comprehensive review of existing research.
arXiv Detail & Related papers (2025-09-23T13:53:52Z)
Continual Learning for VLMs: A Survey and Taxonomy Beyond Forgetting [70.83781268763215]
Vision-language models (VLMs) have achieved impressive performance across diverse multimodal tasks by leveraging large-scale pre-training. VLMs face unique challenges such as cross-modal feature drift, parameter interference due to shared architectures, and zero-shot capability erosion. This survey aims to serve as a comprehensive and diagnostic reference for researchers developing lifelong vision-language systems.
arXiv Detail & Related papers (2025-08-06T09:03:10Z)
Set Pivot Learning: Redefining Generalized Segmentation with Vision Foundation Models [15.321114178936554]
We introduce the concept of Set Pivot Learning, a paradigm shift that redefines domain generalization (DG) based on Vision Foundation Models (VFMs). Traditional DG assumes that the target domain is inaccessible during training, but the emergence of VFMs renders this assumption unclear and obsolete. We propose Set Pivot Learning (SPL), a new definition of domain migration task based on VFMs, which is more suitable for current research and application requirements.
arXiv Detail & Related papers (2025-08-03T04:20:35Z)
A Comprehensive Survey on Video Scene Parsing: Advances, Challenges, and Prospects [53.15503034595476]
Video Scene Parsing (VSP) has emerged as a cornerstone in computer vision, facilitating the simultaneous segmentation, recognition, and tracking of diverse visual entities in dynamic scenes.
arXiv Detail & Related papers (2025-06-16T14:39:03Z)
FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models [15.102237976107645]
Vision-Language Models (VLMs) integrate visual and textual information. Recent efforts have introduced Federated Learning (FL) into VLM fine-tuning to address privacy concerns. We present FedVLMBench, the first systematic benchmark for federated fine-tuning of VLMs.
arXiv Detail & Related papers (2025-06-11T11:52:27Z)
Vertical Federated Learning for Effectiveness, Security, Applicability: A Survey [67.48187503803847]
Vertical Federated Learning (VFL) is a privacy-preserving distributed learning paradigm. Recent research has shown promising results addressing various challenges in VFL. This survey offers a systematic overview of recent developments.
arXiv Detail & Related papers (2024-05-25T16:05:06Z)
Motion-aware Latent Diffusion Models for Video Frame Interpolation [51.78737270917301]
Motion estimation between neighboring frames plays a crucial role in avoiding motion ambiguity. We propose a novel diffusion framework, motion-aware latent diffusion models (MADiff). Our method achieves state-of-the-art performance, significantly outperforming existing approaches.
arXiv Detail & Related papers (2024-04-21T05:09:56Z)
A Multi-In-Single-Out Network for Video Frame Interpolation without Optical Flow [14.877766449009119]
Deep learning-based video frame interpolation (VFI) methods have predominantly focused on estimating motion between two input frames. We propose a multi-in-single-out (MISO) based VFI method that does not rely on motion vector estimation. We introduce a novel motion perceptual loss that enables MISO-VFI to better capture the spatio-temporal correlations within the video frames.
arXiv Detail & Related papers (2023-11-20T08:29:55Z)
Boost Video Frame Interpolation via Motion Adaptation [73.42573856943923]
Video frame interpolation (VFI) is a challenging task that aims to generate intermediate frames between two consecutive frames in a video. Existing learning-based VFI methods have achieved great success, but they still suffer from limited generalization ability. We propose a novel optimization-based VFI method that can adapt to unseen motions at test time.
arXiv Detail & Related papers (2023-06-24T10:44:02Z)
LDMVFI: Video Frame Interpolation with Latent Diffusion Models [3.884484241124158]
We propose latent diffusion model-based VFI, LDMVFI. This approaches the VFI problem from a generative perspective by formulating it as a conditional generation problem. Our experiments and user study indicate that LDMVFI is able to interpolate video content with favorable perceptual quality compared to the state of the art, even in the high-resolution regime.
arXiv Detail & Related papers (2023-03-16T17:24:41Z)
Error-Aware Spatial Ensembles for Video Frame Interpolation [50.63021118973639]
Video frame interpolation (VFI) algorithms have improved considerably in recent years due to unprecedented progress in both data-driven algorithms and their implementations. Recent research has introduced advanced motion estimation or novel warping methods as the means to address challenging VFI scenarios. This work introduces such a solution. By closely examining the correlation between optical flow and interpolation error (IE), the paper proposes novel error prediction metrics that partition the middle frame into distinct regions corresponding to different IE levels.
arXiv Detail & Related papers (2022-07-25T16:15:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.