Sparse Global Matching for Video Frame Interpolation with Large Motion
- URL: http://arxiv.org/abs/2404.06913v3
- Date: Mon, 19 Aug 2024 11:52:53 GMT
- Title: Sparse Global Matching for Video Frame Interpolation with Large Motion
- Authors: Chunxu Liu, Guozhen Zhang, Rui Zhao, Limin Wang
- Abstract summary: Large motion poses a critical challenge in the Video Frame Interpolation (VFI) task.
Existing methods are often constrained by limited receptive fields, resulting in sub-optimal performance when handling scenarios with large motion.
We introduce a new pipeline for VFI, which can effectively integrate global-level information to alleviate issues associated with large motion.
- Score: 20.49084881829404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large motion poses a critical challenge in the Video Frame Interpolation (VFI) task. Existing methods are often constrained by limited receptive fields, resulting in sub-optimal performance when handling scenarios with large motion. In this paper, we introduce a new pipeline for VFI, which can effectively integrate global-level information to alleviate issues associated with large motion. Specifically, we first estimate a pair of initial intermediate flows using a high-resolution feature map for extracting local details. Then, we incorporate a sparse global matching branch to compensate for flow estimation, which consists of identifying flaws in initial flows and generating sparse flow compensation with a global receptive field. Finally, we adaptively merge the initial flow estimation with global flow compensation, yielding a more accurate intermediate flow. To evaluate the effectiveness of our method in handling large motion, we carefully curate a more challenging subset from commonly used benchmarks. Our method demonstrates state-of-the-art performance on these VFI subsets with large motion.
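The final step of the pipeline, adaptively merging the initial flow with the sparse global compensation, can be illustrated with a minimal NumPy sketch. The function name, the confidence-map formulation, and the simple linear blend are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def merge_flows(initial_flow, global_compensation, confidence):
    """Blend a dense initial flow with sparse global flow compensation.

    initial_flow:        (H, W, 2) flow from the local high-resolution branch
    global_compensation: (H, W, 2) sparse correction from the global matching
                         branch (zeros where no sparse match was computed)
    confidence:          (H, W, 1) in [0, 1]; high where the initial flow is
                         judged flawed and the global correction should apply
    """
    # Hypothetical adaptive merge: apply the correction where confidence is high.
    return initial_flow + confidence * global_compensation
```

The key idea is that the global correction is only trusted at the sparse locations where the initial, locally estimated flow was flagged as flawed.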
Related papers
- Generalized Uncertainty-Based Evidential Fusion with Hybrid Multi-Head Attention for Weak-Supervised Temporal Action Localization [28.005080560540133]
Weakly supervised temporal action localization (WS-TAL) aims to localize complete action instances and categorize them using only video-level labels.
Action-background ambiguity, primarily caused by background noise resulting from aggregation and intra-action variation, is a significant challenge for existing WS-TAL methods.
We introduce a hybrid multi-head attention (HMHA) module and generalized uncertainty-based evidential fusion (GUEF) module to address the problem.
arXiv Detail & Related papers (2024-12-27T03:04:57Z)
- Learning Normal Flow Directly From Event Neighborhoods [18.765370814655626]
We propose a novel supervised point-based method for normal flow estimation.
Using a local point cloud encoder, our method directly estimates per-event normal flow from raw events.
Our method achieves better and more consistent performance than state-of-the-art methods when transferred across different datasets.
arXiv Detail & Related papers (2024-12-15T19:09:45Z)
- GMFlow: Global Motion-Guided Recurrent Flow for 6D Object Pose Estimation [10.48817934871207]
We propose a global motion-guided recurrent flow estimation method called GMFlow for pose estimation.
We leverage the object's structural information to extend the motion of visible parts of the rigid body to its invisible regions.
Our method outperforms existing techniques in accuracy while maintaining competitive computational efficiency.
arXiv Detail & Related papers (2024-11-26T07:28:48Z)
- FlowIE: Efficient Image Enhancement via Rectified Flow [71.6345505427213]
FlowIE is a flow-based framework that estimates straight-line paths from an elementary distribution to high-quality images.
Our contributions are rigorously validated through comprehensive experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2024-06-01T17:29:29Z)
- STARFlow: Spatial Temporal Feature Re-embedding with Attentive Learning for Real-world Scene Flow [5.476991379461233]
We propose global attentive flow embedding to match all-to-all point pairs in both Euclidean and feature space.
We leverage novel domain-adaptive losses to bridge the gap in motion inference from synthetic to real-world data.
Our approach achieves state-of-the-art performance across various datasets, with particularly outstanding results on real-world LiDAR-scanned datasets.
arXiv Detail & Related papers (2024-03-11T04:56:10Z)
- Motion-Aware Video Frame Interpolation [49.49668436390514]
We introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames.
It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
arXiv Detail & Related papers (2024-02-05T11:00:14Z)
- Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization [87.21285093582446]
Diffusion Generative Flow Samplers (DGFS) is a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments.
Our method takes inspiration from the theory developed for generative flow networks (GFlowNets).
arXiv Detail & Related papers (2023-10-04T09:39:05Z)
- AccFlow: Backward Accumulation for Long-Range Optical Flow [70.4251045372285]
This paper proposes a novel recurrent framework called AccFlow for long-range optical flow estimation.
We demonstrate the superiority of backward accumulation over conventional forward accumulation.
Experiments validate the effectiveness of AccFlow in handling long-range optical flow estimation.
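The core idea of accumulating per-step flows into a single long-range flow can be sketched in NumPy. This is a generic flow-composition sketch under simplifying assumptions (nearest-neighbor sampling, forward composition of per-step flows), not AccFlow's backward accumulation module:

```python
import numpy as np

def warp_flow(flow, disp):
    """Sample `flow` at positions displaced by `disp`.
    Nearest-neighbor sampling keeps the sketch dependency-free."""
    H, W, _ = flow.shape
    ys, xs = np.mgrid[0:H, 0:W]
    xq = np.clip(np.round(xs + disp[..., 0]).astype(int), 0, W - 1)
    yq = np.clip(np.round(ys + disp[..., 1]).astype(int), 0, H - 1)
    return flow[yq, xq]

def accumulate(step_flows):
    """Chain per-step flows F_{t->t+1} into a long-range flow F_{0->T}
    by composition: F_{0->t+1} = F_{0->t} + warp(F_{t->t+1}, F_{0->t})."""
    acc = step_flows[0].copy()
    for f in step_flows[1:]:
        acc = acc + warp_flow(f, acc)
    return acc
```

Naive chaining like this accumulates warping error at every step, which is precisely the problem that motivates a learned accumulation scheme for long-range flow.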
arXiv Detail & Related papers (2023-08-25T01:51:26Z)
- GMFlow: Learning Optical Flow via Global Matching [124.57850500778277]
We propose GMFlow, a framework for learning optical flow estimation.
It consists of three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for global feature matching, and a self-attention layer for flow propagation.
Our new framework outperforms the 32-iteration RAFT on the challenging Sintel benchmark.
arXiv Detail & Related papers (2021-11-26T18:59:56Z)
- FlowStep3D: Model Unrolling for Self-Supervised Scene Flow Estimation [87.74617110803189]
Estimating the 3D motion of points in a scene, known as scene flow, is a core problem in computer vision.
We present a recurrent architecture that learns a single step of an unrolled iterative alignment procedure for refining scene flow predictions.
arXiv Detail & Related papers (2020-11-19T23:23:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.