FG-DFPN: Flow Guided Deformable Frame Prediction Network
- URL: http://arxiv.org/abs/2503.11343v1
- Date: Fri, 14 Mar 2025 12:18:33 GMT
- Title: FG-DFPN: Flow Guided Deformable Frame Prediction Network
- Authors: M. Akın Yılmaz, Ahmet Bilican, A. Murat Tekalp
- Abstract summary: We present FG-DFPN, a novel architecture that harnesses the synergy between optical flow estimation and deformable convolutions to model complex dynamics. Our experiments demonstrate that FG-DFPN achieves state-of-the-art performance on eight diverse MPEG test sequences, outperforming existing methods by 1dB PSNR while maintaining competitive inference speeds.
- Score: 5.6390038395163815
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video frame prediction remains a fundamental challenge in computer vision with direct implications for autonomous systems, video compression, and media synthesis. We present FG-DFPN, a novel architecture that harnesses the synergy between optical flow estimation and deformable convolutions to model complex spatio-temporal dynamics. By guiding deformable sampling with motion cues, our approach addresses the limitations of fixed-kernel networks when handling diverse motion patterns. The multi-scale design enables FG-DFPN to simultaneously capture global scene transformations and local object movements with remarkable precision. Our experiments demonstrate that FG-DFPN achieves state-of-the-art performance on eight diverse MPEG test sequences, outperforming existing methods by 1dB PSNR while maintaining competitive inference speeds. The integration of motion cues with adaptive geometric transformations makes FG-DFPN a promising solution for next-generation video processing systems that require high-fidelity temporal predictions. The model and instructions to reproduce our results will be released at: https://github.com/KUIS-AI-Tekalp-Research-Group/frame-prediction
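The abstract's core idea, guiding deformable sampling with motion cues, amounts to warping a reference frame along an estimated optical flow and then refining each sampling location with a learned per-pixel offset. The NumPy sketch below illustrates that mechanism on a single-channel frame; it is a simplified toy, not the released FG-DFPN code, and the function names, the single residual offset per pixel, and the replicate-style border handling are all assumptions made for clarity:

```python
import numpy as np

def bilinear_sample(img, ys, xs):
    """Sample a 2-D image at fractional (ys, xs) coordinates via
    bilinear interpolation, replicating values at the borders."""
    H, W = img.shape
    y0 = np.clip(np.floor(ys).astype(int), 0, H - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, W - 2)
    wy = np.clip(ys - y0, 0.0, 1.0)
    wx = np.clip(xs - x0, 0.0, 1.0)
    return ((1 - wy) * (1 - wx) * img[y0, x0]
            + (1 - wy) * wx * img[y0, x0 + 1]
            + wy * (1 - wx) * img[y0 + 1, x0]
            + wy * wx * img[y0 + 1, x0 + 1])

def flow_guided_sample(frame, flow, offsets):
    """Warp `frame` (H, W) by an optical flow field (2, H, W: dy, dx),
    then add learned residual `offsets` (2, H, W) on top -- the
    flow-guided deformable sampling idea in miniature."""
    H, W = frame.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    ys = ys + flow[0] + offsets[0]   # flow moves the sample point...
    xs = xs + flow[1] + offsets[1]   # ...offsets refine it sub-pixel
    return bilinear_sample(frame, ys, xs)
```

In the actual network, the offsets would be predicted by a convolutional branch and applied per kernel tap inside a deformable convolution, rather than once per pixel as here; the flow term does the coarse motion compensation so the learned offsets only need to model small residual deformations.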
Related papers
- Towards Efficient Real-Time Video Motion Transfer via Generative Time Series Modeling [7.3949576464066]
We propose a deep learning framework designed to significantly optimize bandwidth for motion-transfer-enabled video applications.
To capture complex motion effectively, we utilize the First Order Motion Model (FOMM), which encodes dynamic objects by detecting keypoints.
We validate our results across three datasets for video animation and reconstruction using the following metrics: Mean Absolute Error, Joint Embedding Predictive Architecture Embedding Distance, Structural Similarity Index, and Average Pair-wise Displacement.
arXiv Detail & Related papers (2025-04-07T22:21:54Z) - Event-Based Video Frame Interpolation With Cross-Modal Asymmetric Bidirectional Motion Fields [39.214857326425204]
Video Frame Interpolation (VFI) aims to generate intermediate video frames between consecutive input frames.
We propose a novel event-based VFI framework with cross-modal asymmetric bidirectional motion field estimation.
Our method shows significant performance improvement over the state-of-the-art VFI methods on various datasets.
arXiv Detail & Related papers (2025-02-19T13:40:43Z) - MAUCell: An Adaptive Multi-Attention Framework for Video Frame Prediction [0.0]
We introduce the Multi-Attention Unit (MAUCell), which combines Generative Adversarial Networks (GANs) and attention mechanisms to improve video prediction. The new design maintains equilibrium between temporal continuity and spatial accuracy to deliver reliable video prediction.
arXiv Detail & Related papers (2025-01-28T14:52:10Z) - Dynamic Frame Interpolation in Wavelet Domain [57.25341639095404]
Video frame interpolation is an important low-level computer vision task, which can increase the frame rate for a more fluent visual experience.
Existing methods have achieved great success by employing advanced motion models and synthesis networks.
WaveletVFI can reduce computation by up to 40% while maintaining similar accuracy, making it more efficient than other state-of-the-art methods.
arXiv Detail & Related papers (2023-09-07T06:41:15Z) - H-VFI: Hierarchical Frame Interpolation for Videos with Large Motions [63.23985601478339]
We propose a simple yet effective solution, H-VFI, to deal with large motions in video frame interpolation.
H-VFI contributes a hierarchical video transformer to learn a deformable kernel in a coarse-to-fine strategy.
The advantage of such a progressive approximation is that the large-motion frame prediction problem can be decomposed into several relatively simpler sub-tasks.
arXiv Detail & Related papers (2022-11-21T09:49:23Z) - JNMR: Joint Non-linear Motion Regression for Video Frame Interpolation [47.123769305867775]
Video frame interpolation (VFI) aims to generate intermediate frames by warping learnable motions from bidirectional historical references.
We reformulate VFI as a Joint Non-linear Motion Regression (JNMR) strategy to model the complicated motions of inter-frame.
We show the effectiveness and significant improvement of joint motion regression compared with state-of-the-art methods.
arXiv Detail & Related papers (2022-06-09T02:47:29Z) - Motion-aware Dynamic Graph Neural Network for Video Compressive Sensing [14.67994875448175]
Video snapshot compressive imaging (SCI) utilizes a 2D detector to capture sequential video frames and compress them into a single measurement.
Most existing reconstruction methods are incapable of efficiently capturing long-range spatial and temporal dependencies.
We propose a flexible and robust approach based on the graph neural network (GNN) to efficiently model non-local interactions between pixels in space and time regardless of the distance.
arXiv Detail & Related papers (2022-03-01T12:13:46Z) - Flow-Guided Sparse Transformer for Video Deblurring [124.11022871999423]
Flow-Guided Sparse Transformer (FGST) is a framework for video deblurring.
FGSW-MSA enjoys the guidance of the estimated optical flow to globally sample spatially sparse elements corresponding to the same scene patch in neighboring frames.
Our proposed FGST outperforms state-of-the-art methods on both DVD and GOPRO datasets and yields more visually pleasing results in real video deblurring.
arXiv Detail & Related papers (2022-01-06T02:05:32Z) - CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z) - Wide and Narrow: Video Prediction from Context and Motion [54.21624227408727]
We propose a new framework to integrate these complementary attributes to predict complex pixel dynamics through deep networks.
We present global context propagation networks that aggregate the non-local neighboring representations to preserve the contextual information over the past frames.
We also devise local filter memory networks that generate adaptive filter kernels by storing the motion of moving objects in the memory.
arXiv Detail & Related papers (2021-10-22T04:35:58Z) - TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches.
We show an up to 5.21 dB improvement in terms of PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-06-14T10:33:47Z)
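Several of the results above (FG-DFPN's 1dB gain, TimeLens's 5.21 dB gain) are reported in PSNR. For reference, PSNR is derived from mean squared error by the standard definition below; this is a generic implementation, not code from any of the listed papers:

```python
import numpy as np

def psnr(pred, target, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images or frames.
    `peak` is the maximum possible pixel value (255 for 8-bit)."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

Because the scale is logarithmic, a 1 dB improvement corresponds to dividing the MSE by 10^0.1 ≈ 1.26, i.e. roughly a 21% reduction in squared error.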
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.