Related papers: SG-RIFE: Semantic-Guided Real-Time Intermediate Flow Estimation with Diffusion-Competitive Perceptual Quality

URL: http://arxiv.org/abs/2512.18241v1
Date: Sat, 20 Dec 2025 06:50:55 GMT
Title: SG-RIFE: Semantic-Guided Real-Time Intermediate Flow Estimation with Diffusion-Competitive Perceptual Quality
Authors: Pan Ben Wong, Chengli Wu, Hanyue Lu
Abstract summary: Real-time Video Frame Interpolation (VFI) has long been dominated by flow-based methods like RIFE. Recent diffusion-based approaches achieve state-of-the-art perceptual quality but suffer from prohibitive latency, rendering them impractical for real-time applications. We propose Semantic-Guided RIFE (SG-RIFE), which augments a pre-trained RIFE backbone with semantic priors from a frozen DINOv3 Vision Transformer.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Real-time Video Frame Interpolation (VFI) has long been dominated by flow-based methods like RIFE, which offer high throughput but often fail in complicated scenarios involving large motion and occlusion. Conversely, recent diffusion-based approaches (e.g., Consec. BB) achieve state-of-the-art perceptual quality but suffer from prohibitive latency, rendering them impractical for real-time applications. To bridge this gap, we propose Semantic-Guided RIFE (SG-RIFE). Instead of training from scratch, we introduce a parameter-efficient fine-tuning strategy that augments a pre-trained RIFE backbone with semantic priors from a frozen DINOv3 Vision Transformer. We propose a Split-Fidelity Aware Projection Module (Split-FAPM) to compress and refine high-dimensional features, and a Deformable Semantic Fusion (DSF) module to align these semantic priors with pixel-level motion fields. Experiments on SNU-FILM demonstrate that semantic injection provides a decisive boost in perceptual fidelity. SG-RIFE outperforms diffusion-based LDMVFI in FID/LPIPS and achieves quality comparable to Consec. BB on complex benchmarks while running significantly faster, proving that semantic consistency enables flow-based methods to achieve diffusion-competitive perceptual quality in near real-time.
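As a rough illustration of the flow-based interpolation that the abstract builds on, the sketch below backward-warps two frames toward an intermediate time step and blends them. It is not the authors' implementation: the flows and blend mask are hand-specified here (in a RIFE-style model a network would predict them), the frames are tiny grayscale lists, and the names `bilinear_sample`, `backward_warp`, and `interpolate` are hypothetical.

```python
# Illustrative only: hand-written flows stand in for a learned flow estimator.
# Frames are grayscale images stored as lists of rows; flows hold (dx, dy) per pixel.

def bilinear_sample(img, x, y):
    """Sample img at a fractional (x, y), clamping coordinates to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def backward_warp(img, flow):
    """For each target pixel, fetch the source pixel that the flow points to."""
    h, w = len(img), len(img[0])
    return [
        [bilinear_sample(img, x + flow[y][x][0], y + flow[y][x][1]) for x in range(w)]
        for y in range(h)
    ]


def interpolate(frame0, frame1, flow_t0, flow_t1, mask):
    """Blend the two warped frames with a per-pixel mask in [0, 1]."""
    warped0 = backward_warp(frame0, flow_t0)
    warped1 = backward_warp(frame1, flow_t1)
    return [
        [m * a + (1 - m) * b for m, a, b in zip(mask_row, row0, row1)]
        for mask_row, row0, row1 in zip(mask, warped0, warped1)
    ]


# A bright pixel sits at x=1 in frame0 and at x=3 in frame1.
frame0 = [[0.0, 1.0, 0.0, 0.0, 0.0]]
frame1 = [[0.0, 0.0, 0.0, 1.0, 0.0]]
# Assumed intermediate flows for t=0.5: look one pixel left in frame0, one right in frame1.
flow_t0 = [[(-1.0, 0.0)] * 5]
flow_t1 = [[(1.0, 0.0)] * 5]
mask = [[0.5] * 5]

middle = interpolate(frame0, frame1, flow_t0, flow_t1, mask)
print(middle)
```

With these assumed flows, both warped frames place the bright pixel at x=2, so the blended middle frame shows it halfway along its path. Methods of this kind fail when the estimated flows are wrong, for example under large motion or occlusion, which is the failure mode the abstract sets out to address with semantic guidance.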

Related papers

Towards Holistic Modeling for Video Frame Interpolation with Auto-regressive Diffusion Transformers [95.68243351895107]
We propose a holistic, video-centric paradigm named Local Diffusion Forcing for Video Frame Interpolation (LDF-VFI). Our framework is built upon an auto-regressive diffusion transformer that models the entire video sequence to ensure long-range temporal coherence. LDF-VFI achieves state-of-the-art performance on challenging long-sequence benchmarks, demonstrating superior per
arXiv Detail & Related papers (2026-01-21T12:58:52Z)
Error-Propagation-Free Learned Video Compression With Dual-Domain Progressive Temporal Alignment [92.57576987521107]
We propose a novel unified-transform framework with dual-domain progressive temporal alignment and quality-conditioned mixture-of-expert (QCMoE). QCMoE allows continuous and consistent rate control with appealing R-D performance. Experimental results show that the proposed method achieves competitive R-D performance compared with the state of the art.
arXiv Detail & Related papers (2025-12-11T09:14:51Z)
Rethinking Infrared Small Target Detection: A Foundation-Driven Efficient Paradigm [17.63632082331749]
Large-scale visual foundation models (VFMs) exhibit strong generalization across diverse visual domains, but their potential for single-frame infrared small target (SIRST) detection remains largely unexplored. We propose a Foundation-Driven Efficient Paradigm (FDEP) which can seamlessly adapt to existing encoder-decoder-based methods and significantly improve accuracy without additional inference overhead.
arXiv Detail & Related papers (2025-12-05T08:12:35Z)
Real-Time Inference for Distributed Multimodal Systems under Communication Delay Uncertainty [37.15356899831919]
Connected cyber-physical systems perform inference based on real-time inputs from multiple data streams. We propose a novel neuro-inspired non-blocking inference paradigm that employs adaptive temporal windows of integration. Our framework achieves robust real-time inference with finer-grained control over the accuracy-latency tradeoff.
arXiv Detail & Related papers (2025-11-20T10:48:54Z)
Temporal-Consistent Video Restoration with Pre-trained Diffusion Models [51.47188802535954]
Video restoration (VR) aims to recover high-quality videos from degraded ones. Recent zero-shot VR methods using pre-trained diffusion models (DMs) suffer from approximation errors during reverse diffusion and insufficient temporal consistency. We present a novel Maximum a Posteriori (MAP) framework that directly parameterizes video frames in the seed space of DMs, eliminating approximation errors.
arXiv Detail & Related papers (2025-03-19T03:41:56Z)
Motion-aware Latent Diffusion Models for Video Frame Interpolation [51.78737270917301]
Motion estimation between neighboring frames plays a crucial role in avoiding motion ambiguity. We propose a novel diffusion framework, motion-aware latent diffusion models (MADiff). Our method achieves state-of-the-art performance, significantly outperforming existing approaches.
arXiv Detail & Related papers (2024-04-21T05:09:56Z)
Long-term Video Frame Interpolation via Feature Propagation [95.18170372022703]
Video frame interpolation (VFI) works generally predict intermediate frame(s) by first estimating the motion between inputs and then warping the inputs to the target time with the estimated motion. This approach is not optimal when the temporal distance between the input frames increases. We propose a propagation network (PNet) by extending the classic feature-level forecasting with a novel motion-to-feature approach.
arXiv Detail & Related papers (2022-03-29T10:47:06Z)
Real-Time Intermediate Flow Estimation for Video Frame Interpolation [50.12253023531497]
RIFE is a Real-time Intermediate Flow Estimation algorithm for VFI. A privileged distillation scheme is designed for stable IFNet training. RIFE achieves state-of-the-art performance on several public benchmarks.
arXiv Detail & Related papers (2020-11-12T10:12:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
