Video Frame Interpolation with Region-Distinguishable Priors from SAM
- URL: http://arxiv.org/abs/2312.15868v1
- Date: Tue, 26 Dec 2023 03:27:30 GMT
- Title: Video Frame Interpolation with Region-Distinguishable Priors from SAM
- Authors: Yan Han and Xiaogang Xu and Yingqi Lin and Jiafei Wu and Zhe Liu
- Abstract summary: Region-Distinguishable Priors (RDPs) are represented as spatially-varying Gaussian mixtures.
A Hierarchical Region-aware Feature Fusion Module (HRFFM) incorporates RDPs into various hierarchical stages of the VFI encoder.
Experiments demonstrate that HRFFM consistently enhances VFI performance across various scenes.
- Score: 19.350313166180747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In existing Video Frame Interpolation (VFI) approaches, the motion estimation
between neighboring frames plays a crucial role. However, the estimation
accuracy in existing methods remains a challenge, primarily due to the inherent
ambiguity in identifying corresponding areas in adjacent frames for
interpolation. Therefore, enhancing accuracy by distinguishing different
regions before motion estimation is of utmost importance. In this paper, we
introduce a novel solution involving the utilization of open-world segmentation
models, e.g., SAM (Segment Anything Model), to derive Region-Distinguishable
Priors (RDPs) in different frames. These RDPs are represented as
spatially-varying Gaussian mixtures, distinguishing an arbitrary number of areas
with a unified modality. RDPs can be integrated into existing motion-based VFI
methods to enhance features for motion estimation, facilitated by our designed
plug-and-play Hierarchical Region-aware Feature Fusion Module (HRFFM). HRFFM
incorporates RDP into various hierarchical stages of VFI's encoder, using
RDP-guided Feature Normalization (RDPFN) in a residual learning manner. With
HRFFM and RDP, the features within VFI's encoder exhibit similar
representations for matched regions in neighboring frames, thus improving the
synthesis of intermediate frames. Extensive experiments demonstrate that HRFFM
consistently enhances VFI performance across various scenes.
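The abstract describes two concrete components: encoding SAM's region masks as a spatially-varying Gaussian mixture (the RDP), and injecting that prior into encoder features through RDP-guided Feature Normalization (RDPFN) applied in a residual manner. The PyTorch sketch below is a minimal illustration of how such a pipeline could look; the shapes, the Gaussian parameterization in masks_to_rdp, and the RDPFN module layout are assumptions made for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch: turning SAM masks into a Region-Distinguishable Prior (RDP)
# and fusing it into encoder features with residual, RDP-guided normalization.
# Shapes, module names, and the Gaussian parameterization are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def masks_to_rdp(masks: torch.Tensor, sigma_scale: float = 0.5) -> torch.Tensor:
    """Encode N binary SAM masks (N, H, W) as a spatially-varying Gaussian mixture.

    Each region contributes an isotropic Gaussian centered at its mask centroid,
    with a spread tied to the region's size, so an arbitrary number of regions is
    summarized in a single-channel prior map (a simplification of the paper's RDP).
    """
    n, h, w = masks.shape
    ys = torch.arange(h, dtype=torch.float32).view(h, 1)
    xs = torch.arange(w, dtype=torch.float32).view(1, w)
    rdp = torch.zeros(1, h, w)
    for i in range(n):
        m = masks[i].float()
        area = m.sum().clamp(min=1.0)
        cy = (m * ys).sum() / area              # region centroid (row)
        cx = (m * xs).sum() / area              # region centroid (col)
        sigma = sigma_scale * torch.sqrt(area)  # spread proportional to region size
        g = torch.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
        rdp += (i + 1) * m * g                  # distinct weight per region
    return rdp / n                              # (1, H, W) prior map


class RDPFN(nn.Module):
    """RDP-guided feature normalization applied as a residual correction (assumed form)."""

    def __init__(self, channels: int, rdp_channels: int = 1):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        # Predict per-pixel scale and shift from the (resized) RDP map.
        self.to_gamma = nn.Conv2d(rdp_channels, channels, kernel_size=3, padding=1)
        self.to_beta = nn.Conv2d(rdp_channels, channels, kernel_size=3, padding=1)

    def forward(self, feat: torch.Tensor, rdp: torch.Tensor) -> torch.Tensor:
        rdp = F.interpolate(rdp, size=feat.shape[-2:], mode="bilinear",
                            align_corners=False)
        modulated = self.norm(feat) * (1 + self.to_gamma(rdp)) + self.to_beta(rdp)
        return feat + modulated                 # residual learning keeps the original path
```

In a full HRFFM, one such block would sit at each hierarchical stage of the VFI encoder, so that matched regions in neighboring frames are pushed toward similar representations before motion estimation.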
Related papers
- From Modalities to Styles: Rethinking the Domain Gap in Heterogeneous Face Recognition [4.910937238451485]
We present a new Conditional Adaptive Instance Modulation (CAIM) module that seamlessly fits into existing Face Recognition networks.
The CAIM block modulates intermediate feature maps, efficiently adapting to the style of the source modality and bridging the domain gap.
We extensively evaluate the proposed approach on various challenging HFR benchmarks, showing that it outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-04-22T15:00:51Z) - Motion-aware Latent Diffusion Models for Video Frame Interpolation [51.78737270917301]
Motion estimation between neighboring frames plays a crucial role in avoiding motion ambiguity.
We propose a novel diffusion framework, motion-aware latent diffusion models (MADiff)
Our method achieves state-of-the-art performance significantly outperforming existing approaches.
arXiv Detail & Related papers (2024-04-21T05:09:56Z) - Motion-Aware Video Frame Interpolation [49.49668436390514]
We introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames.
It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
arXiv Detail & Related papers (2024-02-05T11:00:14Z) - Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
arXiv Detail & Related papers (2023-09-18T11:06:42Z) - Error-Aware Spatial Ensembles for Video Frame Interpolation [50.63021118973639]
Video frame interpolation (VFI) algorithms have improved considerably in recent years due to unprecedented progress in both data-driven algorithms and their implementations.
Recent research has introduced advanced motion estimation or novel warping methods as the means to address challenging VFI scenarios.
This work introduces such a solution. By closely examining the correlation between optical flow and interpolation error (IE), the paper proposes novel error prediction metrics that partition the middle frame into distinct regions corresponding to different IE levels.
arXiv Detail & Related papers (2022-07-25T16:15:38Z) - DeMFI: Deep Joint Deblurring and Multi-Frame Interpolation with Flow-Guided Attentive Correlation and Recursive Boosting [50.17500790309477]
DeMFI-Net is a joint deblurring and multi-frame interpolation framework.
It converts blurry, lower-frame-rate videos into sharp, higher-frame-rate videos.
It achieves state-of-the-art (SOTA) performance on diverse datasets.
arXiv Detail & Related papers (2021-11-19T00:00:15Z) - Inter-class Discrepancy Alignment for Face Recognition [55.578063356210144]
We propose a unified framework called Inter-class Discrepancy Alignment (IDA).
IDA-DAO is used to align similarity scores by considering the discrepancy between an image and its neighbors.
IDA-SSE can provide convincing inter-class neighbors by introducing virtual candidate images generated with a GAN.
arXiv Detail & Related papers (2021-03-02T08:20:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.