Video Frame Interpolation with Many-to-many Splatting and Spatial
Selective Refinement
- URL: http://arxiv.org/abs/2310.18946v1
- Date: Sun, 29 Oct 2023 09:09:32 GMT
- Title: Video Frame Interpolation with Many-to-many Splatting and Spatial
Selective Refinement
- Authors: Ping Hu, Simon Niklaus, Lu Zhang, Stan Sclaroff, Kate Saenko
- Abstract summary: We propose a fully differentiable Many-to-Many (M2M) splatting framework to interpolate frames efficiently.
For each input frame pair, M2M has a minuscule computational overhead when interpolating an arbitrary number of in-between frames.
We further extend M2M into an M2M++ framework by introducing a flexible Spatial Selective Refinement component, which allows for trading computational efficiency for quality and vice versa.
- Score: 83.60486465697318
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this work, we first propose a fully differentiable Many-to-Many (M2M)
splatting framework to interpolate frames efficiently. Given a frame pair, we
estimate multiple bidirectional flows to directly forward warp the pixels to
the desired time step before fusing overlapping pixels. In doing so, each
source pixel renders multiple target pixels and each target pixel can be
synthesized from a larger area of visual context, establishing a many-to-many
splatting scheme with robustness to undesirable artifacts. For each input frame
pair, M2M has a minuscule computational overhead when interpolating an
arbitrary number of in-between frames, hence achieving fast multi-frame
interpolation. However, directly warping and fusing pixels in the intensity
domain is sensitive to the quality of motion estimation and may suffer from
limited representational capacity. To improve interpolation accuracy, we
further extend M2M into an M2M++ framework by introducing a flexible Spatial
Selective Refinement (SSR) component, which allows for trading computational efficiency
for interpolation quality and vice versa. Instead of refining the entire
interpolated frame, SSR only processes difficult regions selected under the
guidance of an estimated error map, thereby avoiding redundant computation.
Evaluation on multiple benchmark datasets shows that our method is able to
improve the efficiency while maintaining competitive video interpolation
quality, and it can be adjusted to use more or less compute as needed.
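To make the many-to-many scheme concrete, below is a minimal NumPy sketch of forward splatting: every source pixel is moved along several candidate flows to the target time step, and overlapping contributions are fused by weighted averaging. The tensor shapes, reliability weights, and function name are illustrative assumptions rather than the authors' implementation, and the nearest-neighbor rounding stands in for the paper's differentiable bilinear splatting.

    import numpy as np

    def m2m_splat(frame, flows, weights, t):
        # frame:   (H, W, 3) source image
        # flows:   (K, H, W, 2) candidate flows as x/y pixel displacements
        # weights: (K, H, W) per-flow reliability scores (assumed given)
        # t:       target time step in [0, 1]
        H, W, _ = frame.shape
        acc = np.zeros((H, W, 3))    # accumulated splatted colors
        norm = np.zeros((H, W, 1))   # accumulated weights for fusion
        ys, xs = np.mgrid[0:H, 0:W]
        for k in range(flows.shape[0]):   # each source pixel splats K times
            tx = np.rint(xs + t * flows[k, ..., 0]).astype(int)
            ty = np.rint(ys + t * flows[k, ..., 1]).astype(int)
            ok = (tx >= 0) & (tx < W) & (ty >= 0) & (ty < H)
            w = weights[k][ok][:, None]
            np.add.at(acc, (ty[ok], tx[ok]), w * frame[ok])
            np.add.at(norm, (ty[ok], tx[ok]), w)
        return acc / np.maximum(norm, 1e-8)   # fuse overlapping pixels

In the full method, both input frames would be splatted toward time t with bidirectional flows and fused together; the sketch shows a single direction for brevity.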
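The Spatial Selective Refinement step can be sketched in a similarly simplified way: an estimated error map decides which patches are routed through an expensive refinement network, so the compute spent scales with the difficulty of the frame. The patch size, threshold, and refine_fn interface below are hypothetical placeholders, not the paper's SSR component.

    import numpy as np

    def selective_refine(coarse, error_map, refine_fn, patch=32, tau=0.1):
        # coarse:    (H, W, 3) frame produced by splatting and fusion
        # error_map: (H, W) estimated interpolation error (assumed given)
        # refine_fn: heavy synthesis network, applied only where needed
        out = coarse.copy()
        H, W, _ = coarse.shape
        for y in range(0, H, patch):
            for x in range(0, W, patch):
                if error_map[y:y + patch, x:x + patch].mean() > tau:
                    out[y:y + patch, x:x + patch] = refine_fn(
                        coarse[y:y + patch, x:x + patch])
        return out

Raising tau refines fewer patches and saves compute; lowering it trades efficiency for quality, matching the adjustable behavior described in the abstract.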
Related papers
- ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler [53.98558445900626]
Current image-to-video diffusion models, while powerful in generating videos from a single frame, need adaptation for two-frame conditioned generation.
We introduce a novel, bidirectional sampling strategy to address these off-manifold issues without requiring extensive re-noising or fine-tuning.
Our method employs sequential sampling along both forward and backward paths, conditioned on the start and end frames, respectively, ensuring more coherent and on-manifold generation of intermediate frames.
arXiv Detail & Related papers (2024-10-08T03:01:54Z) - Dynamic Frame Interpolation in Wavelet Domain [57.25341639095404]
Video frame interpolation is an important low-level computer vision task that can increase the frame rate for a more fluent visual experience.
Existing methods have achieved great success by employing advanced motion models and synthesis networks.
WaveletVFI can reduce computation by up to 40% while maintaining similar accuracy, making it more efficient than other state-of-the-art methods.
arXiv Detail & Related papers (2023-09-07T06:41:15Z) - Many-to-many Splatting for Efficient Video Frame Interpolation [80.10804399840927]
Motion-based video frame interpolation relies on optical flow to warp pixels from the input frames to the desired time instant.
We propose a Many-to-Many (M2M) splatting framework to interpolate frames efficiently.
M2M has a minuscule computational overhead when interpolating an arbitrary number of in-between frames.
arXiv Detail & Related papers (2022-04-07T15:29:42Z) - ALANET: Adaptive Latent Attention Network for Joint Video Deblurring and
Interpolation [38.52446103418748]
We introduce a novel architecture, Adaptive Latent Attention Network (ALANET), which synthesizes sharp high frame-rate videos.
We employ a combination of self-attention and cross-attention modules between consecutive frames in the latent space to generate an optimized representation for each frame.
Our method performs favorably against various state-of-the-art approaches, even though we tackle a much more difficult problem.
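As a rough illustration of that design, the sketch below mixes self-attention within one frame's latent features with cross-attention to its neighbor using stock PyTorch layers; the module name, head count, and fusion by summation are assumptions, and the actual ALANET architecture differs.

    import torch
    import torch.nn as nn

    class FramePairAttention(nn.Module):
        def __init__(self, dim, heads=4):
            super().__init__()
            self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, cur, prev):
            # cur, prev: (B, N, dim) flattened latent features of two frames
            s, _ = self.self_attn(cur, cur, cur)      # attend within the frame
            c, _ = self.cross_attn(cur, prev, prev)   # attend to the neighbor
            return cur + s + c                        # fused representation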
arXiv Detail & Related papers (2020-08-31T21:11:53Z) - MuCAN: Multi-Correspondence Aggregation Network for Video
Super-Resolution [63.02785017714131]
Video super-resolution (VSR) aims to utilize multiple low-resolution frames to generate a high-resolution prediction for each frame.
Inter- and intra-frame correspondences are the key sources for exploiting temporal and spatial information.
We build an effective multi-correspondence aggregation network (MuCAN) for VSR.
arXiv Detail & Related papers (2020-07-23T05:41:27Z) - All at Once: Temporally Adaptive Multi-Frame Interpolation with Advanced
Motion Modeling [52.425236515695914]
State-of-the-art methods are iterative solutions interpolating one frame at a time.
This work introduces a true multi-frame interpolator.
It utilizes a pyramidal style network in the temporal domain to complete the multi-frame task in one-shot.
arXiv Detail & Related papers (2020-07-23T02:34:39Z) - Softmax Splatting for Video Frame Interpolation [14.815903726643011]
Differentiable image sampling has seen broad adoption in tasks like depth estimation and optical flow prediction.
We propose softmax splatting to address the paradigm shift from backward to forward warping and show its effectiveness on the application of frame interpolation.
We show that our synthesis approach, empowered by softmax splatting, achieves new state-of-the-art results for video frame interpolation.
arXiv Detail & Related papers (2020-03-11T21:38:56Z)
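To illustrate the technique named in this last entry, here is a hedged nearest-neighbor sketch: overlapping forward-warped pixels are fused with weights exp(Z) for a per-pixel importance score Z, so more important (e.g. occluding) pixels dominate softly. The importance input and function name are assumptions, and the actual method splats bilinearly and differentiably.

    import numpy as np

    def softmax_splat(frame, flow, importance, t):
        # frame:      (H, W, 3) source image
        # flow:       (H, W, 2) forward flow toward the other frame
        # importance: (H, W) score Z, e.g. inverse depth (assumed given)
        H, W, _ = frame.shape
        num = np.zeros((H, W, 3))
        den = np.zeros((H, W, 1))
        ys, xs = np.mgrid[0:H, 0:W]
        tx = np.rint(xs + t * flow[..., 0]).astype(int)
        ty = np.rint(ys + t * flow[..., 1]).astype(int)
        ok = (tx >= 0) & (tx < W) & (ty >= 0) & (ty < H)
        w = np.exp(importance[ok])[:, None]        # softmax numerator
        np.add.at(num, (ty[ok], tx[ok]), w * frame[ok])
        np.add.at(den, (ty[ok], tx[ok]), w)
        return num / np.maximum(den, 1e-8)         # softmax normalization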