The Invertible U-Net for Optical-Flow-free Video Interframe Generation
- URL: http://arxiv.org/abs/2103.09576v1
- Date: Wed, 17 Mar 2021 11:37:10 GMT
- Title: The Invertible U-Net for Optical-Flow-free Video Interframe Generation
- Authors: Saem Park, Donghun Han and Nojun Kwak
- Abstract summary: In this paper, we try to tackle the video interframe generation problem without using problematic optical flow.
We propose a learning method with a new consistency loss in the latent space to maintain semantic temporal consistency between frames.
The resolution of the generated image is guaranteed to be identical to that of the original images by using an invertible network.
- Score: 31.100044730381047
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video frame interpolation is the task of synthesizing an
intermediate frame between two adjacent frames along the time axis. Rather than
simply averaging the two adjacent frames, this operation should produce an
intermediate image that maintains semantic continuity with its neighbors. Most
conventional methods use
optical flow, and various tools such as occlusion handling and object smoothing
are indispensable. Since these additional tools complicate the pipeline, we
tackle the video interframe generation problem without using the problematic
optical flow. To this end, we employ a deep neural network with an invertible
structure and develop an invertible U-Net, which is a modified normalizing
flow. In addition, we propose a learning method
with a new consistency loss in the latent space to maintain semantic temporal
consistency between frames. The resolution of the generated image is guaranteed
to be identical to that of the original images by using an invertible network.
Furthermore, since the output is not a random sample like those of generative
models, our network produces stable outputs without flicker. Through
experiments, we confirmed the feasibility of the proposed algorithm and suggest
the invertible U-Net as a new baseline for video frame interpolation. This
paper is meaningful in that it is the world's first attempt to use invertible
networks instead of optical flow for video interpolation.
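To make the abstract's two key ideas more concrete, below is a minimal, hypothetical sketch in PyTorch of (a) an additive coupling layer, the basic invertible building block used in normalizing flows, and (b) one plausible form of a latent-space consistency loss between adjacent frames. The class and function names, the coupling design, and the exact loss form are illustrative assumptions, not the authors' published architecture or objective.

```python
# Hypothetical sketch (not the authors' code): an invertible coupling block and a
# latent-space temporal-consistency loss in the spirit described in the abstract.
import torch
import torch.nn as nn


class AdditiveCoupling(nn.Module):
    """NICE-style additive coupling: split channels in half and shift one half
    conditioned on the other. Exactly invertible, so the output keeps the same
    spatial resolution and information content as the input."""

    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        self.net = nn.Sequential(
            nn.Conv2d(half, half, 3, padding=1), nn.ReLU(),
            nn.Conv2d(half, half, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x.chunk(2, dim=1)
        return torch.cat([x1, x2 + self.net(x1)], dim=1)

    def inverse(self, z: torch.Tensor) -> torch.Tensor:
        z1, z2 = z.chunk(2, dim=1)
        return torch.cat([z1, z2 - self.net(z1)], dim=1)


def latent_consistency_loss(encoder, frame0, frame1, frame2):
    """One plausible latent consistency term (assumption): encode three consecutive
    frames and penalize the distance between the middle frame's latent and the
    average of its neighbors' latents."""
    z0, z1, z2 = encoder(frame0), encoder(frame1), encoder(frame2)
    return torch.mean((z1 - 0.5 * (z0 + z2)) ** 2)


if __name__ == "__main__":
    block = AdditiveCoupling(channels=8)
    x = torch.randn(1, 8, 64, 64)
    z = block(x)
    print(torch.allclose(x, block.inverse(z), atol=1e-6))  # invertibility check

    frames = [torch.randn(1, 8, 64, 64) for _ in range(3)]
    print(latent_consistency_loss(block, *frames).item())
```

Because the inverse pass undoes the forward pass exactly, an invertible network of such blocks can map a latent back to image space at the original resolution without any learned upsampling, which is the property the abstract relies on for resolution preservation.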
Related papers
- ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler [53.98558445900626]
Current image-to-video diffusion models, while powerful in generating videos from a single frame, need adaptation for two-frame conditioned generation.
We introduce a novel, bidirectional sampling strategy to address these off-manifold issues without requiring extensive re-noising or fine-tuning.
Our method employs sequential sampling along both forward and backward paths, conditioned on the start and end frames, respectively, ensuring more coherent and on-manifold generation of intermediate frames.
arXiv Detail & Related papers (2024-10-08T03:01:54Z) - Aggregating Long-term Sharp Features via Hybrid Transformers for Video
Deblurring [76.54162653678871]
We propose a video deblurring method that leverages both neighboring frames and present sharp frames using hybrid Transformers for feature aggregation.
Our proposed method outperforms state-of-the-art video deblurring methods as well as event-driven video deblurring methods in terms of quantitative metrics and visual quality.
arXiv Detail & Related papers (2023-09-13T16:12:11Z) - RIGID: Recurrent GAN Inversion and Editing of Real Face Videos [73.97520691413006]
GAN inversion is indispensable for applying the powerful editability of GAN to real images.
Existing methods invert video frames individually, often leading to undesired inconsistent results over time.
We propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID).
Our framework learns the inherent coherence between input frames in an end-to-end manner.
arXiv Detail & Related papers (2023-08-11T12:17:24Z) - Latent-Shift: Latent Diffusion with Temporal Shift for Efficient
Text-to-Video Generation [115.09597127418452]
Latent-Shift is an efficient text-to-video generation method based on a pretrained text-to-image generation model.
We show that Latent-Shift achieves comparable or better results while being significantly more efficient.
arXiv Detail & Related papers (2023-04-17T17:57:06Z) - TTVFI: Learning Trajectory-Aware Transformer for Video Frame
Interpolation [50.49396123016185]
Video frame interpolation (VFI) aims to synthesize an intermediate frame between two consecutive frames.
We propose a novel Trajectory-aware Transformer for Video Frame Interpolation (TTVFI)
Our method outperforms other state-of-the-art methods in four widely-used VFI benchmarks.
arXiv Detail & Related papers (2022-07-19T03:37:49Z) - Cross-Attention Transformer for Video Interpolation [3.5317804902980527]
TAIN (Transformers and Attention for video INterpolation) aims to interpolate an intermediate frame given two consecutive image frames around it.
We first present a novel visual transformer module, named Cross-Similarity (CS), to globally aggregate input image features with similar appearance as those of the predicted frame.
To account for occlusions in the CS features, we propose an Image Attention (IA) module to allow the network to focus on CS features from one frame over those of the other.
arXiv Detail & Related papers (2022-07-08T21:38:54Z) - Restoration of Video Frames from a Single Blurred Image with Motion
Understanding [69.90724075337194]
We propose a novel framework to generate clean video frames from a single motion-blurred image.
We formulate video restoration from a single blurred image as an inverse problem by setting clean image sequence and their respective motion as latent factors.
Our framework is based on an encoder-decoder structure with spatial transformer network modules.
arXiv Detail & Related papers (2021-04-19T08:32:57Z) - W-Cell-Net: Multi-frame Interpolation of Cellular Microscopy Videos [1.7205106391379026]
We apply recent advances in deep video frame interpolation to increase the temporal resolution of fluorescent microscopy time-lapse movies.
To our knowledge, there is no previous work that uses Convolutional Neural Networks (CNNs) to generate frames between two consecutive microscopy images.
arXiv Detail & Related papers (2020-05-14T01:33:38Z)