Task Agnostic Restoration of Natural Video Dynamics
- URL: http://arxiv.org/abs/2206.03753v2
- Date: Sat, 19 Aug 2023 04:51:11 GMT
- Title: Task Agnostic Restoration of Natural Video Dynamics
- Authors: Muhammad Kashif Ali, Dongjin Kim, Tae Hyun Kim
- Abstract summary: In many video restoration/translation tasks, image processing operations are naively extended to the video domain by processing each frame independently.
We propose a general framework for this task that learns to infer and utilize consistent motion dynamics from inconsistent videos to mitigate the temporal flicker.
The proposed framework produces SOTA results on two benchmark datasets, DAVIS and videvo.net, processed by numerous image processing applications.
- Score: 10.078712109708592
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In many video restoration/translation tasks, image processing operations are
naively extended to the video domain by processing each frame independently,
disregarding the temporal connection of the video frames. This disregard for
the temporal connection often leads to severe temporal inconsistencies.
State-Of-The-Art (SOTA) techniques that address these inconsistencies rely on
the availability of unprocessed videos to implicitly siphon and utilize
consistent video dynamics to restore the temporal consistency of frame-wise
processed videos, which often jeopardizes the translation effect. We propose a
general framework for this task that learns to infer and utilize consistent
motion dynamics from inconsistent videos to mitigate the temporal flicker while
preserving the perceptual quality for both the temporally neighboring and
relatively distant frames without requiring the raw videos at test time. The
proposed framework produces SOTA results on two benchmark datasets, DAVIS and
videvo.net, processed by numerous image processing applications. The code and
the trained models are available at
https://github.com/MKashifAli/TARONVD.
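Temporal flicker of the kind targeted here is commonly quantified with a warping-error measure between consecutive output frames. The sketch below is a generic, minimal version of such a metric and not the authors' code; it assumes a backward optical flow and an occlusion mask are available from an off-the-shelf estimator, and all function names are illustrative.

```python
import torch
import torch.nn.functional as F

def backward_warp(prev: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp the previous frame (B,C,H,W) to the current frame with flow (B,2,H,W)."""
    _, _, h, w = flow.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=flow.device, dtype=flow.dtype),
        torch.arange(w, device=flow.device, dtype=flow.dtype),
        indexing="ij",
    )
    # Sampling positions = base grid + flow, normalised to [-1, 1] for grid_sample.
    x = (xs.unsqueeze(0) + flow[:, 0]) / (w - 1) * 2 - 1
    y = (ys.unsqueeze(0) + flow[:, 1]) / (h - 1) * 2 - 1
    grid = torch.stack((x, y), dim=-1)                     # (B,H,W,2)
    return F.grid_sample(prev, grid, align_corners=True)

def warping_error(cur, prev, flow, occlusion):
    """Mean absolute difference between frame t and the flow-warped frame t-1,
    computed only where `occlusion` (B,1,H,W, 1 = visible) marks valid pixels."""
    warped = backward_warp(prev, flow)
    diff = (cur - warped).abs() * occlusion
    return diff.sum() / (occlusion.sum() * cur.shape[1] + 1e-8)
```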
Related papers
- VidToMe: Video Token Merging for Zero-Shot Video Editing [100.79999871424931]
We propose a novel approach to enhance temporal consistency in generated videos by merging self-attention tokens across frames.
Our method improves temporal coherence and reduces memory consumption in self-attention computations.
arXiv Detail & Related papers (2023-12-17T09:05:56Z)
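As a rough, hypothetical sketch of cross-frame token merging (not the VidToMe algorithm itself): one simple variant matches each current-frame token to its most similar previous-frame token by cosine similarity and averages confident matches. The threshold and names below are assumptions.

```python
import torch
import torch.nn.functional as F

def merge_tokens_across_frames(cur: torch.Tensor, prev: torch.Tensor,
                               sim_threshold: float = 0.9) -> torch.Tensor:
    """cur, prev: (N, D) self-attention token matrices of two frames.
    Returns current-frame tokens with confident matches averaged with their
    previous-frame counterparts."""
    cur_n = F.normalize(cur, dim=-1)
    prev_n = F.normalize(prev, dim=-1)
    sim = cur_n @ prev_n.t()                 # (N, N) cosine similarity
    best_sim, best_idx = sim.max(dim=1)      # best previous-frame match per token
    merged = cur.clone()
    keep = best_sim >= sim_threshold         # merge only confident matches
    merged[keep] = 0.5 * (cur[keep] + prev[best_idx[keep]])
    return merged
```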
- Blurry Video Compression: A Trade-off between Visual Enhancement and Data Compression [65.8148169700705]
Existing video compression (VC) methods primarily aim to reduce the spatial and temporal redundancies between consecutive frames in a video.
Previous works have achieved remarkable results on videos acquired under specific settings such as instant (known) exposure time and shutter speed.
In this work, we tackle the VC problem in a general scenario where a given video can be blurry due to predefined camera settings or dynamics in the scene.
arXiv Detail & Related papers (2023-11-08T02:17:54Z)
- LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation [21.815083817914843]
We propose a new zero-shot video-to-video translation framework, named LatentWarp.
Our approach is simple: to constrain the query tokens to be temporally consistent, we further incorporate a warping operation in the latent space.
Experimental results demonstrate the superiority of LatentWarp in achieving video-to-video translation with temporal coherence.
arXiv Detail & Related papers (2023-11-01T08:02:57Z)
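A minimal sketch of a latent-space warping step, under the assumption that a pixel-space optical flow is available and is simply rescaled to the latent resolution; this is illustrative only and not the LatentWarp implementation.

```python
import torch
import torch.nn.functional as F

def warp_latent(prev_latent: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """prev_latent: (B,C,h,w) latent of frame t-1; flow: (B,2,H,W) pixel-space
    backward flow from frame t to t-1. Returns the latent aligned to frame t."""
    _, _, h, w = prev_latent.shape
    # Resize the flow to latent resolution and rescale its displacement values.
    flow_small = F.interpolate(flow, size=(h, w), mode="bilinear", align_corners=True)
    flow_small = torch.stack(
        (flow_small[:, 0] * (w / flow.shape[-1]),
         flow_small[:, 1] * (h / flow.shape[-2])), dim=1)
    ys, xs = torch.meshgrid(
        torch.arange(h, device=flow.device, dtype=flow.dtype),
        torch.arange(w, device=flow.device, dtype=flow.dtype),
        indexing="ij",
    )
    x = (xs + flow_small[:, 0]) / (w - 1) * 2 - 1
    y = (ys + flow_small[:, 1]) / (h - 1) * 2 - 1
    grid = torch.stack((x, y), dim=-1)                     # (B,h,w,2)
    return F.grid_sample(prev_latent, grid, align_corners=True)
```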
- Sync-NeRF: Generalizing Dynamic NeRFs to Unsynchronized Videos [9.90835990611019]
We introduce time offsets for individual unsynchronized videos and jointly optimize the offsets with NeRF.
Finding the offsets naturally works as synchronizing the videos without manual effort.
arXiv Detail & Related papers (2023-10-20T08:45:30Z)
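A hedged sketch of the core idea, not the Sync-NeRF code: one learnable offset per camera is added to every timestamp before querying the dynamic NeRF, so the offsets can be optimized jointly with the radiance field through the usual photometric loss.

```python
import torch
import torch.nn as nn

class PerCameraTimeOffset(nn.Module):
    def __init__(self, num_cameras: int):
        super().__init__()
        # One scalar offset per camera, initialised at zero (i.e. "already synchronized").
        self.offsets = nn.Parameter(torch.zeros(num_cameras))

    def forward(self, t: torch.Tensor, camera_ids: torch.Tensor) -> torch.Tensor:
        """t: (N,) frame timestamps; camera_ids: (N,) integer camera indices.
        Returns the corrected timestamps fed to the dynamic NeRF."""
        return t + self.offsets[camera_ids]
```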
- RIGID: Recurrent GAN Inversion and Editing of Real Face Videos [73.97520691413006]
GAN inversion is indispensable for applying the powerful editability of GAN to real images.
Existing methods invert video frames individually, often leading to undesired inconsistent results over time.
We propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID).
Our framework learns the inherent coherence between input frames in an end-to-end manner.
arXiv Detail & Related papers (2023-08-11T12:17:24Z)
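Purely illustrative sketch, not the RIGID architecture: a recurrent inversion encoder can condition the current frame's latent on the previously predicted one, so the latent trajectory changes smoothly over time. The toy backbone and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class RecurrentInversionEncoder(nn.Module):
    def __init__(self, latent_dim: int = 512):
        super().__init__()
        self.backbone = nn.Sequential(                    # toy per-frame image encoder
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fuse = nn.GRUCell(128, latent_dim)           # recurrence over frames

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        """frames: (T, 3, H, W). Returns one latent code per frame, (T, latent_dim)."""
        h = torch.zeros(1, self.fuse.hidden_size, device=frames.device)
        latents = []
        for t in range(frames.shape[0]):
            feat = self.backbone(frames[t:t + 1])         # (1, 128)
            h = self.fuse(feat, h)                        # carry temporal state forward
            latents.append(h)
        return torch.cat(latents, dim=0)
```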
- VRT: A Video Restoration Transformer [126.79589717404863]
Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames.
We propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities.
arXiv Detail & Related papers (2022-01-28T17:54:43Z)
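A generic sketch of cross-frame attention, one common way for a restoration transformer to model temporal dependencies across frames processed in parallel; this is not VRT's actual block and the dimensions are assumptions.

```python
import torch
import torch.nn as nn

class CrossFrameAttention(nn.Module):
    """Tokens of the current frame query tokens of a neighbouring frame."""
    def __init__(self, dim: int = 96, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, cur: torch.Tensor, neighbor: torch.Tensor) -> torch.Tensor:
        # cur, neighbor: (B, N, dim) token sequences of two frames.
        out, _ = self.attn(query=cur, key=neighbor, value=neighbor)
        return cur + out                                  # residual connection
```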
- Video Frame Interpolation without Temporal Priors [91.04877640089053]
Video frame interpolation aims to synthesize non-existent intermediate frames in a video sequence.
The temporal priors of videos, i.e. frames per second (FPS) and frame exposure time, may vary across different camera sensors.
We devise a novel optical flow refinement strategy for better synthesizing results.
arXiv Detail & Related papers (2021-12-02T12:13:56Z)
- Temporal Context Aggregation for Video Retrieval with Contrastive Learning [81.12514007044456]
We propose TCA, a video representation learning framework that incorporates long-range temporal information between frame-level features.
The proposed method shows a significant performance advantage (17% mAP on FIVR-200K) over state-of-the-art methods with video-level features.
arXiv Detail & Related papers (2020-08-04T05:24:20Z)
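A hedged sketch of the general recipe rather than the TCA code: per-frame features are aggregated over time with self-attention into a single video embedding, which is then trained with a standard InfoNCE-style contrastive loss; all names and dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAggregator(nn.Module):
    """Aggregates per-frame features (B, T, dim) into one video embedding (B, dim)."""
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        ctx, _ = self.attn(frame_feats, frame_feats, frame_feats)  # temporal self-attention
        return F.normalize(ctx.mean(dim=1), dim=-1)                # mean-pool over time

def info_nce(anchor: torch.Tensor, positive: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """anchor, positive: (B, dim) L2-normalised embeddings of matching video pairs."""
    logits = anchor @ positive.t() / tau                  # (B, B) similarities
    labels = torch.arange(anchor.shape[0], device=anchor.device)
    return F.cross_entropy(logits, labels)                # diagonal pairs are the positives
```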