Unsupervised Flow-Aligned Sequence-to-Sequence Learning for Video
Restoration
- URL: http://arxiv.org/abs/2205.10195v1
- Date: Fri, 20 May 2022 14:14:48 GMT
- Title: Unsupervised Flow-Aligned Sequence-to-Sequence Learning for Video
Restoration
- Authors: Jing Lin, Xiaowan Hu, Yuanhao Cai, Haoqian Wang, Youliang Yan, Xueyi
Zou, Yulun Zhang, Luc Van Gool
- Abstract summary: How to properly model the inter-frame relation within the video sequence is an important but unsolved challenge for video restoration (VR).
In this work, we propose an unsupervised flow-aligned sequence-to-sequence model (S2SVR) to address this problem.
S2SVR shows superior performance in multiple VR tasks, including video deblurring, video super-resolution, and compressed video quality enhancement.
- Score: 85.3323211054274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How to properly model the inter-frame relation within the video sequence is
an important but unsolved challenge for video restoration (VR). In this work,
we propose an unsupervised flow-aligned sequence-to-sequence model (S2SVR) to
address this problem. On the one hand, the sequence-to-sequence model, which
has proven capable of sequence modeling in the field of natural language
processing, is explored for the first time in VR. Optimized serialization
modeling shows potential in capturing long-range dependencies among frames. On
the other hand, we equip the sequence-to-sequence model with an unsupervised
optical flow estimator to maximize its potential. The flow estimator is trained
with our proposed unsupervised distillation loss, which can alleviate the data
discrepancy and inaccurate degraded optical flow issues of previous flow-based
methods. With reliable optical flow, we can establish accurate correspondence
among multiple frames, narrowing the domain difference between 1D language and
2D misaligned frames and improving the potential of the sequence-to-sequence
model. S2SVR shows superior performance in multiple VR tasks, including video
deblurring, video super-resolution, and compressed video quality enhancement.
Code and models are publicly available at
https://github.com/linjing7/VR-Baseline
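For illustration, here is a minimal PyTorch-style sketch of the two ingredients the abstract describes: flow-guided alignment of a neighboring frame to the reference frame, and a distillation loss in which a student flow estimator fed degraded frames mimics a teacher's flow. This is a hedged sketch, not the authors' released code; the function names and the teacher/student wiring are assumptions, and the repository above holds the real implementation.

```python
import torch
import torch.nn.functional as F

def flow_warp(feat, flow):
    """Backward-warp features `feat` (B,C,H,W) toward the reference frame
    using optical flow `flow` (B,2,H,W), flow channels ordered (dx, dy)."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(feat.device)  # (2,H,W)
    coords = base.unsqueeze(0) + flow                            # absolute sample positions
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0                # normalize to [-1, 1]
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                         # (B,H,W,2)
    return F.grid_sample(feat, grid, align_corners=True)

def flow_distillation_loss(student_flow, teacher_flow):
    """Unsupervised distillation: a student estimator fed degraded frames
    mimics a (detached) teacher flow computed on cleaner inputs."""
    return F.l1_loss(student_flow, teacher_flow.detach())

# Toy usage: align a neighbor frame's features to the reference frame.
neighbor = torch.randn(1, 16, 32, 32)
flow = torch.zeros(1, 2, 32, 32)             # zero flow -> identity warp
aligned = flow_warp(neighbor, flow)
print(torch.allclose(aligned, neighbor, atol=1e-5))  # True
```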
Related papers
- Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution [82.38677987249348]
We present the Qwen2-VL Series, which redefines the conventional predetermined-resolution approach in visual processing.
Qwen2-VL introduces the Naive Dynamic Resolution mechanism, which enables the model to dynamically process images of varying resolutions into different numbers of visual tokens.
The model also integrates Multimodal Rotary Position Embedding (M-RoPE), facilitating the effective fusion of positional information across text, images, and videos.
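For intuition, a toy sketch of what dynamic-resolution processing implies for visual token counts: the number of tokens grows with the native image size instead of being fixed. The 14-pixel patch size and 2x2 token merge used here are assumptions for illustration, not a statement of Qwen2-VL's exact configuration.

```python
def num_visual_tokens(height: int, width: int,
                      patch: int = 14, merge: int = 2) -> int:
    """Tokens = (H/patch) * (W/patch) patches, merged `merge x merge` at a time."""
    h_patches = height // patch
    w_patches = width // patch
    return (h_patches // merge) * (w_patches // merge)

print(num_visual_tokens(448, 448))    # 256: small image -> few tokens
print(num_visual_tokens(1344, 896))   # 1536: larger image -> proportionally more
```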
arXiv Detail & Related papers (2024-09-18T17:59:32Z)
- SIGMA: Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modeling (SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
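A compact sketch of the Sinkhorn-Knopp normalization that this kind of even cluster assignment typically relies on: similarities between features and clusters are iteratively rescaled so every cluster receives the same total mass. Hyperparameters are illustrative; this is not SIGMA's implementation.

```python
import torch

def sinkhorn(scores, n_iters=3, eps=0.05):
    """scores: (N, K) similarities between N tube features and K clusters."""
    logits = scores / eps
    q = torch.exp(logits - logits.max())         # stabilized exponential
    q = q / q.sum()
    n, k = q.shape
    for _ in range(n_iters):
        q = q / q.sum(dim=0, keepdim=True) / k   # equalize total use of each cluster
        q = q / q.sum(dim=1, keepdim=True) / n   # one unit of mass per feature
    return q * n                                 # each row: soft assignment summing to ~1

feats_to_clusters = torch.randn(128, 64)         # toy similarity scores
assign = sinkhorn(feats_to_clusters)
print(assign.sum(dim=1)[:3])                     # ~tensor([1., 1., 1.])
```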
arXiv Detail & Related papers (2024-07-22T08:04:09Z)
- Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), have well-known limitations: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
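A hedged sketch of how those three components could fit together in a single training step: the sequence encoder produces the condition, the step-wise diffuser noises the target item embedding, and the cross-attentive decoder predicts the noise. Every module below is a simplified stand-in, not the paper's architecture.

```python
import torch
import torch.nn as nn

d, T = 64, 100
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

encoder = nn.GRU(d, d, batch_first=True)              # sequence encoder (stand-in)
cross_attn = nn.MultiheadAttention(d, 4, batch_first=True)
denoiser = nn.Linear(d, d)                            # predicts the added noise

seq = torch.randn(8, 20, d)     # interaction-history embeddings
x0 = torch.randn(8, d)          # target item embedding

t = torch.randint(0, T, (8,))
noise = torch.randn_like(x0)
a = alpha_bar[t].unsqueeze(-1)
xt = a.sqrt() * x0 + (1 - a).sqrt() * noise           # step-wise forward diffusion

cond, _ = encoder(seq)
h, _ = cross_attn(xt.unsqueeze(1), cond, cond)        # condition on the sequence
loss = nn.functional.mse_loss(denoiser(h.squeeze(1)), noise)
loss.backward()
```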
arXiv Detail & Related papers (2023-04-22T15:32:59Z)
- FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation [97.99012124785177]
FLAVR is a flexible and efficient architecture that uses 3D space-time convolutions to enable end-to-end learning and inference for video frame interpolation.
We demonstrate that FLAVR can serve as a useful self-supervised pretext task for action recognition, optical flow estimation, and motion magnification.
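A minimal flow-free sketch in this spirit: 3D space-time convolutions map four input frames to one intermediate frame end-to-end. Channel counts and depth are illustrative, not FLAVR's.

```python
import torch
import torch.nn as nn

class Tiny3DInterp(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, ch, kernel_size=(3, 3, 3), padding=(1, 1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv3d(ch, ch, kernel_size=(3, 3, 3), padding=(1, 1, 1)),
            nn.ReLU(inplace=True),
        )
        # temporal kernel of 4 collapses the 4 input frames to 1 output frame
        self.head = nn.Conv3d(ch, 3, kernel_size=(4, 3, 3), padding=(0, 1, 1))

    def forward(self, frames):                          # frames: (B, 3, T=4, H, W)
        return self.head(self.net(frames)).squeeze(2)   # (B, 3, H, W)

out = Tiny3DInterp()(torch.randn(1, 3, 4, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64])
```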
arXiv Detail & Related papers (2020-12-15T18:59:30Z)
- Diagnosing and Preventing Instabilities in Recurrent Video Processing [23.39527368516591]
We show that recurrent video processing models tend to fail catastrophically at inference time on long video sequences.
We introduce a diagnostic tool which produces adversarial input sequences optimized to trigger instabilities.
We then introduce Stable Rank Normalization of the Layers (SRNL), a new algorithm that enforces stability constraints on the network's layers.
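For intuition, a simplified sketch of the kind of constraint at stake: bounding a weight matrix's spectral norm via power iteration so that repeated application of a recurrent layer cannot amplify signals without bound. This simplification is not the paper's SRNL algorithm.

```python
import torch

def spectral_norm_rescale(w, target=1.0, iters=20):
    """Rescale matrix `w` so its largest singular value is about `target`."""
    v = torch.randn(w.shape[1])
    for _ in range(iters):                      # power iteration
        u = w @ v
        u = u / u.norm()
        v = w.t() @ u
        v = v / v.norm()
    sigma = u @ (w @ v)                         # estimated leading singular value
    return w if sigma <= target else w * (target / sigma)

w = torch.randn(64, 64)
print(torch.linalg.matrix_norm(spectral_norm_rescale(w), ord=2))  # ~1.0 after rescaling
```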
arXiv Detail & Related papers (2020-10-10T21:39:28Z)
- Hybrid-S2S: Video Object Segmentation with Recurrent Networks and Correspondence Matching [3.9053553775979086]
One-shot Video Object Segmentation (VOS) is the task of tracking an object of interest within a video sequence.
We study an RNN-based architecture and address some of these issues by proposing a hybrid sequence-to-sequence architecture named HS2S.
Our experiments show that augmenting the RNN with correspondence matching is a highly effective solution to reduce the drift problem.
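A small sketch of the correspondence-matching half of such a hybrid: cosine-match current-frame features against first-frame reference features and pool mask votes, which anchors predictions to the reference and counteracts drift. Shapes and the fusion with the RNN branch are assumptions.

```python
import torch
import torch.nn.functional as F

def correspondence_score(ref_feat, ref_mask, cur_feat):
    """Cosine-match each current pixel to reference pixels; pool mask votes."""
    b, c, h, w = cur_feat.shape
    r = F.normalize(ref_feat.flatten(2), dim=1)        # (B, C, HW_ref)
    q = F.normalize(cur_feat.flatten(2), dim=1)        # (B, C, HW_cur)
    sim = torch.bmm(q.transpose(1, 2), r)              # (B, HW_cur, HW_ref)
    votes = torch.bmm(sim.softmax(dim=-1), ref_mask.flatten(2).transpose(1, 2))
    return votes.transpose(1, 2).view(b, 1, h, w)      # per-pixel object score

cur = torch.randn(1, 32, 16, 16)
ref = torch.randn(1, 32, 16, 16)
mask = (torch.randn(1, 1, 16, 16) > 0).float()
print(correspondence_score(ref, mask, cur).shape)      # torch.Size([1, 1, 16, 16])
```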
arXiv Detail & Related papers (2020-10-10T19:00:43Z)
- Enhanced Quadratic Video Interpolation [56.54662568085176]
We propose an enhanced quadratic video (EQVI) model to handle more complicated scenes and motion patterns.
To further boost the performance, we devise a novel multi-scale fusion network (MS-Fusion) which can be regarded as a learnable augmentation process.
The proposed EQVI model won the first place in the AIM 2020 Video Temporal Super-Resolution Challenge.
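The quadratic motion model underlying QVI/EQVI can be written out directly: with flows from frame 0 to frames -1 and +1, per-pixel velocity and acceleration determine the flow to any intermediate time t. The sketch below is this standard derivation, not EQVI's full rescaling and fusion pipeline.

```python
import torch

def quadratic_flow(flow_0_to_1, flow_0_to_m1, t):
    """f(t) = v*t + 0.5*a*t^2, with v = (f1 - f-1)/2 and a = f1 + f-1."""
    v = 0.5 * (flow_0_to_1 - flow_0_to_m1)   # per-pixel velocity
    a = flow_0_to_1 + flow_0_to_m1           # per-pixel acceleration
    return v * t + 0.5 * a * t ** 2

f1 = torch.randn(1, 2, 8, 8)
fm1 = torch.randn(1, 2, 8, 8)
print(torch.allclose(quadratic_flow(f1, fm1, 1.0), f1))    # True at t = 1
print(torch.allclose(quadratic_flow(f1, fm1, -1.0), fm1))  # True at t = -1
```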
arXiv Detail & Related papers (2020-09-10T02:31:50Z)
- Neural Video Coding using Multiscale Motion Compensation and Spatiotemporal Context Model [45.46660511313426]
We propose an end-to-end deep neural video coding framework (NVC).
It uses variational autoencoders (VAEs) with joint spatial and temporal prior aggregation (PA) to exploit the correlations in intra-frame pixels, inter-frame motions and inter-frame compensation residuals.
NVC is evaluated for the low-delay causal settings and compared with H.265/HEVC, H.264/AVC and the other learnt video compression methods.
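A toy sketch of the inter-frame residual-coding shape described above: predict the current frame by motion compensation, encode only the residual with a small autoencoder, and add it back at the decoder. The networks are stand-ins; the actual NVC adds learned spatiotemporal priors for entropy coding.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
                    nn.Conv2d(16, 8, 5, stride=2, padding=2))
dec = nn.Sequential(nn.ConvTranspose2d(8, 16, 4, stride=2, padding=1), nn.ReLU(),
                    nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1))

cur = torch.rand(1, 3, 64, 64)
pred = torch.rand(1, 3, 64, 64)      # stand-in for a motion-compensated prediction
residual = cur - pred
latent = enc(residual)               # compact code to be entropy-coded
recon = pred + dec(latent)           # decoder adds the residual back
print(nn.functional.mse_loss(recon, cur).item())
```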
arXiv Detail & Related papers (2020-07-09T06:15:17Z)