Video Rescaling Networks with Joint Optimization Strategies for
Downscaling and Upscaling
- URL: http://arxiv.org/abs/2103.14858v1
- Date: Sat, 27 Mar 2021 09:35:38 GMT
- Title: Video Rescaling Networks with Joint Optimization Strategies for
Downscaling and Upscaling
- Authors: Yan-Cheng Huang, Yi-Hsin Chen, Cheng-You Lu, Hui-Po Wang, Wen-Hsiao
Peng and Ching-Chun Huang
- Abstract summary: We present two joint optimization approaches based on invertible neural networks with coupling layers.
Our Long Short-Term Memory Video Rescaling Network (LSTM-VRN) leverages temporal information in the low-resolution video to form an explicit prediction of the missing high-frequency information for upscaling.
Our Multi-input Multi-output Video Rescaling Network (MIMO-VRN) proposes a new strategy for downscaling and upscaling a group of video frames simultaneously.
- Score: 15.630742638440998
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper addresses the video rescaling task, which arises from the needs of
adapting the video spatial resolution to suit individual viewing devices. We
aim to jointly optimize video downscaling and upscaling as a combined task.
Most recent studies focus on image-based solutions, which do not consider
temporal information. We present two joint optimization approaches based on
invertible neural networks with coupling layers. Our Long Short-Term Memory
Video Rescaling Network (LSTM-VRN) leverages temporal information in the
low-resolution video to form an explicit prediction of the missing
high-frequency information for upscaling. Our Multi-input Multi-output Video
Rescaling Network (MIMO-VRN) proposes a new strategy for downscaling and
upscaling a group of video frames simultaneously. Not only do they outperform
the image-based invertible model in both quantitative and qualitative results,
but they also show much improved upscaling quality compared with video
rescaling methods that lack joint optimization. To the best of our knowledge,
this work is the first attempt at jointly optimizing video downscaling and
upscaling.
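The core building block shared by LSTM-VRN and MIMO-VRN is an invertible neural network with coupling layers. As a minimal sketch of that idea, assuming a plain additive coupling in PyTorch (the papers' actual channel splits, wavelet-style decompositions, and LSTM predictors are not reproduced here), the forward pass is invertible in closed form, which is what lets the upscaling path recover information discarded during downscaling:

```python
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    """Minimal additive coupling layer: splits channels in half, updates one
    half conditioned on the other, and is invertible in closed form."""
    def __init__(self, channels, hidden=64):
        super().__init__()
        half = channels // 2
        self.f = nn.Sequential(           # predicts an additive update for x2
            nn.Conv2d(half, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, half, 3, padding=1))

    def forward(self, x):                 # "downscaling" direction
        x1, x2 = x.chunk(2, dim=1)
        return torch.cat([x1, x2 + self.f(x1)], dim=1)

    def inverse(self, y):                 # "upscaling" direction, exact
        y1, y2 = y.chunk(2, dim=1)
        return torch.cat([y1, y2 - self.f(y1)], dim=1)

x = torch.randn(1, 8, 32, 32)
layer = AdditiveCoupling(8)
assert torch.allclose(layer.inverse(layer(x)), x, atol=1e-6)
```

Stacking such layers yields a bijective mapping, so the downscaling (forward) and upscaling (inverse) directions can be trained jointly end to end.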
Related papers
- Differentiable Resolution Compression and Alignment for Efficient Video
Classification and Retrieval [16.497758750494537]
We propose an efficient video representation network with a Differentiable Resolution Compression and Alignment mechanism.
We leverage a Differentiable Context-aware Compression Module to encode the saliency and non-saliency frame features.
We introduce a new Resolution-Align Transformer Layer to capture global temporal correlations among frame features with different resolutions.
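As a rough illustration of how an attention layer might align frame features kept at different resolutions, the sketch below runs joint self-attention over concatenated low- and high-resolution token sequences in PyTorch. The class name, dimensions, and token layout are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ResolutionAlignAttention(nn.Module):
    """Sketch: tokens from compressed (low-res) frames and salient (full-res)
    frames attend to each other in one joint sequence, so temporal
    correlations are captured across resolutions."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, lr_tokens, hr_tokens):
        # lr_tokens: (B, N_lr, dim); hr_tokens: (B, N_hr, dim)
        mixed = torch.cat([lr_tokens, hr_tokens], dim=1)  # one joint sequence
        out, _ = self.attn(mixed, mixed, mixed)           # global self-attention
        return self.norm(mixed + out)                     # residual + norm

lr = torch.randn(2, 196, 256)   # tokens from compressed frames
hr = torch.randn(2, 49, 256)    # tokens from full-resolution frames
out = ResolutionAlignAttention()(lr, hr)   # (2, 245, 256)
```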
arXiv Detail & Related papers (2023-09-15T05:31:53Z) - Deep Unsupervised Key Frame Extraction for Efficient Video
Classification [63.25852915237032]
This work presents an unsupervised method to retrieve the key frames, combining a Convolutional Neural Network (CNN) with Temporal Segment Density Peaks Clustering (TSDPC).
The proposed TSDPC is a generic and powerful framework with two advantages over previous works, one of which is that it can determine the number of key frames automatically.
Furthermore, a Long Short-Term Memory network (LSTM) is added on the top of the CNN to further elevate the performance of classification.
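A minimal sketch of density-peaks clustering over per-frame features, assuming generic CNN embeddings and an illustrative score threshold (`thresh_sigma` and the cutoff heuristic are not values from the paper; the actual TSDPC operates on temporal segments). It shows how the number of key frames can fall out of the clustering automatically rather than being fixed in advance:

```python
import numpy as np

def density_peaks_keyframes(feats, d_c=None, thresh_sigma=2.0):
    """feats: (n_frames, feat_dim) array of per-frame features.
    Returns indices of frames whose density*distance score stands out."""
    n = len(feats)
    dist = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
    if d_c is None:                      # common heuristic: small quantile
        d_c = np.quantile(dist[dist > 0], 0.02)
    rho = (dist < d_c).sum(axis=1) - 1   # local density (exclude self)
    delta = np.empty(n)
    for i in range(n):                   # distance to nearest denser point
        denser = dist[i, rho > rho[i]]
        delta[i] = denser.min() if denser.size else dist[i].max()
    gamma = rho * delta                  # cluster-center score
    keep = gamma > gamma.mean() + thresh_sigma * gamma.std()
    return np.flatnonzero(keep)          # selected key-frame indices

feats = np.random.rand(120, 128)         # e.g., 120 frames of CNN features
print(density_peaks_keyframes(feats))
```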
arXiv Detail & Related papers (2022-11-12T20:45:35Z) - STRPM: A Spatiotemporal Residual Predictive Model for High-Resolution
Video Prediction [78.129039340528]
We propose a SpatioTemporal Residual Predictive Model (STRPM) for high-resolution video prediction.
Experimental results show that STRPM generates more satisfactory results than various existing methods.
arXiv Detail & Related papers (2022-03-30T06:24:00Z) - An Efficient Network Design for Face Video Super-resolution [3.950210498382569]
We construct a dataset consisting entirely of face video sequences for network training and evaluation.
We use three combined strategies to optimize the network parameters, with a simultaneous train-evaluation method to accelerate the optimization process.
The resulting network reduces parameters by at least 52.4% and FLOPs by 20.7%, while achieving better performance on PSNR and SSIM.
arXiv Detail & Related papers (2021-09-28T11:28:34Z) - Self-Conditioned Probabilistic Learning of Video Rescaling [70.10092286301997]
We propose a self-conditioned probabilistic framework for video rescaling to learn the paired downscaling and upscaling procedures simultaneously.
We decrease the entropy of the information lost in downscaling by maximizing its conditional probability given strong spatio-temporal prior information.
We extend the framework to a lossy video compression system, in which a gradient estimator for non-differentiable industrial lossy codecs is proposed.
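A common way to realize such a gradient estimator is a straight-through estimator (STE): run the real codec in the forward pass and treat it as the identity in the backward pass. The sketch below stubs the codec with 8-bit quantization; it illustrates the generic STE pattern, not necessarily the estimator proposed in the paper.

```python
import torch

class CodecSTE(torch.autograd.Function):
    """Straight-through estimator around a non-differentiable op."""

    @staticmethod
    def forward(ctx, x):
        # stand-in for an industrial codec round-trip (encode + decode)
        return torch.round(x.clamp(0, 1) * 255.0) / 255.0

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out  # identity gradient: treat the codec as d/dx = I

x = torch.rand(1, 3, 64, 64, requires_grad=True)
y = CodecSTE.apply(x)        # non-differentiable op in the training graph
y.sum().backward()           # gradients still reach x
```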
arXiv Detail & Related papers (2021-07-24T15:57:15Z) - An Efficient Recurrent Adversarial Framework for Unsupervised Real-Time
Video Enhancement [132.60976158877608]
We propose an efficient adversarial video enhancement framework that learns directly from unpaired video examples.
In particular, our framework introduces new recurrent cells that consist of interleaved local and global modules for implicit integration of spatial and temporal information.
The proposed design allows our recurrent cells to efficiently propagate spatio-temporal information across frames and reduces the need for high-complexity networks.
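As a toy sketch of the interleaving idea, the cell below mixes a local convolution with a pooled, frame-wide statistic and carries the result forward as the hidden state; all names and shapes are illustrative rather than the paper's design.

```python
import torch
import torch.nn as nn

class LocalGlobalCell(nn.Module):
    """Toy recurrent cell: a local convolutional module interleaved with a
    global, frame-wide module; the output doubles as the next hidden state."""
    def __init__(self, ch=32):
        super().__init__()
        self.local_conv = nn.Conv2d(2 * ch, ch, 3, padding=1)  # local mixing
        self.global_fc = nn.Linear(ch, ch)                     # global statistics
        self.act = nn.ReLU()

    def forward(self, frame_feat, hidden):
        x = self.act(self.local_conv(torch.cat([frame_feat, hidden], dim=1)))
        g = self.global_fc(x.mean(dim=(2, 3)))   # pooled frame-wide context
        return x + g[:, :, None, None]           # broadcast global back to pixels
```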
arXiv Detail & Related papers (2020-12-24T00:03:29Z) - Enhanced Quadratic Video Interpolation [56.54662568085176]
We propose an enhanced quadratic video interpolation (EQVI) model to handle more complicated scenes and motion patterns.
To further boost the performance, we devise a novel multi-scale fusion network (MS-Fusion) which can be regarded as a learnable augmentation process.
The proposed EQVI model won the first place in the AIM 2020 Video Temporal Super-Resolution Challenge.
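For reference, the base quadratic motion model behind QVI-style interpolation fits per-pixel velocity and acceleration from the flows toward the two neighboring frames: f_{0->t} = v*t + 0.5*a*t^2, with v = (f_{0->1} - f_{0->-1})/2 and a = f_{0->1} + f_{0->-1}. A few-line sketch of this base model follows; EQVI's additions such as rectified flow and MS-Fusion are not reproduced here.

```python
import numpy as np

def quadratic_flow(flow_0_to_1, flow_0_to_m1, t):
    """Extrapolate optical flow from frame 0 to intermediate time t in (0, 1),
    given flows from frame 0 toward frames +1 and -1 (shape (H, W, 2)).
    Standard quadratic model: f_{0->t} = v*t + 0.5*a*t^2."""
    velocity = (flow_0_to_1 - flow_0_to_m1) / 2.0   # v: per-pixel velocity
    acceleration = flow_0_to_1 + flow_0_to_m1       # a: per-pixel acceleration
    return velocity * t + 0.5 * acceleration * t ** 2

# e.g., flow to the midpoint t = 0.5
f01 = np.random.randn(64, 64, 2)
f0m1 = np.random.randn(64, 64, 2)
mid_flow = quadratic_flow(f01, f0m1, 0.5)
```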
arXiv Detail & Related papers (2020-09-10T02:31:50Z) - Deep Space-Time Video Upsampling Networks [47.62807427163614]
Video super-resolution (VSR) and frame interpolation (FI) are traditional computer vision problems.
We propose an end-to-end framework for the space-time video upsampling by efficiently merging VSR and FI into a joint framework.
The framework achieves better results both quantitatively and qualitatively, while reducing runtime (7x faster) and the number of parameters (by 30%) compared to baselines.
arXiv Detail & Related papers (2020-04-06T07:04:21Z) - Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video
Super-Resolution [95.26202278535543]
A simple solution is to split it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR).
However, temporal synthesis and spatial super-resolution are intra-related in this task.
We propose a one-stage space-time video super-resolution framework, which directly synthesizes an HR slow-motion video from an LFR, LR video.
arXiv Detail & Related papers (2020-02-26T16:59:48Z)