SuperTran: Reference Based Video Transformer for Enhancing Low Bitrate Streams in Real Time
- URL: http://arxiv.org/abs/2211.12604v1
- Date: Tue, 22 Nov 2022 22:03:11 GMT
- Authors: Tejas Khot, Nataliya Shapovalova, Silviu Andrei, Walterio Mayol-Cuevas
- Abstract summary: This work focuses on low bitrate video streaming scenarios (e.g. 50 - 200Kbps) where the video quality is severely compromised.
We present a family of novel deep generative models for enhancing the perceptual video quality of such streams by performing super-resolution while also removing compression artifacts.
Our model, which we call SuperTran, consumes as input a single high-quality, high-resolution reference image in addition to the low-quality, low-resolution video stream.
- Score: 0.6308539010172309
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work focuses on low bitrate video streaming scenarios (e.g. 50 -
200Kbps) where the video quality is severely compromised. We present a family
of novel deep generative models for enhancing perceptual video quality of such
streams by performing super-resolution while also removing compression
artifacts. Our model, which we call SuperTran, consumes as input a single
high-quality, high-resolution reference image in addition to the low-quality,
low-resolution video stream. The model thus learns how to borrow or copy visual
elements like textures from the reference image and fill in the remaining
details from the low resolution stream in order to produce perceptually
enhanced output video. The reference frame can be sent once at the start of the
video session or be retrieved from a gallery. Importantly, the resulting output
has substantially better detail than what has been otherwise possible with
methods that only use a low resolution input such as the SuperVEGAN method.
SuperTran works in real-time (up to 30 frames/sec) on the cloud alongside
standard pipelines.
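As a rough illustration of the reference-based idea described above (not the authors' actual architecture), the sketch below upsamples a low-resolution frame and blends in high-frequency detail borrowed from a high-resolution reference image. The nearest-neighbour upsampling and the `alpha` blending weight are hypothetical stand-ins for SuperTran's learned transformer components:

```python
import numpy as np

def upsample_nearest(frame, scale):
    # Nearest-neighbour upsampling as a stand-in for a learned decoder.
    return frame.repeat(scale, axis=0).repeat(scale, axis=1)

def enhance_with_reference(lr_frame, ref_hr, scale=4, alpha=0.5):
    """Toy reference-guided enhancement: upsample the low-res frame,
    then add high-frequency detail extracted from the high-res reference.
    `alpha` (hypothetical) weights how much reference detail is borrowed."""
    up = upsample_nearest(lr_frame, scale).astype(np.float32)
    # "High-frequency" detail of the reference: the reference minus its
    # own downsampled-then-upsampled copy.
    ref_low = upsample_nearest(ref_hr[::scale, ::scale], scale).astype(np.float32)
    detail = ref_hr.astype(np.float32) - ref_low
    return np.clip(up + alpha * detail, 0, 255)
```

In the real system the reference frame would be transmitted once at the start of the session (or fetched from a gallery), and a learned model, rather than this fixed blend, decides which textures to copy from it.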
Related papers
- Implicit Neural Representation for Videos Based on Residual Connection [0.0]
We propose a method that uses low-resolution frames as a residual connection, an approach known to be effective for image reconstruction.
Experimental results show that our method outperforms the existing method, HNeRV, in PSNR for 46 of the 49 videos.
arXiv Detail & Related papers (2024-06-15T10:10:48Z)
- Super Efficient Neural Network for Compression Artifacts Reduction and Super Resolution [2.0762623979470205]
We propose a lightweight convolutional neural network (CNN)-based algorithm which simultaneously performs artifacts reduction and super resolution.
The output shows a 4-6 point increase in video multi-method assessment fusion (VMAF) score compared to traditional upscaling approaches.
arXiv Detail & Related papers (2024-01-26T04:11:14Z)
- VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation [73.54366331493007]
VideoGen is a text-to-video generation approach, which can generate a high-definition video with high frame fidelity and strong temporal consistency.
We leverage an off-the-shelf text-to-image generation model, e.g., Stable Diffusion, to generate an image with high content quality from the text prompt.
arXiv Detail & Related papers (2023-09-01T11:14:43Z)
- Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting [27.302681897961588]
Deep convolutional neural networks (CNNs) are widely used in various fields of computer vision.
We propose a novel method for high-quality and efficient video resolution upscaling tasks.
We deploy our models on an off-the-shelf mobile phone, and experimental results show that our method achieves real-time video super-resolution with high video quality.
arXiv Detail & Related papers (2023-03-15T02:40:02Z)
- MagicVideo: Efficient Video Generation With Latent Diffusion Models [76.95903791630624]
We present an efficient text-to-video generation framework based on latent diffusion models, termed MagicVideo.
Due to a novel and efficient 3D U-Net design and modeling video distributions in a low-dimensional space, MagicVideo can synthesize video clips with 256x256 spatial resolution on a single GPU card.
We conduct extensive experiments and demonstrate that MagicVideo can generate high-quality video clips with either realistic or imaginary content.
arXiv Detail & Related papers (2022-11-20T16:40:31Z)
- Compressed Vision for Efficient Video Understanding [83.97689018324732]
We propose a framework enabling research on hour-long videos using the same hardware that can currently process only second-long videos.
We replace standard video compression, e.g. JPEG, with neural compression and show that we can directly feed compressed videos as inputs to regular video networks.
arXiv Detail & Related papers (2022-10-06T15:35:49Z)
- Gemino: Practical and Robust Neural Compression for Video Conferencing [19.137804113000474]
Gemino is a new neural compression system for video conferencing based on a novel high-frequency-conditional super-resolution pipeline.
We show that Gemino operates on videos in real time on a Titan X GPU, and achieves a 2.2-5x lower bitrate than traditional video codecs for the same perceptual quality.
arXiv Detail & Related papers (2022-09-21T17:10:46Z)
- Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement [74.1052624663082]
We develop a deep learning architecture capable of restoring detail to compressed videos.
We condition our model on quantization data, which is readily available in the bitstream.
We show that this improves restoration accuracy compared to prior compression-correction methods.
arXiv Detail & Related papers (2022-01-31T18:56:04Z)
- Memory-Augmented Non-Local Attention for Video Super-Resolution [61.55700315062226]
We propose a novel video super-resolution method that aims at generating high-fidelity high-resolution (HR) videos from low-resolution (LR) ones.
Previous methods predominantly leverage temporal neighbor frames to assist the super-resolution of the current frame.
In contrast, we devise a cross-frame non-local attention mechanism that allows video super-resolution without frame alignment.
arXiv Detail & Related papers (2021-08-25T05:12:14Z)
- COMISR: Compression-Informed Video Super-Resolution [76.94152284740858]
Most videos on the web or mobile devices are compressed, and the compression can be severe when the bandwidth is limited.
We propose a new compression-informed video super-resolution model to restore high-resolution content without introducing artifacts caused by compression.
arXiv Detail & Related papers (2021-05-04T01:24:44Z)
- Efficient Video Compression via Content-Adaptive Super-Resolution [11.6624528293976]
Video compression is a critical component of Internet video delivery.
Recent work has shown that deep learning techniques can rival or outperform hand-designed compression algorithms.
This paper presents a new approach that augments a recent deep learning-based video compression scheme.
arXiv Detail & Related papers (2021-04-06T07:01:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.