CaDM: Codec-aware Diffusion Modeling for Neural-enhanced Video Streaming
- URL: http://arxiv.org/abs/2211.08428v1
- Date: Tue, 15 Nov 2022 05:14:48 GMT
- Title: CaDM: Codec-aware Diffusion Modeling for Neural-enhanced Video Streaming
- Authors: Qihua Zhou, Ruibin Li, Song Guo, Yi Liu, Jingcai Guo, Zhenda Xu
- Abstract summary: We present Codec-aware Diffusion Modeling (CaDM), a novel Neural-enhanced Video Streaming (NVS) paradigm.
First, CaDM improves the encoder's compression efficiency by simultaneously reducing the resolution and color bit-depth of video frames.
- Score: 15.115975994657514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have witnessed the dramatic growth of Internet video traffic,
where the video bitstreams are often compressed and delivered in low quality to
fit the streamer's uplink bandwidth. To alleviate the quality degradation,
Neural-enhanced Video Streaming (NVS) has emerged, which shows great
promise for recovering low-quality videos, mostly by deploying neural
super-resolution (SR) on the media server. Despite its benefit, we reveal that
current mainstream works with SR enhancement have not achieved the desired
rate-distortion trade-off between bitrate saving and quality restoration, due
to: (1) overemphasizing the enhancement on the decoder side while omitting the
co-design of the encoder, (2) inherently limited restoration capacity to generate
high-fidelity perceptual details, and (3) optimizing the
compression-and-restoration pipeline from the resolution perspective solely,
without considering color bit-depth. Aiming at overcoming these limitations, we
are the first to conduct the encoder-decoder (i.e., codec) synergy by
leveraging the visual-synthesis genius of diffusion models. Specifically, we
present the Codec-aware Diffusion Modeling (CaDM), a novel NVS paradigm to
significantly reduce streaming delivery bitrate while achieving higher
restoration capacity than existing methods. First, CaDM improves the encoder's
compression efficiency by simultaneously reducing resolution and color
bit-depth of video frames. Second, CaDM provides the decoder with perfect
quality enhancement by making the denoising diffusion restoration aware of
the encoder's resolution-color conditions. Evaluation on public cloud services with
OpenMMLab benchmarks shows that CaDM reduces streaming bitrate by
nearly 100 times compared with vanilla H.264 and achieves much better recovery
quality (e.g., FID of 0.61) than state-of-the-art neural-enhancing methods.
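
The encoder-side idea in the abstract (lowering resolution and color bit-depth together before compression) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the block-averaging downsampler, the bit-shift requantizer, and the 8x8 test frame are all assumptions chosen for illustration, and the sizes computed are raw pixel storage, not H.264 bitrate.

```python
def reduce_resolution(frame, factor):
    """Downsample a 2-D grayscale frame by averaging factor x factor blocks.

    Hypothetical helper: rows/columns that do not fill a whole block are dropped.
    """
    height, width = len(frame), len(frame[0])
    out = []
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            block = [frame[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def reduce_bit_depth(frame, src_bits=8, dst_bits=4):
    """Requantize pixels by dropping the least-significant bits (an assumed scheme)."""
    shift = src_bits - dst_bits
    return [[pixel >> shift for pixel in row] for row in frame]


def raw_size_bits(frame, bits_per_pixel):
    """Uncompressed storage cost of a frame in bits."""
    return len(frame) * len(frame[0]) * bits_per_pixel


# Synthetic 8x8, 8-bit frame used only to exercise the functions.
frame = [[(x * 16 + y) % 256 for x in range(8)] for y in range(8)]

# Halve each spatial dimension, then go from 8-bit to 4-bit pixels.
low = reduce_bit_depth(reduce_resolution(frame, factor=2), src_bits=8, dst_bits=4)

original_bits = raw_size_bits(frame, 8)  # 8 * 8 pixels * 8 bits
reduced_bits = raw_size_bits(low, 4)     # 4 * 4 pixels * 4 bits
ratio = original_bits / reduced_bits
```

In this toy setting the two reductions multiply: a 2x reduction per spatial dimension contributes 4x and halving the bit-depth contributes another 2x. The nearly 100 times figure reported in the abstract refers to the delivered streaming bitrate relative to vanilla H.264, which this raw-size sketch does not model.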
Related papers
- Improving the Diffusability of Autoencoders [54.920783089085035]
Latent diffusion models have emerged as the leading approach for generating high-quality images and videos.
We perform a spectral analysis of modern autoencoders and identify inordinate high-frequency components in their latent spaces.
We hypothesize that this high-frequency component interferes with the coarse-to-fine nature of the diffusion synthesis process and hinders the generation quality.
arXiv Detail & Related papers (2025-02-20T18:45:44Z)
- $ε$-VAE: Denoising as Visual Decoding [61.29255979767292]
In generative modeling, tokenization simplifies complex data into compact, structured representations, creating a more efficient, learnable space.
Current visual tokenization methods rely on a traditional autoencoder framework, where the encoder compresses data into latent representations, and the decoder reconstructs the original input.
We propose denoising as decoding, shifting from single-step reconstruction to iterative refinement. Specifically, we replace the decoder with a diffusion process that iteratively refines noise to recover the original image, guided by the latents provided by the encoder.
arXiv Detail & Related papers (2024-10-05T08:27:53Z)
- Standard compliant video coding using low complexity, switchable neural wrappers [8.149130379436759]
We propose a new framework featuring standard compatibility, high performance, and low decoding complexity.
We employ a set of jointly optimized neural pre- and post-processors, wrapping a standard video codec, to encode videos at different resolutions.
We design a low complexity neural post-processor architecture that can handle different upsampling ratios.
arXiv Detail & Related papers (2024-07-10T06:36:45Z)
- Compression-Realized Deep Structural Network for Video Quality Enhancement [78.13020206633524]
This paper focuses on the task of quality enhancement for compressed videos.
Most of the existing methods lack a structured design to optimally leverage the priors within compression codecs.
A new paradigm is urgently needed for a more "conscious" process of quality enhancement.
arXiv Detail & Related papers (2024-05-10T09:18:17Z)
- NU-Class Net: A Novel Approach for Video Quality Enhancement [1.7763979745248648]
This paper introduces NU-Class Net, an innovative deep-learning model designed to mitigate compression artifacts stemming from lossy compression codecs.
By employing the NU-Class Net, the video encoder within the video-capturing node can reduce output quality, thereby generating low-bit-rate videos.
Experimental results affirm the efficacy of the proposed model in enhancing the perceptible quality of videos, especially those streamed at low bit rates.
arXiv Detail & Related papers (2024-01-02T11:46:42Z)
- AccDecoder: Accelerated Decoding for Neural-enhanced Video Analytics [26.012783785622073]
Low-quality video is collected by existing surveillance systems because of poor quality cameras or over-compressed/pruned video streaming protocols.
We present AccDecoder, a novel accelerated decoder for real-time and neural network-based video analytics.
arXiv Detail & Related papers (2023-01-20T16:30:44Z)
- Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement [74.1052624663082]
We develop a deep learning architecture capable of restoring detail to compressed videos.
We show that this improves restoration accuracy compared to prior compression correction methods.
We condition our model on quantization data which is readily available in the bitstream.
arXiv Detail & Related papers (2022-01-31T18:56:04Z)
- Neural JPEG: End-to-End Image Compression Leveraging a Standard JPEG Encoder-Decoder [73.48927855855219]
We propose a system that learns to improve the encoding performance by enhancing its internal neural representations on both the encoder and decoder ends.
Experiments demonstrate that our approach successfully improves the rate-distortion performance over JPEG across various quality metrics.
arXiv Detail & Related papers (2022-01-27T20:20:03Z)
- Ultra-low bitrate video conferencing using deep image animation [7.263312285502382]
We propose a novel deep learning approach for ultra-low bitrate video compression for video conferencing applications.
We employ deep neural networks to encode motion information as keypoint displacement and reconstruct the video signal at the decoder side.
arXiv Detail & Related papers (2020-12-01T09:06:34Z)
- Learning for Video Compression with Hierarchical Quality and Recurrent Enhancement [164.7489982837475]
We propose a Hierarchical Learned Video Compression (HLVC) method with three hierarchical quality layers and a recurrent enhancement network.
In our HLVC approach, the hierarchical quality benefits the coding efficiency, since the high quality information facilitates the compression and enhancement of low quality frames at encoder and decoder sides.
arXiv Detail & Related papers (2020-03-04T09:31:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.