CAESR: Conditional Autoencoder and Super-Resolution for Learned Spatial
Scalability
- URL: http://arxiv.org/abs/2202.00416v1
- Date: Tue, 1 Feb 2022 13:59:43 GMT
- Title: CAESR: Conditional Autoencoder and Super-Resolution for Learned Spatial
Scalability
- Authors: Charles Bonnineau, Wassim Hamidouche, Jean-François Travers, Naty
Sidaty, Jean-Yves Aubié, Olivier Deforges
- Abstract summary: We present CAESR, a learning-based coding approach for spatial scalability based on the versatile video coding (VVC) standard.
Our framework considers a low-resolution signal encoded with VVC intra-mode as a base-layer (BL), and a deep conditional autoencoder with hyperprior (AE-HP) as an enhancement-layer (EL) model.
Our solution is competitive with the VVC full-resolution intra coding while being scalable.
- Score: 13.00115213941287
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we present CAESR, a hybrid learning-based coding approach for
spatial scalability based on the versatile video coding (VVC) standard. Our
framework considers a low-resolution signal encoded with VVC intra-mode as a
base-layer (BL), and a deep conditional autoencoder with hyperprior (AE-HP) as
an enhancement-layer (EL) model. The EL encoder takes as inputs both the
upscaled BL reconstruction and the original image. Our approach relies on
conditional coding that learns the optimal mixture of the source and the
upscaled BL image, enabling better performance than residual coding. On the
decoder side, a super-resolution (SR) module is used to recover high-resolution
details and invert the conditional coding process. Experimental results have
shown that our solution is competitive with the VVC full-resolution intra
coding while being scalable.
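
The relation between residual and conditional coding described in the abstract can be illustrated with a toy 1-D sketch. It is not the CAESR implementation: uniform quantization stands in for both the VVC base layer and the learned enhancement layer, the mixture is reduced to two scalar weights rather than a learned autoencoder, and every function name and parameter below is hypothetical. It only shows that residual coding is the special case of a source/base-layer mixture with weights (1, -1), which is why a learned mixture has room to do at least as well.

```python
# Toy 1-D sketch of two-layer spatial scalability (all names are hypothetical).

def downscale2x(signal):
    """Average neighbouring pairs of samples to halve the resolution."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]

def upscale2x(signal):
    """Nearest-neighbour upscaling: repeat every sample twice."""
    return [v for v in signal for _ in range(2)]

def quantize(values, step):
    """Uniform scalar quantization, standing in for lossy coding."""
    return [round(v / step) * step for v in values]

def base_layer(x, bl_step):
    """Stand-in for the VVC intra base layer: code a low-resolution version, then upscale."""
    return upscale2x(quantize(downscale2x(x), bl_step))

def residual_coding(x, bl_up, step):
    """Code the difference x - bl_up and add it back at the decoder."""
    r_hat = quantize([a - b for a, b in zip(x, bl_up)], step)
    return [b + r for b, r in zip(bl_up, r_hat)]

def conditional_coding(x, bl_up, step, w_src, w_bl):
    """Code a mixture of x and bl_up; the decoder, which also has bl_up, inverts it."""
    y_hat = quantize([w_src * a + w_bl * b for a, b in zip(x, bl_up)], step)
    return [(y - w_bl * b) / w_src for y, b in zip(y_hat, bl_up)]

x = [10.0, 12.0, 15.0, 11.0, 8.0, 9.0, 14.0, 13.0]   # full-resolution source
bl_up = base_layer(x, bl_step=4.0)                    # upscaled BL reconstruction
rec_res = residual_coding(x, bl_up, step=2.0)
# With weights (1, -1) the mixture is exactly the residual x - bl_up.
rec_cond = conditional_coding(x, bl_up, step=2.0, w_src=1.0, w_bl=-1.0)
```

In this sketch the weights are fixed by hand. In the approach the abstract describes, the mixture is learned by the enhancement-layer autoencoder, and an SR module on the decoder side inverts it.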
Related papers
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
- Standard compliant video coding using low complexity, switchable neural wrappers [8.149130379436759]
We propose a new framework featuring standard compatibility, high performance, and low decoding complexity.
We employ a set of jointly optimized neural pre- and post-processors, wrapping a standard video codec, to encode videos at different resolutions.
We design a low complexity neural post-processor architecture that can handle different upsampling ratios.
arXiv Detail & Related papers (2024-07-10T06:36:45Z)
- Low-Resolution Self-Attention for Semantic Segmentation [96.81482872022237]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.
Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.
We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z)
- Hierarchical B-frame Video Coding Using Two-Layer CANF without Motion Coding [17.998825368770635]
We propose a novel B-frame coding architecture based on two-layer Conditional Augmented Normalization Flows (CANF).
Our proposed idea of video compression without motion coding offers a new direction for learned video coding.
The rate-distortion performance of our scheme is slightly lower than that of the state-of-the-art learned B-frame coding scheme, B-CANF, but outperforms other learned B-frame coding schemes.
arXiv Detail & Related papers (2023-04-05T18:36:28Z)
- CANF-VC: Conditional Augmented Normalizing Flows for Video Compression [81.41594331948843]
CANF-VC is an end-to-end learning-based video compression system.
It is based on conditional augmented normalizing flows (ANF).
arXiv Detail & Related papers (2022-07-12T04:53:24Z)
- Efficient VVC Intra Prediction Based on Deep Feature Fusion and Probability Estimation [57.66773945887832]
We propose to optimize Versatile Video Coding (VVC) complexity at intra-frame prediction, with a two-stage framework of deep feature fusion and probability estimation.
Experimental results on a standard database demonstrate the superiority of the proposed method, especially for High Definition (HD) and Ultra-HD (UHD) video sequences.
arXiv Detail & Related papers (2022-05-07T08:01:32Z)
- Deep Learning-Based Intra Mode Derivation for Versatile Video Coding [65.96100964146062]
An intelligent intra mode derivation method, termed Deep Learning based Intra Mode Derivation (DLIMD), is proposed in this paper.
The architecture of DLIMD is developed to adapt to different quantization parameter settings and variable coding blocks including non-square ones.
The proposed method achieves average bit rate reductions of 2.28%, 1.74%, and 2.18% for the Y, U, and V components, respectively, on the Versatile Video Coding (VVC) test model.
arXiv Detail & Related papers (2022-04-08T13:23:59Z)
- Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image.
The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z)
- Transformer-based Image Compression [18.976159633970177]
A Transformer-based Image Compression (TIC) approach is developed, which reuses the canonical variational autoencoder (VAE) architecture with paired main and hyper encoder-decoders.
TIC rivals state-of-the-art approaches, including learnt image coding (LIC) methods based on deep convolutional neural networks (CNNs) and the handcrafted, rules-based intra profile of the recently approved Versatile Video Coding (VVC) standard.
arXiv Detail & Related papers (2021-11-12T13:13:20Z)
- Super-Resolving Compressed Video in Coding Chain [27.994055823226848]
We present a mixed-resolution coding framework, which cooperates with a reference-based DCNN.
In this novel coding chain, the reference-based DCNN learns the direct mapping from low-resolution (LR) compressed video to its high-resolution (HR) clean version at the decoder side.
arXiv Detail & Related papers (2021-03-26T03:39:54Z)