Neural Data-Dependent Transform for Learned Image Compression
- URL: http://arxiv.org/abs/2203.04963v1
- Date: Wed, 9 Mar 2022 14:56:48 GMT
- Title: Neural Data-Dependent Transform for Learned Image Compression
- Authors: Dezhao Wang, Wenhan Yang, Yueyu Hu, Jiaying Liu
- Abstract summary: We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image.
The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
- Score: 72.86505042102155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learned image compression has achieved great success due to its excellent
modeling capacity, but it seldom goes further to consider Rate-Distortion
Optimization (RDO) for each input image. To explore this potential in the
learned codec, we make the first attempt to build a neural data-dependent
transform and introduce a continuous online mode decision mechanism to jointly
optimize the coding efficiency for each individual image. Specifically, apart
from the image content stream, we employ an additional model stream to generate
the transform parameters at the decoder side. The model stream enables our
model to learn a more abstract neural-syntax, which helps cluster the latent
representations of images more compactly. Beyond the transform stage, we also
adopt neural-syntax-based post-processing for scenarios that require
higher-quality reconstructions regardless of the extra decoding overhead.
Moreover, the model stream makes it possible to optimize both the
representation and the decoder online, i.e., RDO at test time. This is
equivalent to a continuous online mode decision, akin to the coding modes of
traditional codecs, improving coding efficiency for each individual input
image. The experimental results show the effectiveness of the proposed
neural-syntax design and the continuous online mode decision mechanism,
demonstrating the superiority of our method in coding efficiency over the
latest conventional standard, Versatile Video Coding (VVC), and other
state-of-the-art learning-based methods.
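The abstract describes two mechanisms that lend themselves to a compact illustration: a model stream that generates decoder-side transform parameters from a compact neural-syntax code, and a test-time optimization loop that performs RDO per image. The PyTorch sketch below is a minimal, hypothetical rendering of these ideas; the class names, tensor shapes, and hyperparameters (SyntaxGenerator, DataDependentDecoder, online_rdo, syntax_dim, lam) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SyntaxGenerator(nn.Module):
    """Model stream (sketch): maps a compact neural-syntax code to the weights
    of a decoder-side convolution, making the synthesis transform per-image."""
    def __init__(self, syntax_dim=64, in_ch=128, out_ch=3, k=3):
        super().__init__()
        self.shape = (out_ch, in_ch, k, k)
        self.fc = nn.Linear(syntax_dim, out_ch * in_ch * k * k)

    def forward(self, syntax):
        return self.fc(syntax).view(self.shape)

class DataDependentDecoder(nn.Module):
    """Content stream (sketch): a fixed upsampling backbone followed by one
    convolution whose weights are generated per image by the model stream."""
    def __init__(self, latent_ch=192, mid_ch=128):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, mid_ch, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(mid_ch, mid_ch, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, y_hat, conv_w):
        # Apply the per-image transform generated by the model stream.
        return F.conv2d(self.up(y_hat), conv_w, padding=1)

def online_rdo(x, encoder, decoder, syntax_gen, rate_fn, lam=0.01, steps=100):
    """Test-time RDO ("continuous online mode decision", sketch): refine the
    latent and the syntax code for one image by minimizing R + lam * D.
    Quantization / straight-through rounding is omitted for brevity."""
    y = encoder(x).detach().requires_grad_(True)  # image content stream
    syntax = torch.zeros(64, requires_grad=True)  # model stream code
    opt = torch.optim.Adam([y, syntax], lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        x_hat = decoder(y, syntax_gen(syntax))
        dist = F.mse_loss(x_hat, x)   # distortion D
        rate = rate_fn(y)             # estimated bits R
        (rate + lam * dist).backward()
        opt.step()
    return y.detach(), syntax.detach()
```

Here rate_fn stands for any differentiable bitrate estimate; a generic channel-wise entropy model that could play this role is sketched after the related-papers list below.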
Related papers
- Corner-to-Center Long-range Context Model for Efficient Learned Image
Compression [70.0411436929495]
In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations.
We propose the Corner-to-Center Transformer-based Context Model (C³M) designed to enhance context and latent predictions.
In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder.
arXiv Detail & Related papers (2023-11-29T21:40:28Z)
- AICT: An Adaptive Image Compression Transformer [18.05997169440533]
We propose a more straightforward yet effective Transformer-based channel-wise auto-regressive prior model, resulting in an absolute image compression transformer (ICT).
The proposed ICT can capture both global and local contexts from the latent representations (a generic sketch of such a channel-wise prior appears after this list).
We leverage a learnable scaling module with a sandwich ConvNeXt-based pre/post-processor to accurately extract a more compact latent representation.
arXiv Detail & Related papers (2023-07-12T11:32:02Z)
- Joint Hierarchical Priors and Adaptive Spatial Resolution for Efficient Neural Image Compression [11.25130799452367]
We propose an absolute image compression transformer (ICT) for neural image compression (NIC).
ICT captures both global and local contexts from the latent representations and better parameterizes the distribution of the quantized latents.
Our framework significantly improves the trade-off between coding efficiency and decoder complexity over the versatile video coding (VVC) reference encoder (VTM-18.0) and the neural SwinT-ChARM.
arXiv Detail & Related papers (2023-07-05T13:17:14Z)
- VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
- Exploring Stochastic Autoregressive Image Modeling for Visual Representation [24.582376834198403]
We propose a novel stochastic autoregressive image modeling method (named SAIM) built on two simple designs.
By introducing prediction and the parallel encoder-decoder, SAIM significantly improves the performance of autoregressive image modeling.
Our method achieves the best accuracy (83.9%) on the vanilla ViT-Base model among methods using only ImageNet-1K data.
arXiv Detail & Related papers (2022-12-03T13:04:29Z)
- OL-DN: Online learning based dual-domain network for HEVC intra frame quality enhancement [24.91807723834651]
Convolutional neural network (CNN)-based methods offer effective solutions for enhancing the quality of compressed images and videos.
In this paper, we exploit the raw data for quality enhancement of HEVC intra-coded images by proposing an online learning-based method.
Our proposed online learning-based dual-domain network (OL-DN) achieves superior performance compared with state-of-the-art methods.
arXiv Detail & Related papers (2022-08-09T11:06:59Z)
- DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder [73.1010640692609]
We propose a VQ-VAE architecture with a diffusion decoder (DiVAE) to serve as the reconstructing component in image synthesis.
Our model achieves state-of-the-art results and, in particular, generates more photorealistic images.
arXiv Detail & Related papers (2022-06-01T10:39:12Z)
- Transformer-based Image Compression [18.976159633970177]
A Transformer-based Image Compression (TIC) approach is developed, which reuses the canonical variational autoencoder (VAE) architecture with paired main and hyper encoder-decoders.
TIC rivals state-of-the-art approaches, including deep convolutional neural network (CNN)-based learned image coding (LIC) methods and the handcrafted rules-based intra profile of the recently approved Versatile Video Coding (VVC) standard.
arXiv Detail & Related papers (2021-11-12T13:13:20Z)
- Towards Modality Transferable Visual Information Representation with Optimal Model Compression [67.89885998586995]
We propose a new scheme for visual signal representation that leverages the philosophy of transferable modality.
The proposed framework is implemented on the state-of-the-art video coding standard.
arXiv Detail & Related papers (2020-08-13T01:52:40Z)
- An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machines (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of the learned motion pattern.
By learning to extract a sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of the to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
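Several of the entries above (C³M, AICT, ICT, TIC) revolve around entropy models that predict the distribution of the quantized latents. The sketch below is a generic, hypothetical channel-wise autoregressive prior in PyTorch, not the specific design of any of those papers: the latent is split into channel slices, each slice is modeled by a Gaussian conditioned on the previously decoded slices, and the negative log-likelihood serves as a differentiable rate proxy. The slice count, network widths, and class name (ChannelARPrior) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ChannelARPrior(nn.Module):
    """Generic channel-wise autoregressive prior (sketch): the latent is split
    into channel slices, and slice i is entropy-modeled conditioned on the
    already-decoded slices 0..i-1, zero-padded to a fixed context width."""
    def __init__(self, latent_ch=192, slices=4):
        super().__init__()
        self.slices, self.ch = slices, latent_ch // slices
        ctx_ch = latent_ch - self.ch  # channels of the padded context
        # One small network per slice predicts the Gaussian mean/scale.
        self.param_nets = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(ctx_ch, 64, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(64, 2 * self.ch, 3, padding=1),
            )
            for _ in range(slices)
        ])

    def forward(self, y_hat):
        """Return an estimated bit count for y_hat (a differentiable proxy)."""
        b, _, h, w = y_hat.shape
        bits = y_hat.new_zeros(())
        for i in range(self.slices):
            prev = y_hat[:, : i * self.ch]  # already-decoded slices
            pad = y_hat.new_zeros(b, (self.slices - 1 - i) * self.ch, h, w)
            mean, log_scale = self.param_nets[i](
                torch.cat([prev, pad], dim=1)
            ).chunk(2, dim=1)
            cur = y_hat[:, i * self.ch : (i + 1) * self.ch]
            scale = log_scale.exp().clamp_min(1e-6)
            # Gaussian NLL (constant dropped), converted from nats to bits.
            nll = 0.5 * ((cur - mean) / scale) ** 2 + scale.log()
            bits = bits + nll.sum() / 0.6931471805599453
        return bits
```

An instance of this module could stand in for the rate_fn in the online RDO sketch above, e.g. rate = ChannelARPrior(latent_ch=192)(y).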