Multi-Scale Feature Prediction with Auxiliary-Info for Neural Image Compression
- URL: http://arxiv.org/abs/2409.12719v1
- Date: Thu, 19 Sep 2024 12:41:53 GMT
- Title: Multi-Scale Feature Prediction with Auxiliary-Info for Neural Image Compression
- Authors: Chajin Shin, Sangjin Lee, Sangyoun Lee,
- Abstract summary: We introduce a new predictive structure consisting of the auxiliary coarse network and the main network, inspired by neural video compression.
Our model outperforms other neural image compression models and achieves a 19.49% higher rate-distortion performance than VVC on Tecnick dataset.
- Score: 13.076563599765176
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recently, significant improvements in rate-distortion performance of image compression have been achieved with deep-learning techniques. A key factor in this success is the use of additional bits to predict an approximation of the latent vector, which is the output of the encoder, through another neural network. Then, only the difference between the prediction and the latent vector is coded into the bitstream, along with its estimated probability distribution. We introduce a new predictive structure consisting of the auxiliary coarse network and the main network, inspired by neural video compression. The auxiliary coarse network encodes the auxiliary information and predicts the approximation of the original image as multi-scale features. The main network encodes the residual between the predicted feature from the auxiliary coarse network and the feature of the original image. To further leverage our new structure, we propose Auxiliary info-guided Feature Prediction (AFP) module that uses global correlation to predict more accurate predicted features. Moreover, we present Context Junction module that refines the auxiliary feature from AFP module and produces the residuals between the refined features and the original image features. Finally, we introduce Auxiliary info-guided Parameter Estimation (APE) module, which predicts the approximation of the latent vector and estimates the probability distribution of these residuals. We demonstrate the effectiveness of the proposed modules by various ablation studies. Under extensive experiments, our model outperforms other neural image compression models and achieves a 19.49\% higher rate-distortion performance than VVC on Tecnick dataset.
Related papers
- SODAWideNet++: Combining Attention and Convolutions for Salient Object Detection [3.2586315449885106]
We propose a novel encoder-decoder-style neural network called SODAWideNet++ designed explicitly for Salient Object Detection.
Inspired by the vision transformers ability to attain a global receptive field from the initial stages, we introduce the Attention Guided Long Range Feature Extraction (AGLRFE) module.
In contrast to the current paradigm of ImageNet pre-training, we modify 118K annotated images from the COCO semantic segmentation dataset by binarizing the annotations to pre-train the proposed model end-to-end.
arXiv Detail & Related papers (2024-08-29T15:51:06Z) - Short-term power load forecasting method based on CNN-SAEDN-Res [12.733504847643005]
This paper presents a short-term load forecasting method based on convolutional neural network (CNN), self-attention encoder-decoder network (SAEDN) and residual-refinement (Res)
The proposed method has advantages in terms of prediction accuracy and prediction stability.
arXiv Detail & Related papers (2023-09-02T11:36:50Z) - Hybrid machine-learned homogenization: Bayesian data mining and
convolutional neural networks [0.0]
This study aims to improve the machine learned prediction by developing novel feature descriptors.
The iterative development of feature descriptors resulted in 37 novel features, being able to reduce the prediction error by roughly one third.
A combination of the feature based approach and the convolutional neural network leads to a hybrid neural network.
arXiv Detail & Related papers (2023-02-24T09:59:29Z) - Modality-Agnostic Variational Compression of Implicit Neural
Representations [96.35492043867104]
We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR)
Bridging the gap between latent coding and sparsity, we obtain compact latent representations non-linearly mapped to a soft gating mechanism.
After obtaining a dataset of such latent representations, we directly optimise the rate/distortion trade-off in a modality-agnostic space using neural compression.
arXiv Detail & Related papers (2023-01-23T15:22:42Z) - Anti-aliasing Predictive Coding Network for Future Video Frame
Prediction [1.4610038284393165]
We introduce here a predictive coding based model that aims to generate accurate and sharp future frames.
We propose and improve several artifacts to ensure that the neural networks generate clear and natural frames.
arXiv Detail & Related papers (2023-01-13T07:38:50Z) - Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x.
We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available.
arXiv Detail & Related papers (2022-06-15T17:58:34Z) - Estimating the Resize Parameter in End-to-end Learned Image Compression [50.20567320015102]
We describe a search-free resizing framework that can further improve the rate-distortion tradeoff of recent learned image compression models.
Our results show that our new resizing parameter estimation framework can provide Bjontegaard-Delta rate (BD-rate) improvement of about 10% against leading perceptual quality engines.
arXiv Detail & Related papers (2022-04-26T01:35:02Z) - Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image.
The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z) - Learning Cross-Scale Prediction for Efficient Neural Video Compression [30.051859347293856]
We present the first neural video that can compete with the latest coding standard H.266/VVC in terms of sRGB PSNR on UVG dataset for the low-latency mode.
We propose a novel cross-scale prediction module that achieves more effective motion compensation.
arXiv Detail & Related papers (2021-12-26T03:12:17Z) - Probabilistic Graph Attention Network with Conditional Kernels for
Pixel-Wise Prediction [158.88345945211185]
We present a novel approach that advances the state of the art on pixel-level prediction in a fundamental aspect, i.e. structured multi-scale features learning and fusion.
We propose a probabilistic graph attention network structure based on a novel Attention-Gated Conditional Random Fields (AG-CRFs) model for learning and fusing multi-scale representations in a principled manner.
arXiv Detail & Related papers (2021-01-08T04:14:29Z) - End-to-End Facial Deep Learning Feature Compression with Teacher-Student
Enhancement [57.18801093608717]
We propose a novel end-to-end feature compression scheme by leveraging the representation and learning capability of deep neural networks.
In particular, the extracted features are compactly coded in an end-to-end manner by optimizing the rate-distortion cost.
We verify the effectiveness of the proposed model with the facial feature, and experimental results reveal better compression performance in terms of rate-accuracy.
arXiv Detail & Related papers (2020-02-10T10:08:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.