Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image
Compression
- URL: http://arxiv.org/abs/2308.08723v1
- Date: Thu, 17 Aug 2023 01:34:51 GMT
- Title: Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image
Compression
- Authors: Huairui Wang, Nianxiang Fu, Zhenzhong Chen and Shan Liu
- Abstract summary: We focus on extending spatial aggregation capability and propose a dynamic kernel-based transform coding.
The proposed adaptive aggregation generates kernel offsets to capture valid information within a content-conditioned range, aiding the transform.
Experimental results demonstrate that our method achieves superior rate-distortion performance on three benchmarks compared to the state-of-the-art learning-based methods.
- Score: 63.56922682378755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learned image compression methods have shown superior rate-distortion
performance and remarkable potential compared to traditional compression
methods. Most existing learned approaches use stacked convolutions or window-based self-attention for transform coding, which aggregate spatial information within a fixed range. In this paper, we focus on extending the spatial aggregation capability and propose dynamic kernel-based transform coding. The proposed adaptive aggregation generates kernel offsets to capture valid information within a content-conditioned range, aiding the transform. With the adaptive aggregation strategy and a shared-weights mechanism, our method achieves promising transform capability with acceptable model complexity. Besides, following recent progress on entropy models, we define a generalized coarse-to-fine entropy model that considers the coarse global context, the channel-wise context, and the spatial context. Based on it, we introduce a dynamic kernel in the hyper-prior to generate a more expressive global context. Furthermore, we propose an asymmetric spatial-channel entropy model based on an investigation of the spatial characteristics of the grouped latents. The asymmetric entropy model aims to reduce statistical redundancy while maintaining coding efficiency. Experimental results demonstrate that our method
achieves superior rate-distortion performance on three benchmarks compared to
the state-of-the-art learning-based methods.
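As a rough illustration of the adaptive aggregation idea, the sketch below implements a deformable-convolution-style layer in PyTorch: a small convolution predicts per-position sampling offsets, features are gathered at those content-conditioned locations with grid_sample, and a shared 1x1 projection aggregates them. Module names, sizes, and the exact sampling scheme are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: deformable-convolution-style adaptive spatial aggregation.
# All names and hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveAggregation(nn.Module):
    """Aggregates features at K content-conditioned offset locations,
    then mixes them with a shared 1x1 projection (weight sharing)."""
    def __init__(self, channels, num_points=9):
        super().__init__()
        self.num_points = num_points
        # Predicts a (dy, dx) offset per sampling point and spatial position.
        self.offset_net = nn.Conv2d(channels, 2 * num_points, 3, padding=1)
        # Shared aggregation weights applied to the gathered features.
        self.project = nn.Conv2d(channels * num_points, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        offsets = self.offset_net(x).view(b, self.num_points, 2, h, w)

        # Base sampling grid in normalized [-1, 1] coordinates, (x, y) order.
        ys = torch.linspace(-1, 1, h, device=x.device)
        xs = torch.linspace(-1, 1, w, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        base = torch.stack((gx, gy), dim=-1).unsqueeze(0)   # (1, H, W, 2)

        sampled = []
        for k in range(self.num_points):
            # Convert pixel offsets to normalized coordinates.
            dy = offsets[:, k, 0] * 2.0 / max(h - 1, 1)
            dx = offsets[:, k, 1] * 2.0 / max(w - 1, 1)
            grid = base + torch.stack((dx, dy), dim=-1)      # (B, H, W, 2)
            sampled.append(F.grid_sample(x, grid, align_corners=True))
        return self.project(torch.cat(sampled, dim=1))

# Example: y = AdaptiveAggregation(192)(torch.randn(1, 192, 16, 16))
```

Roughly, the predicted offsets play the part of the dynamic kernel, while the shared projection stands in for the weight-sharing mechanism that keeps complexity acceptable; the paper's actual layers may differ.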
Related papers
- Uniform Transformation: Refining Latent Representation in Variational Autoencoders [7.4316292428754105]
We introduce a novel adaptable three-stage Uniform Transformation (UT) module to address irregular latent distributions.
By reconfiguring irregular distributions into a uniform distribution in the latent space, our approach significantly enhances the disentanglement and interpretability of latent representations.
Empirical evaluations demonstrated the efficacy of our proposed UT module in improving disentanglement metrics across benchmark datasets.
arXiv Detail & Related papers (2024-07-02T21:46:23Z)
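As a loose illustration of reshaping an irregular latent distribution toward a uniform one (not the paper's three-stage UT module), the sketch below applies a per-dimension probability integral transform through an estimated Gaussian CDF; the function name and the Gaussian assumption are hypothetical.

```python
# Hedged sketch: per-dimension probability integral transform, a generic way
# to map roughly Gaussian latents to approximately Uniform(0, 1) marginals.
import torch

def to_uniform(z, eps=1e-6):
    """z: (N, D) latent codes -> values with ~uniform marginals in (0, 1)."""
    mu = z.mean(dim=0, keepdim=True)
    sigma = z.std(dim=0, keepdim=True).clamp_min(eps)
    return torch.distributions.Normal(mu, sigma).cdf(z)

u = to_uniform(torch.randn(1024, 32) * 3.0 + 1.0)
print(u.min().item(), u.max().item())  # all values lie in (0, 1)
```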
- Multi-Context Dual Hyper-Prior Neural Image Compression [10.349258638494137]
We propose a Transformer-based nonlinear transform to efficiently capture both local and global information from the input image.
We also introduce a novel entropy model that incorporates two different hyperpriors to model cross-channel and spatial dependencies of the latent representation.
Our experiments show that our proposed framework performs better than the state-of-the-art methods in terms of rate-distortion performance.
arXiv Detail & Related papers (2023-09-19T17:44:44Z)
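A minimal sketch of what a two-branch hyper-prior could look like: one branch summarizes cross-channel statistics, the other captures spatial structure, and their outputs are fused into Gaussian mean/scale parameters for the latent. The layer sizes, the additive fusion, and the omission of quantized side information are simplifying assumptions, not the paper's architecture.

```python
# Hedged sketch: dual hyper-prior producing entropy-model parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualHyperPrior(nn.Module):
    def __init__(self, c=192):
        super().__init__()
        self.hyper_channel = nn.Sequential(           # cross-channel statistics
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c, 1), nn.ReLU(), nn.Conv2d(c, 2 * c, 1))
        self.hyper_spatial = nn.Sequential(           # spatial statistics
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(), nn.Conv2d(c, 2 * c, 3, padding=1))

    def forward(self, y):
        ch = self.hyper_channel(y)                    # (B, 2C, 1, 1)
        sp = self.hyper_spatial(y)                    # (B, 2C, H, W)
        mean, scale = (ch + sp).chunk(2, dim=1)       # fuse, then split parameters
        return mean, F.softplus(scale)                # scale must be positive

mean, scale = DualHyperPrior()(torch.randn(1, 192, 16, 16))
```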
- Distributionally Robust Model-based Reinforcement Learning with Large State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are complex dynamical systems with large state spaces, costly data acquisition processes, and the deviation of real-world dynamics from the training environment at deployment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
arXiv Detail & Related papers (2023-09-05T13:42:11Z)
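To make the maximum-variance-reduction idea concrete, here is a simplified single-output GP sketch that selects the candidate state-action pair with the largest posterior variance; the RBF kernel, noise level, and random candidate set are placeholders, not the paper's multi-output construction.

```python
# Hedged sketch: query selection by maximum GP posterior variance.
import torch

def rbf(a, b, lengthscale=1.0):
    return torch.exp(-0.5 * torch.cdist(a, b).pow(2) / lengthscale ** 2)

def gp_posterior_var(X_train, X_query, noise=1e-2):
    """Predictive variance of a zero-mean GP with an RBF kernel."""
    K = rbf(X_train, X_train) + noise * torch.eye(len(X_train))
    Ks = rbf(X_query, X_train)                       # (M, N)
    sol = torch.linalg.solve(K, Ks.T)                # K^{-1} k(X_train, x*)
    return rbf(X_query, X_query).diagonal() - (Ks * sol.T).sum(dim=1)

X_train = torch.randn(20, 4)                         # observed (state, action) pairs
X_cand = torch.randn(200, 4)                         # candidate queries
next_query = X_cand[gp_posterior_var(X_train, X_cand).argmax()]
```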
- Learning Modulated Transformation in GANs [69.95217723100413]
We equip the generator in generative adversarial networks (GANs) with a plug-and-play module, termed the modulated transformation module (MTM).
MTM predicts spatial offsets under the control of latent codes, based on which the convolution operation can be applied at variable locations.
Notably, for human generation on the challenging TaiChi dataset, we improve the FID of StyleGAN3 from 21.36 to 13.60, demonstrating the efficacy of learning modulated geometry transformation.
arXiv Detail & Related papers (2023-08-29T17:51:22Z)
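A rough sketch of the described mechanism: a latent code modulates the features, per-pixel offsets are predicted from the modulated features, and the convolution is applied to features resampled at those offset locations. Shapes and names are illustrative and are not the MTM implementation.

```python
# Hedged sketch: latent-controlled spatial offsets before a convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModulatedOffsets(nn.Module):
    def __init__(self, channels, latent_dim):
        super().__init__()
        self.to_scale = nn.Linear(latent_dim, channels)        # latent modulation
        self.to_offset = nn.Conv2d(channels, 2, 3, padding=1)  # one (dx, dy) per pixel
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x, latent):
        b, c, h, w = x.shape
        x_mod = x * self.to_scale(latent).view(b, c, 1, 1)     # latent-conditioned features
        flow = self.to_offset(x_mod).permute(0, 2, 3, 1)       # (B, H, W, 2) pixel offsets
        ys = torch.linspace(-1, 1, h, device=x.device)
        xs = torch.linspace(-1, 1, w, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        base = torch.stack((gx, gy), dim=-1)                   # (H, W, 2), (x, y) order
        norm = torch.tensor([w / 2.0, h / 2.0], device=x.device)
        grid = base + flow / norm                              # offsets -> normalized coords
        return self.conv(F.grid_sample(x, grid, align_corners=True))

out = ModulatedOffsets(64, 512)(torch.randn(2, 64, 32, 32), torch.randn(2, 512))
```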
- Information-Theoretic GAN Compression with Variational Energy-based Model [36.77535324130402]
We propose an information-theoretic knowledge distillation approach for the compression of generative adversarial networks.
We show that the proposed algorithm consistently achieves outstanding performance in compressing generative adversarial networks.
arXiv Detail & Related papers (2023-03-28T15:32:21Z)
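A generic stand-in for an information-theoretic distillation loss (not the paper's variational energy-based formulation): a Gaussian variational distribution over teacher features conditioned on student features; minimizing the negative log-likelihood of teacher features under it maximizes a variational lower bound on the teacher-student mutual information.

```python
# Hedged sketch: variational Gaussian bound used as a distillation loss.
import torch
import torch.nn as nn

class VariationalMILoss(nn.Module):
    def __init__(self, s_dim, t_dim):
        super().__init__()
        self.mean_net = nn.Linear(s_dim, t_dim)          # mean of q(t | s)
        self.log_var = nn.Parameter(torch.zeros(t_dim))  # per-dimension variance

    def forward(self, student_feat, teacher_feat):
        mu = self.mean_net(student_feat)
        var = self.log_var.exp()
        # Negative log-likelihood of teacher features under q(t | s); minimizing
        # it maximizes a variational lower bound on I(teacher; student).
        return 0.5 * (((teacher_feat - mu) ** 2) / var + self.log_var).sum(dim=1).mean()

loss = VariationalMILoss(128, 256)(torch.randn(8, 128), torch.randn(8, 256))
```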
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
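A toy sketch of one attention-like diffusion step over a batch of instance states, with pairwise strengths taken from feature similarity; the actual model derives its diffusion strengths in closed form from an energy-constrained objective, which this simplified update does not reproduce.

```python
# Hedged sketch: one diffusion step with similarity-based pairwise strengths.
import torch

def diffusion_step(z, step=0.5):
    """z: (N, D) instance states -> updated states after one propagation step."""
    sim = z @ z.T / z.shape[1] ** 0.5        # pairwise affinities
    strength = torch.softmax(sim, dim=1)     # normalized diffusion strengths
    return (1 - step) * z + step * strength @ z

z_next = diffusion_step(torch.randn(64, 32))
```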
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the detailed spatial information captured by a CNN with the global context provided by a Transformer for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
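A hedged sketch of a hybrid block in the spirit of the described framework: a convolutional branch keeps local spatial detail, a Transformer encoder layer over flattened tokens supplies global context, and a 1x1 convolution fuses the two. Layer sizes and the fusion rule are assumptions, not CSformer's architecture.

```python
# Hedged sketch: CNN branch for local detail + Transformer branch for global context.
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.cnn = nn.Sequential(                     # local spatial detail
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.attn = nn.TransformerEncoderLayer(       # global context over tokens
            d_model=channels, nhead=heads, batch_first=True)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        local = self.cnn(x)
        tokens = x.flatten(2).transpose(1, 2)         # (B, H*W, C)
        global_ctx = self.attn(tokens).transpose(1, 2).view(b, c, h, w)
        return self.fuse(torch.cat([local, global_ctx], dim=1))

y = HybridBlock()(torch.randn(1, 64, 16, 16))
```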
- Joint Self-Attention and Scale-Aggregation for Self-Calibrated Deraining Network [13.628218953897946]
In this paper, we propose an effective algorithm, called JDNet, to solve the single image deraining problem.
By carefully designing the Scale-Aggregation and Self-Attention modules with Self-Calibrated convolution, the proposed model achieves better deraining results.
arXiv Detail & Related papers (2020-08-06T17:04:34Z)
- Learning Context-Based Non-local Entropy Modeling for Image Compression [140.64888994506313]
In this paper, we propose a non-local operation for context modeling by employing the global similarity within the context.
The entropy model is further adopted as the rate loss in a joint rate-distortion optimization.
Considering that the width of the transforms is essential in training low-distortion models, we finally introduce a U-Net block in the transforms to increase the width with manageable memory consumption and time complexity.
arXiv Detail & Related papers (2020-05-10T13:28:18Z)
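As an illustration of a non-local operation for context modeling, the sketch below is an embedded-Gaussian non-local block that lets every position aggregate the whole feature map weighted by global similarity; it ignores the causality constraints a real entropy-coding context model needs, and the layer sizes are placeholders.

```python
# Hedged sketch: embedded-Gaussian non-local block (global-similarity aggregation).
import torch
import torch.nn as nn

class NonLocalContext(nn.Module):
    def __init__(self, c=128):
        super().__init__()
        self.theta = nn.Conv2d(c, c // 2, 1)   # query embedding
        self.phi = nn.Conv2d(c, c // 2, 1)     # key embedding
        self.g = nn.Conv2d(c, c // 2, 1)       # value embedding
        self.out = nn.Conv2d(c // 2, c, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (B, HW, C/2)
        k = self.phi(x).flatten(2)                     # (B, C/2, HW)
        v = self.g(x).flatten(2).transpose(1, 2)       # (B, HW, C/2)
        attn = torch.softmax(q @ k, dim=-1)            # global similarity weights
        y = (attn @ v).transpose(1, 2).view(b, c // 2, h, w)
        return x + self.out(y)                         # residual non-local update

y = NonLocalContext()(torch.randn(1, 128, 8, 8))
```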
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.