Generalized Octave Convolutions for Learned Multi-Frequency Image
Compression
- URL: http://arxiv.org/abs/2002.10032v3
- Date: Thu, 31 Dec 2020 06:34:00 GMT
- Title: Generalized Octave Convolutions for Learned Multi-Frequency Image
Compression
- Authors: Mohammad Akbari and Jie Liang and Jingning Han and Chengjie Tu
- Abstract summary: We propose the first learned multi-frequency image compression and entropy coding approach.
It is based on the recently developed octave convolutions to factorize the latents into high and low frequency (resolution) components.
We show that the proposed generalized octave convolution can improve the performance of other auto-encoder-based computer vision tasks.
- Score: 20.504561050200365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learned image compression has recently shown the potential to outperform the
standard codecs. State-of-the-art rate-distortion (R-D) performance has been
achieved by context-adaptive entropy coding approaches in which hyperprior and
autoregressive models are jointly utilized to effectively capture the spatial
dependencies in the latent representations. However, the latents are feature
maps of the same spatial resolution in previous works, which contain some
redundancies that affect the R-D performance. In this paper, we propose the
first learned multi-frequency image compression and entropy coding approach
that is based on the recently developed octave convolutions to factorize the
latents into high and low frequency (resolution) components, where the low
frequency is represented by a lower resolution. Therefore, its spatial
redundancy is reduced, which improves the R-D performance. Novel generalized
octave convolution and octave transposed-convolution architectures with
internal activation layers are also proposed to preserve more spatial structure
of the information. Experimental results show that the proposed scheme
outperforms all existing learned methods as well as standard codecs such as the
next-generation video coding standard VVC (4:2:0) on the Kodak dataset in both
PSNR and MS-SSIM. We also show that the proposed generalized octave convolution
can improve the performance of other auto-encoder-based computer vision tasks
such as semantic segmentation and image denoising.
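The resolution argument in the abstract can be illustrated with a little arithmetic. The sketch below counts latent elements when a fraction of the channels is moved to a low-frequency branch. It is not the authors' code: the function name, the ratio parameter `alpha`, the example latent shape, and the assumption that the low-frequency maps are stored at half the height and width (one octave lower, as in the original octave convolution design) are all illustrative choices, not values taken from this page.

```python
def octave_latent_size(channels, height, width, alpha=0.5):
    """Count latent elements in the high- and low-frequency branches.

    Assumptions (not from the source page): `alpha` is the fraction of
    channels assigned to the low-frequency branch, and low-frequency maps
    are stored at half the height and half the width.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    low_channels = int(round(alpha * channels))
    high_channels = channels - low_channels
    # High-frequency maps keep the full spatial resolution.
    high_elems = high_channels * height * width
    # Low-frequency maps have a quarter of the spatial positions.
    low_elems = low_channels * (height // 2) * (width // 2)
    return high_elems, low_elems


if __name__ == "__main__":
    C, H, W = 192, 16, 16  # hypothetical latent shape
    single_resolution = C * H * W
    high, low = octave_latent_size(C, H, W, alpha=0.5)
    print("single-resolution elements:", single_resolution)
    print("multi-frequency elements:  ", high + low)
```

The element count is only a proxy for spatial redundancy. The actual bit rate depends on the learned entropy model, so this sketch says nothing about the R-D results reported in the paper.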
Related papers
- Compression-Realized Deep Structural Network for Video Quality Enhancement [78.13020206633524]
This paper focuses on the task of quality enhancement for compressed videos.
Most of the existing methods lack a structured design to optimally leverage the priors within compression codecs.
A new paradigm is urgently needed for a more "conscious" process of quality enhancement.
arXiv Detail & Related papers (2024-05-10T09:18:17Z)
- Unifying Generation and Compression: Ultra-low bitrate Image Coding Via Multi-stage Transformer [35.500720262253054]
This paper introduces a novel Unified Image Generation-Compression (UIGC) paradigm, merging the processes of generation and compression.
A key feature of the UIGC framework is the adoption of vector-quantized (VQ) image models for tokenization.
Experiments demonstrate the superiority of the proposed UIGC framework over existing codecs in perceptual quality and human perception.
arXiv Detail & Related papers (2024-03-06T14:27:02Z)
- ConvNeXt-ChARM: ConvNeXt-based Transform for Efficient Neural Image Compression [18.05997169440533]
We propose ConvNeXt-ChARM, an efficient ConvNeXt-based transform coding framework, paired with a compute-efficient channel-wise auto-regressive prior.
We show that ConvNeXt-ChARM brings consistent and significant BD-rate (PSNR) reductions, averaging 5.24% and 1.22% over the versatile video coding (VVC) reference encoder (VTM-18.0) and the state-of-the-art learned image compression method SwinT-ChARM, respectively.
arXiv Detail & Related papers (2023-07-12T11:45:54Z)
- Modality-Agnostic Variational Compression of Implicit Neural Representations [96.35492043867104]
We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR).
Bridging the gap between latent coding and sparsity, we obtain compact latent representations non-linearly mapped to a soft gating mechanism.
After obtaining a dataset of such latent representations, we directly optimise the rate/distortion trade-off in a modality-agnostic space using neural compression.
arXiv Detail & Related papers (2023-01-23T15:22:42Z)
- Learned Image Compression with Generalized Octave Convolution and Cross-Resolution Parameter Estimation [5.238765582868391]
We propose a learned multi-resolution image compression framework, which exploits octave convolutions to factorize the latent representations into the high-resolution (HR) and low-resolution (LR) parts.
Experimental results show that our method separately reduces the decoding time by approximately 73.35% and 93.44% compared with that of state-of-the-art learned image compression methods.
arXiv Detail & Related papers (2022-09-07T08:21:52Z)
- Learned Video Compression via Heterogeneous Deformable Compensation Network [78.72508633457392]
We propose a learned video compression framework via a heterogeneous deformable compensation strategy (HDCVC) to tackle the problem of unstable compression performance.
More specifically, the proposed algorithm extracts features from the two adjacent frames to estimate content-Neighborhood heterogeneous deformable (HetDeform) kernel offsets.
Experimental results indicate that HDCVC achieves performance superior to recent state-of-the-art learned video compression approaches.
arXiv Detail & Related papers (2022-07-11T02:31:31Z)
- Rank-Enhanced Low-Dimensional Convolution Set for Hyperspectral Image Denoising [50.039949798156826]
This paper tackles the challenging problem of hyperspectral (HS) image denoising.
We propose rank-enhanced low-dimensional convolution set (Re-ConvSet).
We then incorporate Re-ConvSet into the widely-used U-Net architecture to construct an HS image denoising method.
arXiv Detail & Related papers (2022-07-09T13:35:12Z)
- Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network that can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
arXiv Detail & Related papers (2022-02-09T18:48:02Z)
- LC-FDNet: Learned Lossless Image Compression with Frequency Decomposition Network [14.848279912686948]
Recent learning-based image compression methods do not consider the performance drop in the high-frequency region.
We propose a new method that performs the encoding in a coarse-to-fine manner to separate and process low- and high-frequency regions differently.
Experiments show that the proposed method achieves state-of-the-art performance for benchmark high-resolution datasets.
arXiv Detail & Related papers (2021-12-13T04:49:34Z)
- UltraSR: Spatial Encoding is a Missing Key for Implicit Image Function-based Arbitrary-Scale Super-Resolution [74.82282301089994]
In this work, we propose UltraSR, a simple yet effective new network design based on implicit image functions.
We show that spatial encoding is indeed a missing key towards the next-stage high-accuracy implicit image function.
Our UltraSR sets new state-of-the-art performance on the DIV2K benchmark under all super-resolution scales.
arXiv Detail & Related papers (2021-03-23T17:36:42Z)
- Learned Multi-Resolution Variable-Rate Image Compression with Octave-based Residual Blocks [15.308823742699039]
We propose a new variable-rate image compression framework, which employs generalized octave convolutions (GoConv) and generalized octave transposed-convolutions (GoTConv).
To enable a single model to operate with different bit rates and to learn multi-rate image features, a new objective function is introduced.
Experimental results show that the proposed framework trained with variable-rate objective function outperforms the standard codecs such as H.265/HEVC-based BPG and state-of-the-art learning-based variable-rate methods.
arXiv Detail & Related papers (2020-12-31T06:26:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.