Frequency-Aware Transformer for Learned Image Compression
- URL: http://arxiv.org/abs/2310.16387v3
- Date: Thu, 21 Mar 2024 04:52:57 GMT
- Title: Frequency-Aware Transformer for Learned Image Compression
- Authors: Han Li, Shaohui Li, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong,
- Abstract summary: We propose a frequency-aware transformer (FAT) block that for the first time achieves multiscale directional ananlysis for Learned Image Compression (LIC)
The FAT block comprises frequency-decomposition window attention (FDWA) modules to capture multiscale and directional frequency components of natural images.
We also introduce frequency-modulation feed-forward network (FMFFN) to adaptively modulate different frequency components, improving rate-distortion performance.
- Score: 64.28698450919647
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Learned image compression (LIC) has gained traction as an effective solution for image storage and transmission in recent years. However, existing LIC methods are redundant in latent representation due to limitations in capturing anisotropic frequency components and preserving directional details. To overcome these challenges, we propose a novel frequency-aware transformer (FAT) block that for the first time achieves multiscale directional ananlysis for LIC. The FAT block comprises frequency-decomposition window attention (FDWA) modules to capture multiscale and directional frequency components of natural images. Additionally, we introduce frequency-modulation feed-forward network (FMFFN) to adaptively modulate different frequency components, improving rate-distortion performance. Furthermore, we present a transformer-based channel-wise autoregressive (T-CA) model that effectively exploits channel dependencies. Experiments show that our method achieves state-of-the-art rate-distortion performance compared to existing LIC methods, and evidently outperforms latest standardized codec VTM-12.1 by 14.5%, 15.1%, 13.0% in BD-rate on the Kodak, Tecnick, and CLIC datasets.
Related papers
- AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation [99.57024606542416]
We propose an adaptive all-in-one image restoration network based on frequency mining and modulation.
Our approach is motivated by the observation that different degradation types impact the image content on different frequency subbands.
The proposed model achieves adaptive reconstruction by accentuating the informative frequency subbands according to different input degradations.
arXiv Detail & Related papers (2024-03-21T17:58:14Z) - Misalignment-Robust Frequency Distribution Loss for Image Transformation [51.0462138717502]
This paper aims to address a common challenge in deep learning-based image transformation methods, such as image enhancement and super-resolution.
We introduce a novel and simple Frequency Distribution Loss (FDL) for computing distribution distance within the frequency domain.
Our method is empirically proven effective as a training constraint due to the thoughtful utilization of global information in the frequency domain.
arXiv Detail & Related papers (2024-02-28T09:27:41Z) - End-to-End Optimized Image Compression with the Frequency-Oriented
Transform [8.27145506280741]
We propose the end-to-end optimized image compression model facilitated by the frequency-oriented transform.
The model enables scalable coding through the selective transmission of arbitrary frequency components.
Our model outperforms all traditional codecs including next-generation standard H.266/VVC on MS-SSIM metric.
arXiv Detail & Related papers (2024-01-16T08:16:10Z) - Low-light Image Enhancement via CLIP-Fourier Guided Wavelet Diffusion [28.049668999586583]
We propose a novel and robust low-light image enhancement method via CLIP-Fourier Guided Wavelet Diffusion, abbreviated as CFWD.
CFWD leverages multimodal visual-language information in the frequency domain space created by multiple wavelet transforms to guide the enhancement process.
Our approach outperforms existing state-of-the-art methods, achieving significant progress in image quality and noise suppression.
arXiv Detail & Related papers (2024-01-08T10:08:48Z) - Progressive Learning with Visual Prompt Tuning for Variable-Rate Image
Compression [60.689646881479064]
We propose a progressive learning paradigm for transformer-based variable-rate image compression.
Inspired by visual prompt tuning, we use LPM to extract prompts for input images and hidden features at the encoder side and decoder side, respectively.
Our model outperforms all current variable image methods in terms of rate-distortion performance and approaches the state-of-the-art fixed image compression methods trained from scratch.
arXiv Detail & Related papers (2023-11-23T08:29:32Z) - LLIC: Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression [27.02281402358164]
We propose Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression.
We introduce a few large kernelbased depth-wise convolutions to reduce more redundancy while maintaining modest complexity.
Our LLIC models achieve state-of-the-art performances and better trade-offs between performance and complexity.
arXiv Detail & Related papers (2023-04-19T11:19:10Z) - High-Fidelity Variable-Rate Image Compression via Invertible Activation
Transformation [24.379052026260034]
We propose the Invertible Activation Transformation (IAT) module to tackle the issue of high-fidelity fine variable-rate image compression.
IAT and QLevel together give the image compression model the ability of fine variable-rate control while better maintaining the image fidelity.
Our method outperforms the state-of-the-art variable-rate image compression method by a large margin, especially after multiple re-encodings.
arXiv Detail & Related papers (2022-09-12T07:14:07Z) - Spatial-Temporal Frequency Forgery Clue for Video Forgery Detection in
VIS and NIR Scenario [87.72258480670627]
Existing face forgery detection methods based on frequency domain find that the GAN forged images have obvious grid-like visual artifacts in the frequency spectrum compared to the real images.
This paper proposes a Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatial-temporal feature representation.
arXiv Detail & Related papers (2022-07-05T09:27:53Z) - Channel-Level Variable Quantization Network for Deep Image Compression [50.3174629451739]
We propose a channel-level variable quantization network to dynamically allocate more convolutions for significant channels and withdraws for negligible channels.
Our method achieves superior performance and can produce much better visual reconstructions.
arXiv Detail & Related papers (2020-07-15T07:20:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.