Related papers: Frequency-Aware Transformer for Learned Image Compression

Frequency-Aware Transformer for Learned Image Compression

URL: http://arxiv.org/abs/2310.16387v3
Date: Thu, 21 Mar 2024 04:52:57 GMT
Title: Frequency-Aware Transformer for Learned Image Compression
Authors: Han Li, Shaohui Li, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong,
Abstract summary: We propose a frequency-aware transformer (FAT) block that for the first time achieves multiscale directional ananlysis for Learned Image Compression (LIC) The FAT block comprises frequency-decomposition window attention (FDWA) modules to capture multiscale and directional frequency components of natural images. We also introduce frequency-modulation feed-forward network (FMFFN) to adaptively modulate different frequency components, improving rate-distortion performance.
Score: 64.28698450919647
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Learned image compression (LIC) has gained traction as an effective solution for image storage and transmission in recent years. However, existing LIC methods are redundant in latent representation due to limitations in capturing anisotropic frequency components and preserving directional details. To overcome these challenges, we propose a novel frequency-aware transformer (FAT) block that for the first time achieves multiscale directional ananlysis for LIC. The FAT block comprises frequency-decomposition window attention (FDWA) modules to capture multiscale and directional frequency components of natural images. Additionally, we introduce frequency-modulation feed-forward network (FMFFN) to adaptively modulate different frequency components, improving rate-distortion performance. Furthermore, we present a transformer-based channel-wise autoregressive (T-CA) model that effectively exploits channel dependencies. Experiments show that our method achieves state-of-the-art rate-distortion performance compared to existing LIC methods, and evidently outperforms latest standardized codec VTM-12.1 by 14.5%, 15.1%, 13.0% in BD-rate on the Kodak, Tecnick, and CLIC datasets.

Related papers

Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition [83.40450475728792]
We present Freqformer, a Transformer-based framework specifically designed for image demoir'eing through targeted frequency separation.<n>Our method performs an effective frequency decomposition that explicitly splits moir'e patterns into high-frequency spatially-localized textures and low-frequency scale-robust color distortions.<n>Experiments on various demoir'eing benchmarks demonstrate that Freqformer achieves state-of-the-art performance with a compact model size.
arXiv Detail & Related papers (2025-05-25T12:23:10Z)
Multi-Scale Invertible Neural Network for Wide-Range Variable-Rate Learned Image Compression [90.59962443790593]
In this paper, we present a variable-rate image compression model based on invertible transform to overcome limitations. Specifically, we design a lightweight multi-scale invertible neural network, which maps the input image into multi-scale latent representations. Experimental results demonstrate that the proposed method achieves state-of-the-art performance compared to existing variable-rate methods.
arXiv Detail & Related papers (2025-03-27T09:08:39Z)
FD-LSCIC: Frequency Decomposition-based Learned Screen Content Image Compression [67.34466255300339]
This paper addresses three key challenges in SC image compression: learning compact latent features, adapting quantization step sizes, and the lack of large SC datasets. We introduce an adaptive quantization module that learns scaled uniform noise for each frequency component, enabling flexible control over quantization granularity. We construct a large SC image compression dataset (SDU-SCICD10K), which includes over 10,000 images spanning basic SC images, computer-rendered images, and mixed NS and SC images from both PC and mobile platforms.
arXiv Detail & Related papers (2025-02-21T03:15:16Z)
Effective Diffusion Transformer Architecture for Image Super-Resolution [63.254644431016345]
We design an effective diffusion transformer for image super-resolution (DiT-SR) In practice, DiT-SR leverages an overall U-shaped architecture, and adopts a uniform isotropic design for all the transformer blocks. We analyze the limitation of the widely used AdaLN, and present a frequency-adaptive time-step conditioning module.
arXiv Detail & Related papers (2024-09-29T07:14:16Z)
Few-Shot Domain Adaptation for Learned Image Compression [24.37696296367332]
Learned image compression (LIC) has achieved state-of-the-art rate-distortion performance. LIC models usually suffer from significant performance degradation when applied to out-of-training-domain images. We propose a few-shot domain adaptation method for LIC by integrating plug-and-play adapters into pre-trained models.
arXiv Detail & Related papers (2024-09-17T12:05:29Z)
Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning [49.275450836604726]
We present a novel frequency-based Self-Supervised Learning (SSL) approach that significantly enhances its efficacy for pre-training. We employ a two-branch framework empowered by knowledge distillation, enabling the model to take both the filtered and original images as input.
arXiv Detail & Related papers (2024-09-16T15:10:07Z)
Bi-Level Spatial and Channel-aware Transformer for Learned Image Compression [0.0]
We propose a novel Transformer-based image compression method that enhances the transformation stage by considering frequency components within the feature map. Our method integrates a novel Hybrid Spatial-Channel Attention Transformer Block (HSCATB), where a spatial-based branch independently handles high and low frequencies. We also introduce a Mixed Local-Global Feed Forward Network (MLGFFN) within the Transformer block to enhance the extraction of diverse and rich information.
arXiv Detail & Related papers (2024-08-07T15:35:25Z)
AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation [99.57024606542416]
We propose an adaptive all-in-one image restoration network based on frequency mining and modulation. Our approach is motivated by the observation that different degradation types impact the image content on different frequency subbands. The proposed model achieves adaptive reconstruction by accentuating the informative frequency subbands according to different input degradations.
arXiv Detail & Related papers (2024-03-21T17:58:14Z)
Misalignment-Robust Frequency Distribution Loss for Image Transformation [51.0462138717502]
This paper aims to address a common challenge in deep learning-based image transformation methods, such as image enhancement and super-resolution. We introduce a novel and simple Frequency Distribution Loss (FDL) for computing distribution distance within the frequency domain. Our method is empirically proven effective as a training constraint due to the thoughtful utilization of global information in the frequency domain.
arXiv Detail & Related papers (2024-02-28T09:27:41Z)
End-to-End Optimized Image Compression with the Frequency-Oriented Transform [8.27145506280741]
We propose the end-to-end optimized image compression model facilitated by the frequency-oriented transform. The model enables scalable coding through the selective transmission of arbitrary frequency components. Our model outperforms all traditional codecs including next-generation standard H.266/VVC on MS-SSIM metric.
arXiv Detail & Related papers (2024-01-16T08:16:10Z)
Progressive Learning with Visual Prompt Tuning for Variable-Rate Image Compression [60.689646881479064]
We propose a progressive learning paradigm for transformer-based variable-rate image compression. Inspired by visual prompt tuning, we use LPM to extract prompts for input images and hidden features at the encoder side and decoder side, respectively. Our model outperforms all current variable image methods in terms of rate-distortion performance and approaches the state-of-the-art fixed image compression methods trained from scratch.
arXiv Detail & Related papers (2023-11-23T08:29:32Z)
LLIC: Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression [27.02281402358164]
We propose Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression. We introduce a few large kernelbased depth-wise convolutions to reduce more redundancy while maintaining modest complexity. Our LLIC models achieve state-of-the-art performances and better trade-offs between performance and complexity.
arXiv Detail & Related papers (2023-04-19T11:19:10Z)
High-Fidelity Variable-Rate Image Compression via Invertible Activation Transformation [24.379052026260034]
We propose the Invertible Activation Transformation (IAT) module to tackle the issue of high-fidelity fine variable-rate image compression. IAT and QLevel together give the image compression model the ability of fine variable-rate control while better maintaining the image fidelity. Our method outperforms the state-of-the-art variable-rate image compression method by a large margin, especially after multiple re-encodings.
arXiv Detail & Related papers (2022-09-12T07:14:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.