Related papers: Hybrid Convolution and Frequency State Space Network for Image Compression

URL: http://arxiv.org/abs/2511.20151v1
Date: Tue, 25 Nov 2025 10:21:42 GMT
Title: Hybrid Convolution and Frequency State Space Network for Image Compression
Authors: Haodong Pan, Hao Wei, Yusong Wang, Nanning Zheng, Caigui Jiang
Abstract summary: Convolutional neural networks (CNNs) capture local high-frequency details. Transformers and state space models (SSMs) provide strong long-range modeling capabilities. We propose a Hybrid Convolution and Frequency State Space Network for LIC.
Score: 37.44884590063737
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Learned image compression (LIC) has recently benefited from Transformer-based and state space model (SSM)-based architectures. Convolutional neural networks (CNNs) effectively capture local high-frequency details, whereas Transformers and SSMs provide strong long-range modeling capabilities but may cause structural information loss or ignore frequency characteristics that are crucial for compression. In this work we propose HCFSSNet, a Hybrid Convolution and Frequency State Space Network for LIC. HCFSSNet uses CNNs to extract local high-frequency structures and introduces a Vision Frequency State Space (VFSS) block that models long-range low-frequency information. The VFSS block combines an Omni-directional Neighborhood State Space (VONSS) module, which scans features horizontally, vertically and diagonally, with an Adaptive Frequency Modulation Module (AFMM) that applies content-adaptive weighting of discrete cosine transform frequency components for more efficient bit allocation. To further reduce redundancy in the entropy model, we integrate AFMM with a Swin Transformer to form a Frequency Swin Transformer Attention Module (FSTAM) for frequency-aware side information modeling. Experiments on the Kodak, Tecnick and CLIC Professional Validation datasets show that HCFSSNet achieves competitive rate-distortion performance compared with recent SSM-based codecs such as MambaIC, while using significantly fewer parameters. On Kodak, Tecnick and CLIC, HCFSSNet reduces BD-rate over the VTM anchor by 18.06, 24.56 and 22.44 percent, respectively, providing an efficient and interpretable hybrid architecture for future learned image compression systems.
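The abstract describes AFMM as weighting discrete cosine transform (DCT) frequency components. The following is only a minimal sketch of that general idea, not the authors' implementation: it transforms a small square block with an orthonormal DCT-II, multiplies each coefficient by a weight, and transforms back. The function names (`dct_matrix`, `modulate`) and the externally supplied `weights` are assumptions made for illustration; in the paper the weights are described as content-adaptive, which this sketch does not model.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C of size n x n (C @ C^T = I)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct2(block):
    """2D DCT of a square block: C @ block @ C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse 2D DCT of a square coefficient block: C^T @ coeffs @ C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def modulate(block, weights):
    """Scale each DCT coefficient by its weight, then invert the transform.

    `weights` is a hypothetical per-frequency gain map supplied by the
    caller; a learned module would predict it from the content instead.
    """
    coeffs = dct2(block)
    n = len(block)
    weighted = [[coeffs[u][v] * weights[u][v] for v in range(n)]
                for u in range(n)]
    return idct2(weighted)

# Usage: keeping only the DC coefficient replaces the block by its mean.
block = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0],
         [9.0, 10.0, 11.0, 12.0],
         [13.0, 14.0, 15.0, 16.0]]
dc_only = [[1.0 if (u == 0 and v == 0) else 0.0 for v in range(4)]
           for u in range(4)]
smoothed = modulate(block, dc_only)
```

With all weights equal to 1 the block is reconstructed unchanged, so any rate saving in such a scheme comes from which frequencies are attenuated.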

Related papers

FLaTEC: Frequency-Disentangled Latent Triplanes for Efficient Compression of LiDAR Point Clouds [52.997038111673966]
FLaTEC is a frequency-aware compression model that enables the compression of a full scan with high compression ratios. We convert voxelized embeddings into triplane representations to reduce sparsity, computational cost, and storage requirements. Our method achieves state-of-the-art rate-distortion performance and outperforms the standard codecs by 78% and 94% in BD-rate on both datasets.
arXiv Detail & Related papers (2025-11-25T08:37:49Z)
Large Kernel Modulation Network for Efficient Image Super-Resolution [5.875680381119361]
Large Kernel Modulation Network (LKMN) is a pure CNN-based model. LKMN has two core components: Enhanced Partial Large Kernel Block (EPLKB) and Cross-Gate Feed-Forward Network (CGFN). LKMN-L achieves a 0.23 dB PSNR improvement over DAT-light on the Manga109 dataset at ×4 upscaling, while being nearly 4.8× faster.
arXiv Detail & Related papers (2025-08-16T03:43:14Z)
FADPNet: Frequency-Aware Dual-Path Network for Face Super-Resolution [70.61549422952193]
Face super-resolution (FSR) under limited computational costs remains an open problem. Existing approaches typically treat all facial pixels equally, resulting in suboptimal allocation of computational resources. We propose FADPNet, a Frequency-Aware Dual-Path Network that decomposes facial features into low- and high-frequency components.
arXiv Detail & Related papers (2025-06-17T02:33:42Z)
Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition [83.40450475728792]
We present Freqformer, a Transformer-based framework specifically designed for image demoiréing through targeted frequency separation. Our method performs an effective frequency decomposition that explicitly splits moiré patterns into high-frequency spatially-localized textures and low-frequency scale-robust color distortions. Experiments on various demoiréing benchmarks demonstrate that Freqformer achieves state-of-the-art performance with a compact model size.
arXiv Detail & Related papers (2025-05-25T12:23:10Z)
FreqKV: Frequency Domain Key-Value Compression for Efficient Context Window Extension [20.360392907997117]
We propose FreqKV, a novel frequency domain key-value (KV) compression technique. FreqKV enables efficient context window extension for decoder-only large language models (LLMs). Experiments on a range of long context language modeling and understanding tasks demonstrate the efficiency and effectiveness of the proposed method.
arXiv Detail & Related papers (2025-05-01T14:53:12Z)
Frequency Dynamic Convolution for Dense Image Prediction [34.915070244005854]
We introduce Frequency Dynamic Convolution (FDConv), a novel approach that mitigates limitations by learning a fixed parameter budget in the Fourier domain. FDConv divides this budget into frequency-based groups with disjoint Fourier indices, enabling the construction of frequency-diverse weights without increasing the parameter cost. We demonstrate that when applied to ResNet-50, FDConv achieves superior performance with a modest increase of +3.6M parameters.
arXiv Detail & Related papers (2025-03-24T15:32:06Z)
CMamba: Learned Image Compression with State Space Models [31.10785880342252]
We propose a hybrid Convolution and State Space Models (SSMs) based image compression framework to achieve superior rate-distortion performance. Specifically, CMamba introduces two key components: a Content-Adaptive SSM (CA-SSM) module and a Context-Aware Entropy (CAE) module. Experimental results demonstrate that CMamba achieves superior rate-distortion performance.
arXiv Detail & Related papers (2025-02-07T15:07:04Z)
Bi-Level Spatial and Channel-aware Transformer for Learned Image Compression [0.0]
We propose a novel Transformer-based image compression method that enhances the transformation stage by considering frequency components within the feature map. Our method integrates a novel Hybrid Spatial-Channel Attention Transformer Block (HSCATB), where a spatial-based branch independently handles high and low frequencies. We also introduce a Mixed Local-Global Feed Forward Network (MLGFFN) within the Transformer block to enhance the extraction of diverse and rich information.
arXiv Detail & Related papers (2024-08-07T15:35:25Z)
Frequency-Aware Transformer for Learned Image Compression [64.28698450919647]
We propose a frequency-aware transformer (FAT) block that for the first time achieves multiscale directional analysis for Learned Image Compression (LIC). The FAT block comprises frequency-decomposition window attention (FDWA) modules to capture multiscale and directional frequency components of natural images. We also introduce a frequency-modulation feed-forward network (FMFFN) to adaptively modulate different frequency components, improving rate-distortion performance.
arXiv Detail & Related papers (2023-10-25T05:59:25Z)
Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network that divides the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain. In practice, the high-frequency part is processed with expensive operations, while the lower-frequency part is assigned cheap operations to relieve the computational burden. Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
arXiv Detail & Related papers (2021-03-15T12:54:26Z)

This list is automatically generated from the titles and abstracts of the papers on this site.