Related papers: FD-LSCIC: Frequency Decomposition-based Learned Screen Content Image Compression

URL: http://arxiv.org/abs/2502.15174v1
Date: Fri, 21 Feb 2025 03:15:16 GMT
Title: FD-LSCIC: Frequency Decomposition-based Learned Screen Content Image Compression
Authors: Shiqi Jiang, Hui Yuan, Shuai Li, Huanqiang Zeng, Sam Kwong
Abstract summary: This paper addresses three key challenges in SC image compression: learning compact latent features, adapting quantization step sizes, and the lack of large SC datasets. We introduce an adaptive quantization module that learns scaled uniform noise for each frequency component, enabling flexible control over quantization granularity. We construct a large SC image compression dataset (SDU-SCICD10K), which includes over 10,000 images spanning basic SC images, computer-rendered images, and mixed NS and SC images from both PC and mobile platforms.
Score: 67.34466255300339
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The learned image compression (LIC) methods have already surpassed traditional techniques in compressing natural scene (NS) images. However, directly applying these methods to screen content (SC) images, which possess distinct characteristics such as sharp edges, repetitive patterns, embedded text and graphics, yields suboptimal results. This paper addresses three key challenges in SC image compression: learning compact latent features, adapting quantization step sizes, and the lack of large SC datasets. To overcome these challenges, we propose a novel compression method that employs a multi-frequency two-stage octave residual block (MToRB) for feature extraction, a cascaded triple-scale feature fusion residual block (CTSFRB) for multi-scale feature integration and a multi-frequency context interaction module (MFCIM) to reduce inter-frequency correlations. Additionally, we introduce an adaptive quantization module that learns scaled uniform noise for each frequency component, enabling flexible control over quantization granularity. Furthermore, we construct a large SC image compression dataset (SDU-SCICD10K), which includes over 10,000 images spanning basic SC images, computer-rendered images, and mixed NS and SC images from both PC and mobile platforms. Experimental results demonstrate that our approach significantly improves SC image compression performance, outperforming traditional standards and state-of-the-art learning-based methods in terms of peak signal-to-noise ratio (PSNR) and multi-scale structural similarity (MS-SSIM).

Related papers

COLI: A Hierarchical Efficient Compressor for Large Images [18.697445453003983]
Implicit Neural Representations (INRs) present a promising alternative by learning continuous mappings from spatial coordinates to pixel intensities for individual images. We introduce COLI (Compressor for Large Images), a novel framework leveraging Neural Representations for Videos (NeRV). We show that COLI consistently achieves competitive or superior PSNR and SSIM metrics at significantly reduced bits per pixel (bpp) while accelerating NeRV training by up to 4 times.
arXiv Detail & Related papers (2025-07-15T16:07:07Z)
Multi-Scale Invertible Neural Network for Wide-Range Variable-Rate Learned Image Compression [90.59962443790593]
In this paper, we present a variable-rate image compression model based on invertible transform to overcome limitations. Specifically, we design a lightweight multi-scale invertible neural network, which maps the input image into multi-scale latent representations. Experimental results demonstrate that the proposed method achieves state-of-the-art performance compared to existing variable-rate methods.
arXiv Detail & Related papers (2025-03-27T09:08:39Z)
Interleaved Block-based Learned Image Compression with Feature Enhancement and Quantization Error Compensation [18.15640294602421]
We propose a feature extraction module, a feature refinement module, and a feature enhancement module. Our four modules can be readily integrated into state-of-the-art LIC methods. Experiments show that combining our modules with Tiny-LIC outperforms existing LIC methods and image compression standards in terms of peak signal-to-noise ratio (PSNR) and multi-scale structural similarity (MS-SSIM) on the Kodak dataset and the CLIC dataset.
arXiv Detail & Related papers (2025-02-21T03:40:27Z)
OMR-NET: a two-stage octave multi-scale residual network for screen content image compression [11.518417977364377]
Screen content (SC) differs from natural scene (NS) with unique characteristics such as noise-free, repetitive patterns, and high contrast. We propose an improved two-stage octave convolutional residual blocks (IToRB) for high and low-frequency feature extraction. We also employ a window-based attention module (WAM) to capture pixel correlations, especially for high contrast regions in the image.
arXiv Detail & Related papers (2024-07-11T14:30:46Z)
End-to-End Optimized Image Compression with the Frequency-Oriented Transform [8.27145506280741]
We propose the end-to-end optimized image compression model facilitated by the frequency-oriented transform. The model enables scalable coding through the selective transmission of arbitrary frequency components. Our model outperforms all traditional codecs including next-generation standard H.266/VVC on MS-SSIM metric.
arXiv Detail & Related papers (2024-01-16T08:16:10Z)
Transferable Learned Image Compression-Resistant Adversarial Perturbations [66.46470251521947]
Adversarial attacks can readily disrupt the image classification system, revealing the vulnerability of DNN-based recognition tasks. We introduce a new pipeline that targets image classification models that utilize learned image compressors as pre-processing modules.
arXiv Detail & Related papers (2024-01-06T03:03:28Z)
Frequency-Aware Transformer for Learned Image Compression [64.28698450919647]
We propose a frequency-aware transformer (FAT) block that, for the first time, achieves multiscale directional analysis for Learned Image Compression (LIC). The FAT block comprises frequency-decomposition window attention (FDWA) modules to capture multiscale and directional frequency components of natural images. We also introduce a frequency-modulation feed-forward network (FMFFN) to adaptively modulate different frequency components, improving rate-distortion performance.
arXiv Detail & Related papers (2023-10-25T05:59:25Z)
Exploring Effective Mask Sampling Modeling for Neural Image Compression [171.35596121939238]
Most existing neural image compression methods rely on side information from hyperprior or context models to eliminate spatial redundancy. Inspired by the mask sampling modeling in recent self-supervised learning methods for natural language processing and high-level vision, we propose a novel pretraining strategy for neural image compression. Our method achieves competitive performance with lower computational complexity compared to state-of-the-art image compression methods.
arXiv Detail & Related papers (2023-06-09T06:50:20Z)
Rank-Enhanced Low-Dimensional Convolution Set for Hyperspectral Image Denoising [50.039949798156826]
This paper tackles the challenging problem of hyperspectral (HS) image denoising. We propose a rank-enhanced low-dimensional convolution set (Re-ConvSet). We then incorporate Re-ConvSet into the widely used U-Net architecture to construct an HS image denoising method.
arXiv Detail & Related papers (2022-07-09T13:35:12Z)
Multi-scale frequency separation network for image deblurring [10.511076996096117]
We present a new method called multi-scale frequency separation network (MSFS-Net) for image deblurring. MSFS-Net captures the low and high-frequency information of image at multiple scales. Experiments on benchmark datasets show that the proposed network achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-06-01T23:48:35Z)
Learned Image Compression with Gaussian-Laplacian-Logistic Mixture Model and Concatenated Residual Modules [22.818632387206257]
Two key components of learned image compression are the entropy model of the latent representations and the encoding/decoding network architectures. We propose a more flexible discretized Gaussian-Laplacian-Logistic mixture model (GLLMM) for the latent representations. For the encoding/decoding network design, we propose concatenated residual blocks (CRB), in which multiple residual blocks are serially connected with additional shortcut connections.
arXiv Detail & Related papers (2021-07-14T02:54:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.