Related papers: OMR-NET: a two-stage octave multi-scale residual network for screen content image compression

URL: http://arxiv.org/abs/2407.08545v1
Date: Thu, 11 Jul 2024 14:30:46 GMT
Title: OMR-NET: a two-stage octave multi-scale residual network for screen content image compression
Authors: Shiqi Jiang, Ting Ren, Congrui Fu, Shuai Li, Hui Yuan
Abstract summary: Screen content (SC) differs from natural scene (NS) content in characteristics such as being noise-free and having repetitive patterns and high contrast. We propose improved two-stage octave convolutional residual blocks (IToRB) for high- and low-frequency feature extraction. We also employ a window-based attention module (WAM) to capture pixel correlations, especially in high-contrast regions of the image.
Score: 11.518417977364377
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Screen content (SC) differs from natural scene (NS) content in characteristics such as being noise-free and having repetitive patterns and high contrast. To address the inadequacies of current learned image compression (LIC) methods for SC, we propose improved two-stage octave convolutional residual blocks (IToRB) for high- and low-frequency feature extraction and cascaded two-stage multi-scale residual blocks (CTMSRB) for improved multi-scale learning and nonlinearity in SC. Additionally, we employ a window-based attention module (WAM) to capture pixel correlations, especially in high-contrast regions of the image. We also construct a diverse SC image compression dataset (SDU-SCICD2K) for training, including text, charts, graphics, animation, movie, and game images as well as mixtures of SC and NS images. Experimental results show that our method, which is better suited to SC than to NS data, outperforms existing LIC methods in rate-distortion performance on SC images. The code is publicly available at https://github.com/SunshineSki/OMR Net.git.
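As a rough illustration of the octave idea the abstract refers to, the sketch below keeps a high-frequency branch at full resolution and a low-frequency branch at half resolution, and exchanges information between them in one layer. It is not the authors' IToRB: it follows the generic octave convolution pattern, uses 1x1 convolutions for brevity, and all function names, channel counts, and shapes are assumptions made for this example.

```python
import numpy as np


def avg_pool2x2(x):
    """Downsample a (C, H, W) array by 2x2 average pooling (H and W must be even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample2x(x):
    """Nearest-neighbour upsampling of a (C, H, W) array by a factor of 2."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def conv1x1(x, weight):
    """1x1 convolution, i.e. channel mixing; weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def octave_conv1x1(x_high, x_low, w_hh, w_hl, w_lh, w_ll):
    """One octave-style layer with four paths: H->H, H->L, L->H and L->L."""
    # High-frequency output: same-resolution path plus upsampled low-frequency path.
    y_high = conv1x1(x_high, w_hh) + upsample2x(conv1x1(x_low, w_lh))
    # Low-frequency output: same-resolution path plus pooled high-frequency path.
    y_low = conv1x1(x_low, w_ll) + conv1x1(avg_pool2x2(x_high), w_hl)
    return y_high, y_low


# Hypothetical sizes: 6 high-frequency channels at 8x8, 2 low-frequency channels at 4x4.
rng = np.random.default_rng(0)
x_high = rng.standard_normal((6, 8, 8))
x_low = rng.standard_normal((2, 4, 4))
w_hh = rng.standard_normal((6, 6))
w_hl = rng.standard_normal((2, 6))
w_lh = rng.standard_normal((6, 2))
w_ll = rng.standard_normal((2, 2))

y_high, y_low = octave_conv1x1(x_high, x_low, w_hh, w_hl, w_lh, w_ll)
print(y_high.shape, y_low.shape)
```

Storing the low-frequency branch at half resolution is what makes this kind of layer cheaper than a plain convolution over all channels; the paper's blocks add residual connections and a two-stage design on top, which this sketch does not attempt to reproduce.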

Related papers

Frequency-enhanced Multi-granularity Context Network for Efficient Vertebrae Segmentation [33.99418884128739]
We introduce a Frequency-enhanced Multi-granularity Context Network (FMC-Net) to improve vertebrae segmentation accuracy. For the high-frequency components, we apply a High-frequency Feature Refinement (HFR) to amplify the prominence of key features. For the low-frequency components, we use a Multi-granularity State Space Model (MG-SSM) to aggregate feature representations with different receptive fields.
arXiv Detail & Related papers (2025-06-29T04:53:02Z)
Text to Image for Multi-Label Image Recognition with Joint Prompt-Adapter Learning [69.33115351856785]
We present a novel method, called T2I-PAL, to tackle the modality gap issue when using only text captions for PEFT. The core design of T2I-PAL is to leverage pre-trained text-to-image generation models to generate photo-realistic and diverse images from text captions. Extensive experiments on multiple benchmarks, including MS-COCO, VOC2007, and NUS-WIDE, show that our T2I-PAL can boost recognition performance by 3.47% on average.
arXiv Detail & Related papers (2025-06-12T11:09:49Z)
Multi-Scale Invertible Neural Network for Wide-Range Variable-Rate Learned Image Compression [90.59962443790593]
In this paper, we present a variable-rate image compression model based on invertible transform to overcome limitations. Specifically, we design a lightweight multi-scale invertible neural network, which maps the input image into multi-scale latent representations. Experimental results demonstrate that the proposed method achieves state-of-the-art performance compared to existing variable-rate methods.
arXiv Detail & Related papers (2025-03-27T09:08:39Z)
FD-LSCIC: Frequency Decomposition-based Learned Screen Content Image Compression [67.34466255300339]
This paper addresses three key challenges in SC image compression: learning compact latent features, adapting quantization step sizes, and the lack of large SC datasets. We introduce an adaptive quantization module that learns scaled uniform noise for each frequency component, enabling flexible control over quantization granularity. We construct a large SC image compression dataset (SDU-SCICD10K), which includes over 10,000 images spanning basic SC images, computer-rendered images, and mixed NS and SC images from both PC and mobile platforms.
arXiv Detail & Related papers (2025-02-21T03:15:16Z)
You Only Need One Color Space: An Efficient Network for Low-light Image Enhancement [50.37253008333166]
The Low-Light Image Enhancement (LLIE) task aims to restore the details and visual information from corrupted low-light images. We propose a novel trainable color space, named Horizontal/Vertical-Intensity (HVI). It not only decouples brightness and color from RGB channels to mitigate instability during enhancement but also adapts to low-light images in different illumination ranges thanks to its trainable parameters.
arXiv Detail & Related papers (2024-02-08T16:47:43Z)
Unsupervised HDR Image and Video Tone Mapping via Contrastive Learning [19.346284003982035]
We propose a unified framework (IVTMNet) for unsupervised image and video tone mapping. For video tone mapping, we propose a temporal-feature-replaced (TFR) module to efficiently utilize the temporal correlation. Experimental results demonstrate that our method outperforms state-of-the-art image and video tone mapping methods.
arXiv Detail & Related papers (2023-03-13T17:45:39Z)
Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes. We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior art, NVP not only trains 2 times faster (less than 5 minutes) but also improves encoding quality from 34.07 to 34.57 (measured with the PSNR metric).
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
Coil2Coil: Self-supervised MR image denoising using phased-array coil images [23.595716054832916]
We propose a new self-supervised denoising method, Coil2Coil (C2C), that does not require the acquisition of clean images or paired noise-corrupted images for training. C2C shows the best performance against several self-supervised methods, reporting comparable outcomes to supervised methods. Because of the significant advantage of not requiring additional scans for clean or paired images, the method can be easily utilized for various clinical applications.
arXiv Detail & Related papers (2022-08-16T05:57:24Z)
Rank-Enhanced Low-Dimensional Convolution Set for Hyperspectral Image Denoising [50.039949798156826]
This paper tackles the challenging problem of hyperspectral (HS) image denoising. We propose a rank-enhanced low-dimensional convolution set (Re-ConvSet). We then incorporate Re-ConvSet into the widely used U-Net architecture to construct an HS image denoising method.
arXiv Detail & Related papers (2022-07-09T13:35:12Z)
Multi-scale frequency separation network for image deblurring [10.511076996096117]
We present a new method called multi-scale frequency separation network (MSFS-Net) for image deblurring. MSFS-Net captures the low- and high-frequency information of an image at multiple scales. Experiments on benchmark datasets show that the proposed network achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-06-01T23:48:35Z)
Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction [138.04956118993934]
We propose a novel Transformer-based method, coarse-to-fine sparse Transformer (CST). CST embeds HSI sparsity into deep learning for HSI reconstruction. In particular, CST uses our proposed spectra-aware screening mechanism (SASM) for coarse patch selection. The selected patches are then fed into our customized spectra-aggregation hashing multi-head self-attention (SAH-MSA) for fine pixel clustering and self-similarity capturing.
arXiv Detail & Related papers (2022-03-09T16:17:47Z)
Spatial-Separated Curve Rendering Network for Efficient and High-Resolution Image Harmonization [59.19214040221055]
We propose a novel spatial-separated curve rendering network (S²CRNet) for efficient and high-resolution image harmonization. The proposed method reduces the number of parameters by more than 90% compared with previous methods. Our method can work smoothly on higher-resolution images in real time, more than 10× faster than existing methods.
arXiv Detail & Related papers (2021-09-13T07:20:16Z)
Learned Image Compression with Gaussian-Laplacian-Logistic Mixture Model and Concatenated Residual Modules [22.818632387206257]
Two key components of learned image compression are the entropy model of the latent representations and the encoding/decoding network architectures. We propose a more flexible discretized Gaussian-Laplacian-Logistic mixture model (GLLMM) for the latent representations. In the encoding/decoding network design part, we propose concatenated residual blocks (CRB), where multiple residual blocks are serially connected with additional shortcut connections.
arXiv Detail & Related papers (2021-07-14T02:54:22Z)
Asymmetric CNN for image super-resolution [102.96131810686231]
Deep convolutional neural networks (CNNs) have been widely applied for low-level vision over the past five years. We propose an asymmetric CNN (ACNet) comprising an asymmetric block (AB), a memory enhancement block (MEB) and a high-frequency feature enhancement block (HFFEB) for image super-resolution. Our ACNet can effectively address single image super-resolution (SISR), blind SISR and blind SISR of blind noise problems.
arXiv Detail & Related papers (2021-03-25T07:10:46Z)
Learned Multi-Resolution Variable-Rate Image Compression with Octave-based Residual Blocks [15.308823742699039]
We propose a new variable-rate image compression framework, which employs generalized octave convolutions (GoConv) and generalized octave transposed-convolutions (GoTConv). To enable a single model to operate with different bit rates and to learn multi-rate image features, a new objective function is introduced. Experimental results show that the proposed framework, trained with the variable-rate objective function, outperforms standard codecs such as H.265/HEVC-based BPG and state-of-the-art learning-based variable-rate methods.
arXiv Detail & Related papers (2020-12-31T06:26:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.