3DM-WeConvene: Learned Image Compression with 3D Multi-Level Wavelet-Domain Convolution and Entropy Model
- URL: http://arxiv.org/abs/2504.04658v1
- Date: Mon, 07 Apr 2025 01:11:50 GMT
- Title: 3DM-WeConvene: Learned Image Compression with 3D Multi-Level Wavelet-Domain Convolution and Entropy Model
- Authors: Haisheng Fu, Jie Liang, Feng Liang, Zhenman Fang, Guohe Zhang, Jingning Han
- Abstract summary: We propose a novel framework that integrates low-complexity 3D multi-level Discrete Wavelet Transform (DWT) into convolutional layers and entropy coding. Our framework consistently outperforms state-of-the-art CNN-based LIC methods in R-D performance and computational complexity.
- Score: 14.592432109760098
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learned image compression (LIC) has recently made significant progress, surpassing traditional methods. However, most LIC approaches operate mainly in the spatial domain and lack mechanisms for reducing frequency-domain correlations. To address this, we propose a novel framework that integrates low-complexity 3D multi-level Discrete Wavelet Transform (DWT) into convolutional layers and entropy coding, reducing both spatial and channel correlations to improve frequency selectivity and rate-distortion (R-D) performance. Our proposed 3D multi-level wavelet-domain convolution (3DM-WeConv) layer first applies 3D multi-level DWT (e.g., 5/3 and 9/7 wavelets from JPEG 2000) to transform data into the wavelet domain. Then, different-sized convolutions are applied to different frequency subbands, followed by inverse 3D DWT to restore the spatial domain. The 3DM-WeConv layer can be flexibly used within existing CNN-based LIC models. We also introduce a 3D wavelet-domain channel-wise autoregressive entropy model (3DWeChARM), which performs slice-based entropy coding in the 3D DWT domain. Low-frequency (LF) slices are encoded first to provide priors for high-frequency (HF) slices. A two-step training strategy is adopted: first balancing LF and HF rates, then fine-tuning with separate weights. Extensive experiments demonstrate that our framework consistently outperforms state-of-the-art CNN-based LIC methods in R-D performance and computational complexity, with larger gains for high-resolution images. On the Kodak, Tecnick 100, and CLIC test sets, our method achieves BD-Rate reductions of -12.24%, -15.51%, and -12.97%, respectively, compared to H.266/VVC.
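The abstract names the 5/3 wavelet from JPEG 2000 as one of the transforms used inside the 3DM-WeConv layer. As a rough illustration of what a single decomposition level does, the sketch below implements the reversible integer 5/3 lifting steps in 1D and splits a signal into low-frequency (LF) and high-frequency (HF) halves. It is not the authors' implementation: the function names are hypothetical, the input is assumed to have even length, and the abstract does not say whether the paper uses the integer or floating-point variant or how the transform is applied along the channel, height, and width axes.

```python
def _mirror(i, n):
    """Whole-sample symmetric extension of index i into the range [0, n)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def dwt53_forward(x):
    """One level of the reversible 5/3 lifting DWT on an even-length list of ints.

    Returns (lf, hf): low-frequency and high-frequency coefficients.
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0, "this sketch assumes an even-length input"
    half = n // 2
    # Predict step: each odd sample minus the floor-average of its even neighbours.
    hf = [x[2 * k + 1] - ((x[2 * k] + x[_mirror(2 * k + 2, n)]) >> 1)
          for k in range(half)]
    # Update step: each even sample plus a rounded quarter of the adjacent details.
    # At the left border the symmetric extension gives hf[-1] == hf[0].
    lf = [x[2 * k] + ((hf[max(k - 1, 0)] + hf[k] + 2) >> 2)
          for k in range(half)]
    return lf, hf


def dwt53_inverse(lf, hf):
    """Undo dwt53_forward exactly by reversing the lifting steps."""
    half = len(lf)
    n = 2 * half
    x = [0] * n
    # Undo the update step to recover the even samples.
    for k in range(half):
        x[2 * k] = lf[k] - ((hf[max(k - 1, 0)] + hf[k] + 2) >> 2)
    # Undo the predict step to recover the odd samples.
    for k in range(half):
        x[2 * k + 1] = hf[k] + ((x[2 * k] + x[_mirror(2 * k + 2, n)]) >> 1)
    return x


signal = [10, 12, 14, 16, 18, 20, 22, 24]
lf, hf = dwt53_forward(signal)
print(lf, hf)                            # [10, 14, 18, 23] [0, 0, 0, 2]
print(dwt53_inverse(lf, hf) == signal)   # True
```

On a smooth ramp the HF coefficients are almost all zero, which is the kind of energy compaction that makes it natural to treat LF and HF subbands differently, for example with different convolution sizes or by coding LF slices first as priors for HF slices, as the abstract describes.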
Related papers
- 3D Wavelet Convolutions with Extended Receptive Fields for Hyperspectral Image Classification [12.168520751389622]
Deep neural networks face numerous challenges in hyperspectral image classification.
This paper proposes WCNet, an improved 3D-DenseNet model integrated with wavelet transforms.
Experimental results demonstrate superior performance on the IN, UP, and KSC datasets.
arXiv Detail & Related papers (2025-04-15T01:39:42Z) - Multi-Scale Invertible Neural Network for Wide-Range Variable-Rate Learned Image Compression [90.59962443790593]
In this paper, we present a variable-rate image compression model based on invertible transform to overcome limitations. Specifically, we design a lightweight multi-scale invertible neural network, which maps the input image into multi-scale latent representations. Experimental results demonstrate that the proposed method achieves state-of-the-art performance compared to existing variable-rate methods.
arXiv Detail & Related papers (2025-03-27T09:08:39Z) - MDNF: Multi-Diffusion-Nets for Neural Fields on Meshes [5.284425534494986]
We propose a novel framework for representing neural fields on triangle meshes that is multi-resolution across both spatial and frequency domains.
Inspired by the Neural Fourier Filter Bank (NFFB), our architecture decomposes the spatial and frequency domains by associating finer resolution levels with higher frequency bands.
We demonstrate the effectiveness of our approach through its application to diverse neural fields, such as synthetic RGB functions, UV texture coordinates, and normals.
arXiv Detail & Related papers (2024-09-04T19:08:13Z) - HPC: Hierarchical Progressive Coding Framework for Volumetric Video [39.403294185116]
Volumetric video based on Neural Radiance Field (NeRF) holds vast potential for various 3D applications.
Current NeRF compression lacks the flexibility to adjust video quality and bitrate within a single model for various network and device capacities.
We propose HPC, a novel hierarchical progressive video coding framework achieving variable bitrate using a single model.
arXiv Detail & Related papers (2024-07-12T06:34:24Z) - Frequency-Aware Transformer for Learned Image Compression [64.28698450919647]
We propose a frequency-aware transformer (FAT) block that for the first time achieves multiscale directional analysis for Learned Image Compression (LIC). The FAT block comprises frequency-decomposition window attention (FDWA) modules to capture multiscale and directional frequency components of natural images. We also introduce a frequency-modulation feed-forward network (FMFFN) to adaptively modulate different frequency components, improving rate-distortion performance.
arXiv Detail & Related papers (2023-10-25T05:59:25Z) - Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models [89.76587063609806]
We study the denoising diffusion probabilistic model (DDPM) in wavelet space, instead of pixel space, for visual synthesis.
By explicitly modeling the wavelet signals, we find our model is able to generate images with higher quality on several datasets.
arXiv Detail & Related papers (2023-07-27T06:53:16Z) - Image Reconstruction for Accelerated MR Scan with Faster Fourier Convolutional Neural Networks [87.87578529398019]
Partial scan is a common approach to accelerate Magnetic Resonance Imaging (MRI) data acquisition in both 2D and 3D settings.
We propose a novel convolutional operator called Faster Fourier Convolution (FasterFC) to replace the two consecutive convolution operations.
A 2D accelerated MRI method, FasterFC-End-to-End-VarNet, uses FasterFC to improve the sensitivity maps and reconstruction quality.
A 3D accelerated MRI method, FasterFC-based Single-to-group Network (FAS-Net), uses a single-to-group algorithm to guide k-space domain reconstruction.
arXiv Detail & Related papers (2023-06-05T13:53:57Z) - Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure in frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency domain learning through a single transform: transform once (T1).
arXiv Detail & Related papers (2022-11-26T01:56:05Z) - aiWave: Volumetric Image Compression with 3-D Trained Affine Wavelet-like Transform [43.984890290691695]
Most commonly used volumetric image compression methods are based on wavelet transform, such as JP3D.
In this paper, we first design a 3-D trained wavelet-like transform to enable signal-dependent and non-separable transform.
Then, an affine wavelet basis is introduced to capture the various local correlations in different regions of volumetric images.
arXiv Detail & Related papers (2022-03-11T10:02:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.