Multi-Sample Training for Neural Image Compression
- URL: http://arxiv.org/abs/2209.13834v1
- Date: Wed, 28 Sep 2022 04:42:02 GMT
- Title: Multi-Sample Training for Neural Image Compression
- Authors: Tongda Xu, Yan Wang, Dailan He, Chenjian Gao, Han Gao, Kunzan Liu,
Hongwei Qin
- Abstract summary: Current state-of-the-art (SOTA) methods adopt a uniform posterior to approximate quantization noise, and a single-sample pathwise estimator to approximate the gradient of the evidence lower bound (ELBO).
We propose to train NIC with a multiple-sample importance weighted autoencoder (IWAE) target, which is tighter than the ELBO and converges to the log likelihood as the sample size increases.
Our MS-NIC is plug-and-play and can be easily extended to other neural compression tasks.
- Score: 11.167668701825134
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper considers the problem of lossy neural image compression (NIC).
Current state-of-the-art (SOTA) methods adopt a uniform posterior to approximate
quantization noise, and a single-sample pathwise estimator to approximate the
gradient of the evidence lower bound (ELBO). In this paper, we propose to train NIC
with a multiple-sample importance weighted autoencoder (IWAE) target, which is
tighter than the ELBO and converges to the log likelihood as the sample size increases.
First, we identify that the uniform posterior of NIC has special properties,
which affect the variance and bias of pathwise and score function estimators of
the IWAE target. Moreover, we provide insights into a commonly adopted trick in
NIC from a gradient-variance perspective. Based on these analyses, we further
propose multiple-sample NIC (MS-NIC), an enhanced IWAE target for NIC.
Experimental results demonstrate that it improves SOTA NIC methods. Our MS-NIC
is plug-and-play and can be easily extended to other neural compression tasks.
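The multi-sample IWAE target the abstract refers to can be sketched in a few lines. The following is a minimal NumPy illustration (not the paper's NIC model): a toy Gaussian latent-variable model with a deliberately mismatched proposal, showing that the K-sample bound log((1/K) Σ_k p(x, z_k)/q(z_k|x)) is tighter than the single-sample ELBO and approaches log p(x) as K grows. All model and variable choices here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal(x, mu, var):
    """Log density of N(mu, var) evaluated at x (element-wise)."""
    return -0.5 * np.log(2.0 * np.pi * var) - (x - mu) ** 2 / (2.0 * var)

def iwae_bound(log_w):
    """K-sample IWAE objective for one datapoint:
    log((1/K) * sum_k exp(log_w[k])), where log_w[k] = log p(x, z_k) - log q(z_k | x).
    Computed with the log-sum-exp trick for numerical stability."""
    m = np.max(log_w)
    return m + np.log(np.mean(np.exp(log_w - m)))

# Toy model: prior z ~ N(0, 1), likelihood x | z ~ N(z, 1), so p(x) = N(0, 2).
# We deliberately use the (mismatched) prior as proposal q(z | x) = N(0, 1):
# the single-sample bound (the ordinary ELBO) is loose, and the K-sample
# bound tightens toward log p(x) as K grows.
x = 1.5
true_log_px = log_normal(x, 0.0, 2.0)  # exact log evidence, about -1.83

def avg_bound(K, reps=2000):
    """Average the K-sample IWAE bound over many independent draws."""
    vals = []
    for _ in range(reps):
        z = rng.normal(0.0, 1.0, size=K)               # z_k ~ q(z | x)
        log_w = (log_normal(z, 0.0, 1.0)               # log p(z)
                 + log_normal(x, z, 1.0)               # + log p(x | z)
                 - log_normal(z, 0.0, 1.0))            # - log q(z | x)
        vals.append(iwae_bound(log_w))
    return float(np.mean(vals))

elbo_k1 = avg_bound(1)    # single-sample ELBO, clearly below log p(x)
iwae_k64 = avg_bound(64)  # much closer to log p(x)
```

For any fixed set of K log-weights, log-mean-exp ≥ mean (Jensen's inequality), which is exactly why the K-sample target is never looser than the ELBO; the NIC-specific subtleties the paper analyzes (uniform posterior, pathwise vs. score-function gradients) are not modeled here.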
Related papers
- Controlling Rate, Distortion, and Realism: Towards a Single Comprehensive Neural Image Compression Model [4.096453902709292]
We propose a variable-rate generative NIC model to compress images to different bit rates.
By incorporating the newly proposed multi-realism technique, our method allows the users to adjust the bit rate, distortion, and realism with a single model.
Our method matches or surpasses the performance of state-of-the-art single-rate generative NIC models.
arXiv Detail & Related papers (2024-05-27T04:22:25Z)
- Theoretical Bound-Guided Hierarchical VAE for Neural Image Codecs [11.729071258457138]
Recent studies reveal a significant theoretical link between variational autoencoders (VAEs) and rate-distortion theory.
VAEs estimate the theoretical upper bound of the information rate-distortion function of images.
To narrow the gap between practical codecs and this bound, we propose a theoretical bound-guided hierarchical VAE (BG-VAE) for neural image codecs.
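The VAE–rate-distortion link these works build on can be stated compactly. As a standard decomposition (not taken from the BG-VAE paper itself), the training loss of a compression VAE with encoder output \(\hat{y}\), entropy model \(p(\hat{y})\), and decoder \(g\) splits into a rate term and a distortion term:

```latex
\mathcal{L}
= \underbrace{\mathbb{E}_{q(\hat{y}\mid x)}\bigl[-\log_2 p(\hat{y})\bigr]}_{\text{rate (bits)}}
\;+\;
\lambda\,\underbrace{\mathbb{E}_{q(\hat{y}\mid x)}\bigl[d\bigl(x,\, g(\hat{y})\bigr)\bigr]}_{\text{distortion}}
```

Minimizing \(\mathcal{L}\) over \(\lambda\) therefore traces an operational upper bound on the image source's information rate-distortion function, which is the sense in which VAEs "estimate the theoretical upper bound" above.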
arXiv Detail & Related papers (2024-03-27T13:11:34Z)
- Slicer Networks [8.43960865813102]
We propose the Slicer Network, a novel architecture for medical image analysis.
The Slicer Network strategically refines and upsamples feature maps via a splatting-blurring-slicing process.
Experiments across different medical imaging applications have verified the Slicer Network's improved accuracy and efficiency.
arXiv Detail & Related papers (2024-01-18T09:50:26Z)
- Deep Multi-Threshold Spiking-UNet for Image Processing [51.88730892920031]
This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture.
To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy.
Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart.
arXiv Detail & Related papers (2023-07-20T16:00:19Z)
- Neural Image Compression: Generalization, Robustness, and Spectral Biases [16.55855347335981]
Recent advances in neural image compression (NIC) have produced models that are starting to outperform classic codecs.
Successful adoption of any machine learning system in the wild requires it to generalize (and be robust) to unseen distribution shifts.
This paper presents a benchmark suite to evaluate the out-of-distribution performance of image compression methods.
arXiv Detail & Related papers (2023-07-17T17:14:17Z)
- Modality-Agnostic Variational Compression of Implicit Neural Representations [96.35492043867104]
We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR).
Bridging the gap between latent coding and sparsity, we obtain compact latent representations non-linearly mapped to a soft gating mechanism.
After obtaining a dataset of such latent representations, we directly optimise the rate/distortion trade-off in a modality-agnostic space using neural compression.
arXiv Detail & Related papers (2023-01-23T15:22:42Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- Rate Distortion Characteristic Modeling for Neural Image Compression [59.25700168404325]
End-to-end optimization capability offers neural image compression (NIC) superior lossy compression performance.
However, distinct models must be trained to reach different points in the rate-distortion (R-D) space.
We formulate the essential mathematical functions that describe the R-D behavior of NIC using deep networks and statistical modeling.
arXiv Detail & Related papers (2021-06-24T12:23:05Z)
- Deep Networks for Direction-of-Arrival Estimation in Low SNR [89.45026632977456]
We introduce a Convolutional Neural Network (CNN) that is trained from multi-channel data of the true array manifold matrix.
We train a CNN in the low-SNR regime to predict DoAs across all SNRs.
Our robust solution can be applied in several fields, ranging from wireless array sensors to acoustic microphones or sonars.
arXiv Detail & Related papers (2020-11-17T12:52:18Z)
- Multi-Scale Positive Sample Refinement for Few-Shot Object Detection [61.60255654558682]
Few-shot object detection (FSOD) helps detectors adapt to unseen classes with few training instances.
We propose a Multi-scale Positive Sample Refinement (MPSR) approach to enrich object scales in FSOD.
MPSR generates multi-scale positive samples as object pyramids and refines the prediction at various scales.
arXiv Detail & Related papers (2020-07-18T09:48:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.