Lossy Compression with Distortion Constrained Optimization
- URL: http://arxiv.org/abs/2005.04064v1
- Date: Fri, 8 May 2020 14:27:01 GMT
- Title: Lossy Compression with Distortion Constrained Optimization
- Authors: Ties van Rozendaal, Guillaume Sautière, Taco S. Cohen
- Abstract summary: We show that the constrained optimization method of Rezende and Viola, 2018 is more appropriate for training lossy compression models than a $\beta$-VAE.
We show that the method does manage to satisfy the constraint on a realistic image compression task, outperforms a constrained optimization method based on a hinge-loss, and is more practical to use for model selection than a $\beta$-VAE.
- Score: 14.45964083146559
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When training end-to-end learned models for lossy compression, one has to
balance the rate and distortion losses. This is typically done by manually
setting a tradeoff parameter $\beta$, an approach called $\beta$-VAE. Using
this approach it is difficult to target a specific rate or distortion value,
because the result can be very sensitive to $\beta$, and the appropriate value
for $\beta$ depends on the model and problem setup. As a result, model
comparison requires extensive per-model $\beta$-tuning, and producing a whole
rate-distortion curve (by varying $\beta$) for each model to be compared. We
argue that the constrained optimization method of Rezende and Viola, 2018 is a
lot more appropriate for training lossy compression models because it allows us
to obtain the best possible rate subject to a distortion constraint. This
enables pointwise model comparisons, by training two models with the same
distortion target and comparing their rate. We show that the method does manage
to satisfy the constraint on a realistic image compression task, outperforms a
constrained optimization method based on a hinge-loss, and is more practical to
use for model selection than a $\beta$-VAE.
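To make the training recipe concrete, here is a minimal sketch (not the authors' code; the toy model, proxy rate/distortion terms, and hyperparameters are purely illustrative) of minimizing rate subject to a distortion constraint via a Lagrange multiplier, in the spirit of Rezende and Viola (2018):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyCompressor(nn.Module):
    """Stand-in for a learned compression model (encoder, decoder, entropy model)."""
    def __init__(self, dim=32, latent=8):
        super().__init__()
        self.enc = nn.Linear(dim, latent)
        self.dec = nn.Linear(latent, dim)

    def forward(self, x):
        z = self.enc(x)
        x_hat = self.dec(z)
        rate = 0.5 * (z ** 2).mean()        # proxy for the bits needed to code z
        distortion = F.mse_loss(x_hat, x)   # e.g. MSE distortion
        return rate, distortion

model = ToyCompressor()
opt_model = torch.optim.Adam(model.parameters(), lr=1e-3)

# Lagrange multiplier lambda = softplus(nu) >= 0, trained by gradient ascent.
nu = torch.zeros(1, requires_grad=True)
opt_lambda = torch.optim.Adam([nu], lr=1e-2)

distortion_target = 0.05  # the constraint: distortion <= distortion_target

for step in range(1000):
    x = torch.randn(64, 32)                 # placeholder data batch
    rate, distortion = model(x)
    lam = F.softplus(nu)

    # Minimize rate subject to the distortion constraint via the Lagrangian.
    lagrangian = rate + lam * (distortion - distortion_target)

    opt_model.zero_grad()
    opt_lambda.zero_grad()
    lagrangian.backward()
    opt_model.step()

    # Ascent on the multiplier: negate its gradient before the (descent) step,
    # so lambda grows while the constraint is violated and shrinks otherwise.
    nu.grad.neg_()
    opt_lambda.step()
```

Because the multiplier rises while the distortion exceeds the target and relaxes once the constraint is met, training settles at the lowest rate compatible with the chosen distortion, which is what enables the pointwise model comparisons described in the abstract.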
Related papers
- Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences? [20.004349891563706]
After pre-training, large language models are aligned with human preferences based on pairwise comparisons.
We introduce an alignment method's distortion: the worst-case ratio between the optimal achievable average utility and the average utility of the learned policy.
arXiv Detail & Related papers (2025-05-29T17:59:20Z)
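Schematically (notation ours; the paper's formal definition may differ), the distortion of an alignment method $\mathcal{A}$ is the worst-case ratio

$$
\mathrm{distortion}(\mathcal{A}) \;=\; \sup_{I}\; \frac{\max_{\pi}\, \mathbb{E}_{x \sim I}\!\left[U(\pi, x)\right]}{\mathbb{E}_{x \sim I}\!\left[U(\pi_{\mathcal{A}}, x)\right]},
$$

where $I$ ranges over problem instances, $U$ is the average utility, and $\pi_{\mathcal{A}}$ is the policy learned by $\mathcal{A}$.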
- Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit [1.8337746049048673]
We show an intricate dependence of the optimal $\eta$ scaling on the pretraining token budget $T$ and batch size $B$, and its relation to the critical batch size $B_\mathrm{crit}$.
Surprisingly, our results demonstrate that the observed optimal $\eta$ and $B$ dynamics are preserved with $\mu$P model scaling.
arXiv Detail & Related papers (2024-10-08T09:06:34Z)
- u-$\mu$P: The Unit-Scaled Maximal Update Parametrization [4.275373946090221]
We present a new scheme, u-$\mu$P, which improves upon $\mu$P by combining it with Unit Scaling.
The two techniques have a natural affinity: $\mu$P ensures that the scale of activations is independent of model size, and Unit Scaling ensures that activations, weights and gradients begin training with a scale of one.
arXiv Detail & Related papers (2024-07-24T17:58:42Z)
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
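As a loose illustration of how a learned density ratio can drive rejection (a generic sketch, not the paper's algorithm; the classifier, ratio model, and threshold are hypothetical):

```python
import torch

def predict_with_rejection(classifier, ratio_model, x, threshold=1.0):
    """Return class predictions, abstaining where the learned density ratio is low.

    ratio_model(x) is assumed to estimate r(x) = p_ideal(x) / p_data(x): inputs
    that the idealized distribution down-weights are rejected rather than classified.
    """
    with torch.no_grad():
        ratios = ratio_model(x)               # shape: (batch,)
        preds = classifier(x).argmax(dim=-1)  # shape: (batch,)
    reject = ratios < threshold
    preds[reject] = -1                        # -1 marks "abstain"
    return preds, reject
```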
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in a model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
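For reference, TopK compression of the kind mentioned above typically keeps only the k largest-magnitude entries of a tensor and zeroes the rest; a minimal generic sketch (not the paper's implementation):

```python
import torch

def topk_compress(t: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest-magnitude entries of t and zero out the rest."""
    flat = t.flatten()
    if k >= flat.numel():
        return t.clone()
    _, idx = flat.abs().topk(k)
    out = torch.zeros_like(flat)
    out[idx] = flat[idx]
    return out.view_as(t)

# Example: compress a gradient-like tensor to its 10 largest entries.
g = torch.randn(8, 16)
g_sparse = topk_compress(g, k=10)
```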
- On Optimal Caching and Model Multiplexing for Large Model Inference [66.50550915522551]
Large Language Models (LLMs) and other large foundation models have achieved noteworthy success, but their size exacerbates existing resource consumption and latency challenges.
We study two approaches for mitigating these challenges: employing a cache to store previous queries and learning a model multiplexer to choose from an ensemble of models for query processing.
arXiv Detail & Related papers (2023-06-03T05:01:51Z)
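The two ideas in the entry above compose naturally; the sketch below is entirely illustrative (a simple exact-match cache and hypothetical small_model/large_model callables, not the paper's learned caching or routing policies) and only shows the control flow:

```python
from typing import Callable, Dict

def serve(query: str,
          cache: Dict[str, str],
          is_easy: Callable[[str], bool],
          small_model: Callable[[str], str],
          large_model: Callable[[str], str]) -> str:
    """Answer a query using a response cache and a model multiplexer."""
    if query in cache:                 # 1) reuse previous work when possible
        return cache[query]
    # 2) multiplex: route "easy" queries to the cheap model, the rest to the large one
    answer = small_model(query) if is_easy(query) else large_model(query)
    cache[query] = answer
    return answer
```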
- Rotation Invariant Quantization for Model Compression [7.633595230914364]
Post-training Neural Network (NN) model compression is an attractive approach for deploying large, memory-consuming models on devices with limited memory resources.
We suggest a Rotation-Invariant Quantization (RIQ) technique that utilizes a single parameter to quantize the entire NN model.
arXiv Detail & Related papers (2023-03-03T10:53:30Z)
- CrAM: A Compression-Aware Minimizer [103.29159003723815]
We propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way.
CrAM produces dense models that can be more accurate than the standard SGD/Adam-based baselines, but which are stable under weight pruning.
CrAM can produce sparse models which perform well for transfer learning, and it also works for semi-structured 2:4 pruning patterns supported by GPU hardware.
arXiv Detail & Related papers (2022-07-28T16:13:28Z)
- R-Drop: Regularized Dropout for Neural Networks [99.42791938544012]
Dropout is a powerful and widely used technique to regularize the training of deep neural networks.
We introduce a simple regularization strategy upon dropout in model training, namely R-Drop, which forces the output distributions of different sub-models to be consistent with each other.
arXiv Detail & Related papers (2021-06-28T08:01:26Z)
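R-Drop's consistency term is easy to write down; below is a minimal sketch of the commonly described recipe (two dropout forward passes plus a symmetric KL penalty), with a hypothetical model kept in training mode so dropout differs between the passes, and an illustrative weight alpha:

```python
import torch
import torch.nn.functional as F

def r_drop_loss(model, x, labels, alpha=1.0):
    """Cross-entropy plus a symmetric KL penalty between two dropout passes."""
    logits1 = model(x)   # dropout active in training mode,
    logits2 = model(x)   # so the two passes see different sub-models
    ce = F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels)
    p1 = F.log_softmax(logits1, dim=-1)
    p2 = F.log_softmax(logits2, dim=-1)
    kl = F.kl_div(p1, p2, log_target=True, reduction="batchmean") \
       + F.kl_div(p2, p1, log_target=True, reduction="batchmean")
    return ce + alpha * 0.5 * kl
```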
- Learning Scalable $\ell_\infty$-constrained Near-lossless Image Compression via Joint Lossy Image and Residual Compression [118.89112502350177]
We propose a novel framework for learning $\ell_\infty$-constrained near-lossless image compression.
We derive the probability model of the quantized residual by quantizing the learned probability model of the original residual.
arXiv Detail & Related papers (2021-03-31T11:53:36Z)
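The $\ell_\infty$ guarantee in near-lossless coding usually comes from uniformly quantizing the residual with a bin width tied to the error bound; a generic sketch of that classical construction (integer residuals assumed; this is not the paper's learned residual model):

```python
import numpy as np

def quantize_residual(residual: np.ndarray, tau: int):
    """Quantize an integer residual so the reconstruction error is at most tau."""
    step = 2 * tau + 1
    q = np.round(residual / step).astype(np.int64)   # symbols to entropy-code
    residual_hat = q * step                          # decoder-side reconstruction
    assert np.abs(residual - residual_hat).max() <= tau
    return q, residual_hat

# Example: lossy reconstruction + coded residual gives a near-lossless image.
original = np.random.randint(0, 256, size=(4, 4))
lossy = np.clip(original + np.random.randint(-6, 7, size=(4, 4)), 0, 255)
q, res_hat = quantize_residual(original - lossy, tau=2)
near_lossless = lossy + res_hat                      # |near_lossless - original| <= 2
```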
- Overfitting for Fun and Profit: Instance-Adaptive Data Compression [20.764189960709164]
Neural data compression has been shown to outperform classical methods in terms of rate-distortion (RD) performance.
In this paper we take this concept to the extreme, adapting the full model to a single video, and sending model updates along with the latent representation.
We demonstrate that full-model adaptation improves RD performance by 1 dB with respect to encoder-only finetuning.
arXiv Detail & Related papers (2021-01-21T15:58:58Z)
- Training with Quantization Noise for Extreme Model Compression [57.51832088938618]
We tackle the problem of producing compact models, maximizing their accuracy for a given model size.
A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator.
In this paper, we extend this approach to work beyond int8 fixed-point quantization with extreme compression methods.
arXiv Detail & Related papers (2020-04-15T20:10:53Z)
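The quantization-noise idea described above, quantizing only a random subset of weights at each forward pass while letting gradients flow unchanged through the rest, can be sketched as follows (an illustrative simulation with a simple uniform fake-quantizer, not the paper's exact scheme):

```python
import torch

def quant_noise(w: torch.Tensor, p: float = 0.1, step: float = 0.05) -> torch.Tensor:
    """Fake-quantize a random fraction p of weights, with a straight-through estimator.

    The remaining (1 - p) of the weights stay in full precision, so most of the
    gradient signal is unbiased; the quantized subset uses the identity gradient.
    """
    mask = torch.rand_like(w) < p           # which weights get quantized this step
    w_q = torch.round(w / step) * step      # simple uniform quantizer
    noisy = torch.where(mask, w_q, w)
    # Straight-through estimator: forward uses `noisy`, backward acts as identity on w.
    return w + (noisy - w).detach()

# Usage inside a layer's forward pass (hypothetical):
w = torch.randn(16, 16, requires_grad=True)
x = torch.randn(4, 16)
y = x @ quant_noise(w).t()
y.sum().backward()                          # gradients reach w as if unquantized
```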