Estimating the Rate-Distortion Function by Wasserstein Gradient Descent
- URL: http://arxiv.org/abs/2310.18908v1
- Date: Sun, 29 Oct 2023 05:29:59 GMT
- Title: Estimating the Rate-Distortion Function by Wasserstein Gradient Descent
- Authors: Yibo Yang, Stephan Eckstein, Marcel Nutz, Stephan Mandt
- Abstract summary: In the theory of lossy compression, the rate-distortion function $R(D)$ describes how much a data source can be compressed (in bit-rate) at any given level of fidelity (distortion).
We propose a new method to estimate $R(D)$ from the perspective of optimal transport.
- Score: 36.24186820465207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the theory of lossy compression, the rate-distortion (R-D) function $R(D)$
describes how much a data source can be compressed (in bit-rate) at any given
level of fidelity (distortion). Obtaining $R(D)$ for a given data source
establishes the fundamental performance limit for all compression algorithms.
We propose a new method to estimate $R(D)$ from the perspective of optimal
transport. Unlike the classic Blahut--Arimoto algorithm which fixes the support
of the reproduction distribution in advance, our Wasserstein gradient descent
algorithm learns the support of the optimal reproduction distribution by moving
particles. We prove its local convergence and analyze the sample complexity of
our R-D estimator based on a connection to entropic optimal transport.
Experimentally, we obtain comparable or tighter bounds than state-of-the-art
neural network methods on low-rate sources while requiring considerably less
tuning and computation effort. We also highlight a connection to
maximum-likelihood deconvolution and introduce a new class of sources that can
be used as test cases with known solutions to the R-D problem.
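As a concrete illustration of the particle idea described in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a squared-error distortion and a fixed Lagrange multiplier, represents the reproduction distribution as equally weighted particles, and runs plain gradient descent on the particle locations of the induced rate functional, which coincides with a discretized Wasserstein gradient step up to a step-size scaling. The function name and hyperparameters (`wgd_rd_estimate`, `lam`, `lr`, `n_particles`) are illustrative.
```python
import numpy as np

def _distortions_and_responsibilities(x, y, lam):
    # Pairwise squared-error distortions d_ij = ||x_i - y_j||^2, shape (m, n).
    dist = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    # Optimal channel Q(y_j | x_i) proportional to exp(-lam * d_ij), uniform particle weights.
    logits = -lam * dist
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    r = np.exp(logits)
    r /= r.sum(axis=1, keepdims=True)
    return dist, r

def wgd_rd_estimate(x, n_particles=32, lam=5.0, lr=0.05, n_steps=500, seed=0):
    """Return an estimated (rate_in_nats, distortion) point for source samples x of shape (m, d)."""
    rng = np.random.default_rng(seed)
    m, d = x.shape
    # Initialize reproduction particles near randomly chosen source samples.
    y = x[rng.choice(m, n_particles)] + 0.01 * rng.standard_normal((n_particles, d))
    for _ in range(n_steps):
        dist, r = _distortions_and_responsibilities(x, y, lam)
        # Gradient of the empirical rate functional w.r.t. particle locations;
        # descending it moves the particles while their weights stay uniform.
        grad = (2.0 * lam / m) * (y * r.sum(axis=0)[:, None] - r.T @ x)
        y -= lr * grad
    dist, r = _distortions_and_responsibilities(x, y, lam)
    distortion = float(np.mean(np.sum(r * dist, axis=1)))
    # Rate (nats) of the induced channel: average KL between Q(.|x) and the uniform particle measure.
    rate = float(np.mean(np.sum(r * np.log(r * n_particles + 1e-30), axis=1)))
    return rate, distortion

# Sanity check on a unit-variance 1-D Gaussian source, where R(D) = 0.5 * log(1/D) nats.
samples = np.random.default_rng(1).standard_normal((2000, 1))
print(wgd_rd_estimate(samples, lam=2.0))
```
Each value of `lam` yields one estimated point of the R-D curve; sweeping `lam` traces out the curve.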
Related papers
- Compression of Structured Data with Autoencoders: Provable Benefit of
Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
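As a toy illustration of the qualitative claim, not the paper's exact architecture or analysis, the NumPy sketch below compresses a sparse vector with a random shallow linear encoder and compares plain linear decoding against decoding followed by a soft-thresholding "denoising" nonlinearity; on sparse inputs the thresholded reconstruction typically has lower error. All sizes and the threshold are assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)
d, k, s = 256, 64, 8            # ambient dimension, code dimension, sparsity (toy values)

# Sparse source: s random coordinates are nonzero Gaussians.
x = np.zeros(d)
idx = rng.choice(d, s, replace=False)
x[idx] = rng.standard_normal(s)

# Shallow autoencoder: random Gaussian encoder W with tied linear decoder W^T.
W = rng.standard_normal((k, d)) / np.sqrt(k)
code = W @ x

linear_rec = W.T @ code                       # plain linear decoding

def soft_threshold(v, t):
    # The "denoising" nonlinearity: shrink small coefficients toward zero.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

denoised_rec = soft_threshold(W.T @ code, t=0.5)

print("linear decoder MSE:   ", np.mean((x - linear_rec) ** 2))
print("denoised decoder MSE: ", np.mean((x - denoised_rec) ** 2))
```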
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
- Flow-based Distributionally Robust Optimization [23.232731771848883]
We present a framework, called $\texttt{FlowDRO}$, for solving flow-based distributionally robust optimization (DRO) problems with Wasserstein uncertainty sets.
We aim to find the continuous worst-case distribution (also called the Least Favorable Distribution, LFD) and sample from it.
We demonstrate its usage in adversarial learning, distributionally robust hypothesis testing, and a new mechanism for data-driven distribution perturbation differential privacy.
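Below is a hedged toy sketch of the underlying idea of finding a worst-case (least favorable) distribution under a transport-cost penalty, via gradient ascent on a per-sample displacement of the data. It uses a hinge loss and a simple additive shift rather than FlowDRO's flow network; the penalty weight `gamma` and learning rate are assumed toy values.
```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((500, 2))            # nominal data distribution (toy)
theta = np.array([1.0, -1.0])                # parameters of a fixed linear "model"

def loss(z):
    # Toy per-sample hinge loss that the adversary wants to make large.
    return np.maximum(0.0, 1.0 - z @ theta)

gamma = 2.0                                  # transport-cost penalty weight (assumed)
lr = 0.05
delta = np.zeros_like(x)                     # per-sample displacement: T(x) = x + delta

for _ in range(200):
    z = x + delta
    # Gradient ascent on  E[loss(T(x))] - gamma * E[||T(x) - x||^2],
    # i.e. the Lagrangian of a Wasserstein-ball worst-case problem.
    grad_loss = -(loss(z) > 0)[:, None] * theta   # d loss / d z for the hinge loss
    delta += lr * (grad_loss - 2.0 * gamma * delta)

worst_case_samples = x + delta               # samples from the (approximate) LFD
print("nominal loss:   ", loss(x).mean())
print("worst-case loss:", loss(worst_case_samples).mean())
```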
arXiv Detail & Related papers (2023-10-30T03:53:31Z)
- Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative
Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z)
- A Constrained BA Algorithm for Rate-Distortion and Distortion-Rate
Functions [13.570794979535934]
The paper proposes a modification of the Blahut-Arimoto (BA) algorithm for rate-distortion functions.
The modified algorithm directly computes the R-D function at a given target distortion.
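For reference, here is a minimal NumPy sketch of the classic BA iteration for a discrete source, which parameterizes the R-D curve by a slope parameter `beta` over a fixed reproduction support; the paper's constrained variant, which adjusts the multiplier to hit a target distortion directly, is not reproduced here.
```python
import numpy as np

def blahut_arimoto(p_x, dist, beta, n_iter=500):
    """Classic BA iteration for a discrete source.

    p_x:  (nx,) source probabilities.
    dist: (nx, ny) distortion matrix d(x, y).
    beta: positive slope parameter; sweeping beta traces out the R(D) curve.
    Returns (rate_in_nats, distortion).
    """
    ny = dist.shape[1]
    q_y = np.full(ny, 1.0 / ny)                      # reproduction marginal on a fixed support
    for _ in range(n_iter):
        # Optimal test channel given the current reproduction marginal.
        w = q_y[None, :] * np.exp(-beta * dist)      # shape (nx, ny)
        q_y_given_x = w / w.sum(axis=1, keepdims=True)
        # Updated reproduction marginal.
        q_y = p_x @ q_y_given_x
    joint = p_x[:, None] * q_y_given_x
    distortion = float(np.sum(joint * dist))
    rate = float(np.sum(joint * np.log(q_y_given_x / q_y[None, :])))
    return rate, distortion

# Binary uniform source with Hamming distortion, where R(D) = ln 2 - H_b(D) nats for D <= 0.5.
p = np.array([0.5, 0.5])
d = 1.0 - np.eye(2)
print(blahut_arimoto(p, d, beta=3.0))
```
Note the contrast with the main paper above: BA keeps the reproduction support fixed, whereas the Wasserstein gradient descent approach moves the support points themselves.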
arXiv Detail & Related papers (2023-05-04T08:41:03Z)
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with the communication cost of $O(k \log(d))$ at each iteration.
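As a generic illustration of sketching-based gradient compression, not necessarily the exact compressor or update used in SketchedAMSGrad, the snippet below applies a single-row count sketch to a gradient vector: only $k$ numbers are communicated, and an unbiased estimate of the gradient is recovered on the other side. Dimensions are toy values.
```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 10_000, 256                       # gradient dimension and sketch size (toy values)
h = rng.integers(0, k, size=d)           # hash bucket for each coordinate
s = rng.choice([-1.0, 1.0], size=d)      # random sign for each coordinate

def sketch(g):
    # Compress g in R^d down to k numbers (this is what would be communicated).
    out = np.zeros(k)
    np.add.at(out, h, s * g)
    return out

def unsketch(sk):
    # Unbiased estimate of the original gradient from the sketch.
    return s * sk[h]

g = rng.standard_normal(d) * (rng.random(d) < 0.01)   # a sparse-ish gradient
g_hat = unsketch(sketch(g))
print("relative error:", np.linalg.norm(g - g_hat) / np.linalg.norm(g))
```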
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
- DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over
Graphs [54.08445874064361]
We propose to solve a regularized distributionally robust learning problem in the decentralized setting.
By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust problem.
We show that our proposed algorithm can improve the worst distribution test accuracy by up to $10\%$.
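The sketch below illustrates the standard reduction underlying KL-regularized robustness (the decentralized, graph-based part of DR-DSGD is not covered): the KL-regularized worst case of an average loss has the closed form $\tau \log E[\exp(\ell/\tau)]$, so the min-max problem becomes minimizing a log-sum-exp reweighted loss, with adversarial weights given by a softmax over per-sample (or per-worker) losses. The temperature `tau` is an assumed hyperparameter.
```python
import numpy as np

def kl_robust_loss(per_sample_losses, tau):
    """KL-regularized worst case of the average loss.

    sup_q  E_q[loss] - tau * KL(q || uniform)  =  tau * log( mean(exp(loss / tau)) ).
    Smaller tau puts more weight on the hardest samples or workers.
    """
    l = np.asarray(per_sample_losses) / tau
    m = l.max()
    return tau * (m + np.log(np.mean(np.exp(l - m))))

def worst_case_weights(per_sample_losses, tau):
    # The maximizing distribution: a softmax over the losses.
    l = np.asarray(per_sample_losses) / tau
    w = np.exp(l - l.max())
    return w / w.sum()

losses = np.array([0.1, 0.2, 0.15, 1.5])   # e.g. per-worker losses (toy numbers)
print(kl_robust_loss(losses, tau=0.5))
print(worst_case_weights(losses, tau=0.5))
```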
arXiv Detail & Related papers (2022-08-29T18:01:42Z)
- Neural Estimation of the Rate-Distortion Function With Applications to
Operational Source Coding [25.59334941818991]
A fundamental question in designing lossy data compression schemes is how well one can do in comparison with the rate-distortion function.
We investigate methods to estimate the rate-distortion function on large, real-world data.
We apply the resulting rate-distortion estimator, called NERD, on popular image datasets.
arXiv Detail & Related papers (2022-04-04T16:06:40Z)
- Towards Empirical Sandwich Bounds on the Rate-Distortion Function [33.21088274828794]
The rate-distortion (R-D) function, a key quantity in information theory, characterizes the fundamental limit of how much a data source can be compressed.
This paper makes the first attempt at an algorithm for sandwiching the R-D function of a general (not necessarily discrete) source, requiring only i.i.d. data samples.
arXiv Detail & Related papers (2021-11-23T21:50:27Z)
- Permutation Compressors for Provably Faster Distributed Nonconvex
Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al. (2021) can be considered a state-of-the-art method in terms of theoretical communication complexity.
We extend the theory of MARINA to support potentially correlated compressors, going beyond the classical setting of independent compressors.
arXiv Detail & Related papers (2021-10-07T09:38:15Z)
- A Distributed Training Algorithm of Generative Adversarial Networks with
Quantized Gradients [8.202072658184166]
We propose a distributed GAN training algorithm with quantized gradients, dubbed DQGAN, which is the first distributed training method with quantized gradients for GANs.
The new method trains GANs based on the single-machine Optimistic Mirror Descent (OMD) algorithm, and is applicable to any gradient compression method that satisfies a general $\delta$-approximate compressor property.
Theoretically, we establish the non-asymptotic convergence of the DQGAN algorithm to a first-order stationary point, showing that the proposed algorithm can achieve a linear speedup.
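The $\delta$-approximate compressor condition mentioned above requires $\|C(x)-x\|^2 \le (1-\delta)\|x\|^2$; a standard example is top-$k$ sparsification, which satisfies it with $\delta = k/d$. The snippet below checks this property numerically with toy dimensions; it is not DQGAN's training loop.
```python
import numpy as np

def top_k(x, k):
    # Keep the k largest-magnitude coordinates, zero out the rest.
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

rng = np.random.default_rng(0)
d, k = 1000, 100
x = rng.standard_normal(d)
cx = top_k(x, k)

delta = k / d   # top-k satisfies ||C(x) - x||^2 <= (1 - k/d) * ||x||^2
lhs = np.sum((cx - x) ** 2)
rhs = (1 - delta) * np.sum(x ** 2)
print(f"compression error {lhs:.2f} <= (1 - delta) * ||x||^2 = {rhs:.2f}: {lhs <= rhs}")
```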
arXiv Detail & Related papers (2020-10-26T06:06:43Z)