Concentration Bounds for Discrete Distribution Estimation in KL
Divergence
- URL: http://arxiv.org/abs/2302.06869v2
- Date: Tue, 13 Jun 2023 01:19:50 GMT
- Title: Concentration Bounds for Discrete Distribution Estimation in KL
Divergence
- Authors: Clément L. Canonne and Ziteng Sun and Ananda Theertha Suresh
- Abstract summary: We show that the deviation from mean scales as $\sqrt{k}/n$ when $n \ge k$, improving upon the best prior result of $k/n$.
We also establish a matching lower bound that shows that our bounds are tight up to polylogarithmic factors.
- Score: 21.640337031842368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of discrete distribution estimation in KL divergence and
provide concentration bounds for the Laplace estimator. We show that the
deviation from mean scales as $\sqrt{k}/n$ when $n \ge k$, improving upon the
best prior result of $k/n$. We also establish a matching lower bound that shows
that our bounds are tight up to polylogarithmic factors.
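As a quick illustration of the quantity being bounded: the Laplace (add-one) estimator assigns probability $\hat{p}_i = (N_i + 1)/(n + k)$ to symbol $i$ given counts $N_i$ from $n$ samples over an alphabet of size $k$. The minimal Python sketch below (the uniform source distribution, sample sizes, and trial count are illustrative choices, not taken from the paper) simulates the deviation of $\mathrm{KL}(p\,\|\,\hat{p})$ from its mean and compares it with the $\sqrt{k}/n$ scale.

```python
import numpy as np

def laplace_estimator(counts, n, k):
    """Add-one (Laplace) estimator: p_hat_i = (N_i + 1) / (n + k)."""
    return (counts + 1) / (n + k)

def kl(p, q):
    """KL(p || q) for discrete distributions; q must be strictly positive."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

rng = np.random.default_rng(0)
k = 100                      # alphabet size (illustrative)
p = np.full(k, 1.0 / k)      # uniform source distribution (illustrative)

for n in [200, 800, 3200]:   # sample sizes with n >= k
    losses = []
    for _ in range(2000):    # Monte Carlo trials
        counts = rng.multinomial(n, p)
        losses.append(kl(p, laplace_estimator(counts, n, k)))
    losses = np.array(losses)
    dev = np.abs(losses - losses.mean())
    # The paper's upper bound says this deviation is O(sqrt(k)/n) up to polylog factors.
    print(f"n={n:5d}  mean KL={losses.mean():.5f}  "
          f"95th-pct deviation={np.quantile(dev, 0.95):.5f}  "
          f"sqrt(k)/n={np.sqrt(k) / n:.5f}")
```

With $k$ fixed and $n$ growing, the reported deviation should shrink roughly in step with $\sqrt{k}/n$, which is the scaling the paper establishes up to polylogarithmic factors.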
Related papers
- Risk Bounds for Mixture Density Estimation on Compact Domains via the $h$-Lifted Kullback--Leibler Divergence [2.8074364079901017]
We introduce the $h$-lifted Kullback--Leibler (KL) divergence as a generalization of the standard KL divergence.
We develop a procedure for the computation of the corresponding maximum $h$-lifted likelihood estimators.
arXiv Detail & Related papers (2024-04-19T02:31:34Z)
- Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z)
- Convergence and concentration properties of constant step-size SGD through Markov chains [0.0]
We consider the optimization of a smooth and strongly convex objective using constant step-size stochastic gradient descent (SGD).
We show that, for unbiased gradient estimates with mildly controlled variance, the iteration converges to an invariant distribution in total variation distance.
All our results are non-asymptotic and their consequences are discussed through a few applications.
arXiv Detail & Related papers (2023-06-20T12:36:28Z)
- Convergence of Random Reshuffling Under the Kurdyka-Łojasiewicz Inequality [3.960041987749073]
We conduct convergence analysis for the non-descent RR with diminishing step sizes based on the Kurdyka-Łojasiewicz (KL) inequality.
We derive the corresponding rate of convergence depending on the KL exponent and the suitably selected diminishing step sizes.
We also establish similar strong limit-point convergence results for the proximal point method.
arXiv Detail & Related papers (2021-10-10T23:20:04Z)
- Robust Learning of Optimal Auctions [84.13356290199603]
We study the problem of learning revenue-optimal multi-bidder auctions from samples when the samples of bidders' valuations can be adversarially corrupted or drawn from distributions that are adversarially perturbed.
We propose new algorithms that can learn a mechanism whose revenue is nearly optimal simultaneously for all "true distributions" that are $\alpha$-close to the original distribution in Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2021-07-13T17:37:21Z)
- KALE Flow: A Relaxed KL Gradient Flow for Probabilities with Disjoint Support [27.165565512841656]
We study the gradient flow for a relaxed approximation to the Kullback-Leibler divergence between a moving source and a fixed target distribution.
This approximation, termed the KALE (KL approximate lower-bound estimator), solves a regularized version of the Fenchel dual problem defining the KL over a restricted class of functions.
arXiv Detail & Related papers (2021-06-16T16:37:43Z)
- Linear Optimal Transport Embedding: Provable Wasserstein classification for certain rigid transformations and perturbations [79.23797234241471]
Discriminating between distributions is an important problem in a number of scientific fields.
The Linear Optimal Transportation (LOT) framework embeds the space of distributions into an $L^2$-space.
We demonstrate the benefits of LOT on a number of distribution classification problems.
arXiv Detail & Related papers (2020-08-20T19:09:33Z)
- Relative Deviation Margin Bounds [55.22251993239944]
We give two types of learning bounds, both distribution-dependent and valid for general families, in terms of the Rademacher complexity.
We derive distribution-dependent generalization bounds for unbounded loss functions under the assumption of a finite moment.
arXiv Detail & Related papers (2020-06-26T12:37:17Z)
- On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration [115.1954841020189]
We study the asymptotic and non-asymptotic properties of stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z)
- Minimax Optimal Estimation of KL Divergence for Continuous Distributions [56.29748742084386]
Estimating Kullback-Leibler divergence from independent and identically distributed samples is an important problem in various domains.
One simple and effective estimator is based on the k nearest neighbor distances between these samples (a minimal sketch of this construction follows the list below).
arXiv Detail & Related papers (2020-02-26T16:37:37Z)
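Regarding the last entry above: the nearest-neighbor idea it refers to is typically implemented as the classical k-NN divergence estimator, which compares, for each point drawn from $p$, its k-th nearest-neighbor distance within its own sample to that within the sample drawn from $q$. The sketch below shows that generic construction (using scipy's KD-tree; the function name and the Gaussian test case are illustrative, and the exact estimator and analysis in the paper may differ).

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_kl_estimate(x, y, k=5):
    """Classical k-NN estimate of KL(p || q) from samples x ~ p, y ~ q (shape (n, d))."""
    n, d = x.shape
    m = y.shape[0]
    # Distance from each x_i to its k-th nearest neighbor among the other x's
    # (query k+1 neighbors because the closest hit is the point itself).
    rho = cKDTree(x).query(x, k=k + 1)[0][:, -1]
    # Distance from each x_i to its k-th nearest neighbor among the y's.
    nu = cKDTree(y).query(x, k=k)[0]
    nu = nu[:, -1] if k > 1 else nu
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))

# Example: N(0, 1) vs N(1, 1) in one dimension; the true KL divergence is 0.5.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(20000, 1))
y = rng.normal(1.0, 1.0, size=(20000, 1))
print(knn_kl_estimate(x, y))  # should land close to 0.5
```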