Split-kl and PAC-Bayes-split-kl Inequalities
- URL: http://arxiv.org/abs/2206.00706v1
- Date: Wed, 1 Jun 2022 18:42:02 GMT
- Title: Split-kl and PAC-Bayes-split-kl Inequalities
- Authors: Yi-Shan Wu and Yevgeny Seldin
- Abstract summary: We present a new concentration inequality, which we name the split-kl inequality; it combines the power of the kl inequality with the ability to exploit low variance.
While for Bernoulli random variables the kl inequality is tighter than the Empirical Bernstein, for random variables taking values inside a bounded interval and having low variance the Empirical Bernstein inequality is tighter than the kl; the split-kl inequality yields the best of both worlds.
We discuss an application of the split-kl inequality to bounding excess losses.
- Score: 15.63537071742102
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We present a new concentration of measure inequality for sums of independent
bounded random variables, which we name a split-kl inequality. The inequality
combines the combinatorial power of the kl inequality with the ability to exploit
low variance. While for Bernoulli random variables the kl inequality is tighter
than the Empirical Bernstein, for random variables taking values inside a
bounded interval and having low variance the Empirical Bernstein inequality is
tighter than the kl. The proposed split-kl inequality yields the best of both
worlds. We discuss an application of the split-kl inequality to bounding excess
losses. We also derive a PAC-Bayes-split-kl inequality and use a synthetic
example and several UCI datasets to compare it with the PAC-Bayes-kl, PAC-Bayes
Empirical Bernstein, PAC-Bayes Unexpected Bernstein, and PAC-Bayes Empirical
Bennett inequalities.
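To make the construction behind the abstract concrete, here is a brief sketch, in notation of our own choosing rather than the paper's exact statement, of the kl inequality and of the split-kl idea of decomposing a variable on a finite grid into binary components. The Maurer-style constant 2*sqrt(n) and the union bound over the K components are assumptions consistent with standard kl bounds, not claims about the paper's precise constants.

```latex
% kl inequality (Maurer / Langford-Seeger form): for i.i.d. Z_1,\dots,Z_n \in [0,1]
% with mean p and empirical mean \hat p, with probability at least 1-\delta,
\mathrm{kl}(\hat p \,\|\, p) \le \frac{\ln(2\sqrt{n}/\delta)}{n},
\qquad
\mathrm{kl}(q \,\|\, p) = q \ln\frac{q}{p} + (1-q)\ln\frac{1-q}{1-p}.

% Split-kl idea: decompose a variable on a finite grid b_0 < b_1 < \dots < b_K
% into binary components and apply the kl inequality to each of them,
Z = b_0 + \sum_{j=1}^{K} (b_j - b_{j-1})\, Z_{\mid j},
\qquad Z_{\mid j} = \mathbf{1}\{Z \ge b_j\},

% so that, with a union bound over the K binary components,
\mathbb{E}[Z] \;\le\; b_0 + \sum_{j=1}^{K} (b_j - b_{j-1})\,
\mathrm{kl}^{-1,+}\!\left(\hat p_{\mid j},\, \frac{\ln(2K\sqrt{n}/\delta)}{n}\right),
\qquad
\mathrm{kl}^{-1,+}(q,\varepsilon) = \max\{p : \mathrm{kl}(q\,\|\,p) \le \varepsilon\}.
```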
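And a minimal numerical sketch of how such a bound might be evaluated, assuming the decomposition above; the helper names (kl_bernoulli, kl_inverse_upper, split_kl_upper_bound), the grid {-1, 0, 1} for excess zero-one losses, and the specific confidence rate are illustrative choices, not the authors' code.

```python
# Sketch of a split-kl style upper bound on E[Z] for Z on a finite grid,
# e.g. excess zero-one losses in {-1, 0, 1}. Names and constants are illustrative.
import numpy as np
from scipy.optimize import brentq


def kl_bernoulli(q, p):
    """Binary kl divergence kl(q || p) with the 0*log(0) = 0 convention."""
    tiny = 1e-12
    p = min(max(p, tiny), 1.0 - tiny)
    val = 0.0
    if q > 0.0:
        val += q * np.log(q / p)
    if q < 1.0:
        val += (1.0 - q) * np.log((1.0 - q) / (1.0 - p))
    return val


def kl_inverse_upper(q, eps):
    """Largest p with kl(q || p) <= eps, found by root-finding on [q, 1)."""
    hi = 1.0 - 1e-12
    if kl_bernoulli(q, hi) <= eps:
        return 1.0
    return brentq(lambda p: kl_bernoulli(q, p) - eps, q, hi)


def split_kl_upper_bound(z, grid, delta):
    """Upper-bound the mean of z (values on the sorted grid b_0 < ... < b_K) by
    decomposing z into K binary indicators 1{z >= b_j}, bounding each with the
    kl inequality, and taking a union bound over the K components."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    K = len(grid) - 1
    eps = np.log(2.0 * K * np.sqrt(n) / delta) / n  # Maurer-style rate (assumed)
    bound = grid[0]
    for j in range(1, K + 1):
        q_hat = float(np.mean(z >= grid[j]))  # empirical mean of j-th indicator
        bound += (grid[j] - grid[j - 1]) * kl_inverse_upper(q_hat, eps)
    return bound


# Example: low-variance ternary excess losses concentrated at 0.
rng = np.random.default_rng(0)
z = rng.choice([-1.0, 0.0, 1.0], size=1000, p=[0.05, 0.90, 0.05])
print(split_kl_upper_bound(z, grid=[-1.0, 0.0, 1.0], delta=0.05))
```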
Related papers
- Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct.
Our main contribution is a novel learning algorithm for the agnostic $(\varepsilon,\delta)$-PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z) - Similarity, Compression and Local Steps: Three Pillars of Efficient Communications for Distributed Variational Inequalities [91.12425544503395]
Variational inequalities are used in various applications ranging from equilibrium search to adversarial learning.
Most distributed approaches have a bottleneck - the cost of communications.
The three main techniques to reduce the total number of communication rounds and the cost of one such round are the similarity of local functions, compression of transmitted information, and local updates.
The methods presented in this paper have the best theoretical guarantees of communication complexity and are significantly ahead of other methods for distributed variational inequalities.
arXiv Detail & Related papers (2023-02-15T12:11:27Z) - Approximation of optimization problems with constraints through kernel Sum-Of-Squares [77.27820145069515]
We show that pointwise inequalities are turned into equalities within a class of nonnegative kSoS functions.
We also show that focusing on pointwise equality constraints enables the use of scattering inequalities to mitigate the curse of dimensionality in sampling the constraints.
arXiv Detail & Related papers (2023-01-16T10:30:04Z) - On change of measure inequalities for $f$-divergences [5.799808780731661]
The strategy relies on combining the Legendre transform of $f$-divergences with the Young-Fenchel inequality.
We derive new PAC-Bayesian generalisation bounds with a complexity involving $f$-divergences.
We instantiate our results for the most popular $f$-divergences.
arXiv Detail & Related papers (2022-02-11T11:53:28Z) - Some Hoeffding- and Bernstein-type Concentration Inequalities [47.24550702683417]
We prove concentration inequalities for functions of independent random variables under sub-gaussian and sub-exponential conditions.
The utility of the inequalities is demonstrated by an extension of the now classical method of Rademacher complexities to Lipschitz function classes and unbounded sub-exponential distributions.
arXiv Detail & Related papers (2021-02-11T23:09:13Z) - A Unified Joint Maximum Mean Discrepancy for Domain Adaptation [73.44809425486767]
This paper theoretically derives a unified form of JMMD that is easy to optimize.
From the revealed unified JMMD, we illustrate that JMMD degrades the feature-label dependence that benefits classification.
We propose a novel MMD matrix to promote the dependence, and devise a novel label kernel that is robust to label distribution shift.
arXiv Detail & Related papers (2021-01-25T09:46:14Z) - Uncertainty Relations of Variances in View of the Weak Value [1.1699566743796068]
We show that there is another inequality which underlies the Schrödinger inequality in the same sense.
The decomposition of the Schrödinger inequality is examined more closely to analyze its structure and the minimal uncertainty states.
arXiv Detail & Related papers (2020-08-07T11:45:43Z) - Wigner and Bell Inequalities relationships and Kolmogorov's Axioms [0.0]
We show that the violation of Bell's inequality arises from the fact that a condition imposed in the derivation of the inequality is not respected when it is applied to the idealized experiment.
We also show that such a violation is possible only if Wigner's inequality, in a certain arrangement, is also violated, and that both violations are related to the violation of one of Kolmogorov's axioms.
arXiv Detail & Related papers (2020-06-15T01:01:08Z) - Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z) - Novel Change of Measure Inequalities with Applications to PAC-Bayesian Bounds and Monte Carlo Estimation [29.919144859026016]
We show how variational representation for $f$-divergences leads to novel change of measure inequalities.
We also present a multiplicative change of measure inequality for $\alpha$-divergences and a generalized version of the Hammersley-Chapman-Robbins inequality.
arXiv Detail & Related papers (2020-02-25T05:36:22Z) - Concentration inequality using unconfirmed knowledge [2.538209532048867]
We give a concentration inequality based on the premise that random variables take values within a particular region.
Our inequality outperforms other well-known inequalities.
arXiv Detail & Related papers (2020-02-11T13:02:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.