Information Processing Equalities and the Information-Risk Bridge
- URL: http://arxiv.org/abs/2207.11987v2
- Date: Fri, 8 Sep 2023 11:48:47 GMT
- Title: Information Processing Equalities and the Information-Risk Bridge
- Authors: Robert C. Williamson and Zac Cranko
- Abstract summary: We introduce two new classes of measures of information for statistical experiments.
We derive a simple geometrical relationship between measures of information and the Bayes risk of a statistical decision problem.
- Score: 10.451984251615512
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We introduce two new classes of measures of information for statistical
experiments which generalise and subsume $\phi$-divergences, integral
probability metrics, $\mathfrak{N}$-distances (MMD), and $(f,\Gamma)$
divergences between two or more distributions. This enables us to derive a
simple geometrical relationship between measures of information and the Bayes
risk of a statistical decision problem, thus extending the variational
$\phi$-divergence representation to multiple distributions in an entirely
symmetric manner. The new families of divergences are closed under the action of
Markov operators, which yields an information processing equality that is a
refinement and generalisation of the classical data processing inequality. This
equality gives insight into the significance of the choice of the hypothesis
class in classical risk minimization.
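For orientation, here is a minimal sketch of the classical binary-distribution objects the abstract refers to: the variational $\phi$-divergence representation, integral probability metrics (with MMD as a special case), and the binary information-risk bridge. This is standard background only, written under the usual textbook conventions; the paper's symmetric, multi-distribution generalisations are not reproduced here.

```latex
% Background sketch (two distributions only); the paper's multi-distribution,
% symmetric generalisations are not shown here.

% Variational representation of a \phi-divergence, with \phi^* the convex
% conjugate of \phi; restricting the witness g to a hypothesis class
% \mathcal{G} yields a lower bound:
\[
  D_\phi(P \,\|\, Q) \;=\; \sup_{g}\;
    \mathbb{E}_P\!\left[ g(X) \right] - \mathbb{E}_Q\!\left[ \phi^*\!\bigl( g(X) \bigr) \right].
\]

% Integral probability metric over a function class \mathcal{F}; MMD is the
% special case where \mathcal{F} is the unit ball of an RKHS:
\[
  \gamma_{\mathcal{F}}(P, Q) \;=\; \sup_{g \in \mathcal{F}}
    \bigl| \mathbb{E}_P[g(X)] - \mathbb{E}_Q[g(X)] \bigr|.
\]

% Classical binary information-risk bridge: for a prior \pi and a suitable
% (proper) loss, the gap between the prior Bayes risk \underline{L}(\pi) and
% the Bayes risk \underline{L}(\pi; P, Q) of the experiment equals a
% \phi-divergence whose generator \phi is determined by the loss and \pi:
\[
  \underline{L}(\pi) - \underline{L}(\pi; P, Q) \;=\; D_\phi(P \,\|\, Q).
\]
```

Restricting the supremum in the variational representation to a hypothesis class is, per the abstract, where the information processing equality gives insight into classical risk minimisation.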
Related papers
- Mutual Information Multinomial Estimation [53.58005108981247]
Estimating mutual information (MI) is a fundamental yet challenging task in data science and machine learning.
Our main discovery is that a preliminary estimate of the data distribution can dramatically help the estimation.
Experiments on diverse tasks including non-Gaussian synthetic problems with known ground-truth and real-world applications demonstrate the advantages of our method.
arXiv Detail & Related papers (2024-08-18T06:27:30Z) - Conformal inference for regression on Riemannian Manifolds [49.7719149179179]
We investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides on a manifold, and the covariate, denoted by $X$, lies in Euclidean space.
We prove the almost sure convergence of the empirical version of these regions on the manifold to their population counterparts.
arXiv Detail & Related papers (2023-10-12T10:56:25Z) - Beyond Normal: On the Evaluation of Mutual Information Estimators [52.85079110699378]
We show how to construct a diverse family of distributions with known ground-truth mutual information.
We provide guidelines for practitioners on how to select an appropriate estimator adapted to the difficulty of the problem considered.
arXiv Detail & Related papers (2023-06-19T17:26:34Z) - A unified framework for information-theoretic generalization bounds [8.04975023021212]
This paper presents a general methodology for deriving information-theoretic generalization bounds for learning algorithms.
The main technical tool is a probabilistic decorrelation lemma based on a change of measure and a relaxation of Young's inequality in $L_{\psi_p}$ Orlicz spaces.
arXiv Detail & Related papers (2023-05-18T15:36:20Z) - Lower Bounds on the Bayesian Risk via Information Measures [17.698319441265223]
We show that one can lower bound the risk with any information measure by upper bounding its dual via Markov's inequality.
The behaviour of the lower bound in the number of samples is influenced by the choice of the information measure.
If the observations are subject to privatisation, stronger impossibility results can be obtained via Strong Data-Processing Inequalities.
arXiv Detail & Related papers (2023-03-22T12:09:12Z) - Function-space regularized Rényi divergences [6.221019624345409]
We propose a new family of regularized Rényi divergences parametrized by a variational function space.
We prove several properties of these new divergences, showing that they interpolate between the classical Rényi divergences and IPMs.
We show that the proposed regularized R'enyi divergences inherit features from IPMs such as the ability to compare distributions that are not absolutely continuous.
arXiv Detail & Related papers (2022-10-10T19:18:04Z) - A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman divergence minimization.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z) - Learning to Transfer with von Neumann Conditional Divergence [14.926485055255942]
We build on the recently proposed von Neumann conditional divergence to improve the transferability across multiple domains.
We design novel learning objectives assuming those source tasks are observed either simultaneously or sequentially.
In both scenarios, we obtain favorable performance against state-of-the-art methods in terms of smaller generalization error on new tasks and less catastrophic forgetting on source tasks (in the sequential setup).
arXiv Detail & Related papers (2021-08-07T22:18:23Z) - Learning Gaussian Mixtures with Generalised Linear Models: Precise Asymptotics in High-dimensions [79.35722941720734]
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks.
We prove exact asymptotics characterising the estimator in high dimensions via empirical risk minimisation.
We discuss how our theory can be applied beyond the scope of synthetic data.
arXiv Detail & Related papers (2021-06-07T16:53:56Z) - $k$-Variance: A Clustered Notion of Variance [23.57925128327]
We introduce $k$-variance, a generalization of variance built on the machinery of random bipartite matchings.
We provide in-depth analysis of this quantity in several key cases, including one-dimensional measures, clustered measures, and measures concentrated on low-dimensional subsets.
arXiv Detail & Related papers (2020-12-13T04:25:32Z) - Saliency-based Weighted Multi-label Linear Discriminant Analysis [101.12909759844946]
We propose a new variant of Linear Discriminant Analysis (LDA) to solve multi-label classification tasks.
The proposed method is based on a probabilistic model for defining the weights of individual samples.
The Saliency-based weighted Multi-label LDA approach is shown to lead to performance improvements in various multi-label classification problems.
arXiv Detail & Related papers (2020-04-08T19:40:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.