Generalization Analysis of Machine Learning Algorithms via the
Worst-Case Data-Generating Probability Measure
- URL: http://arxiv.org/abs/2312.12236v1
- Date: Tue, 19 Dec 2023 15:20:27 GMT
- Title: Generalization Analysis of Machine Learning Algorithms via the
Worst-Case Data-Generating Probability Measure
- Authors: Xinying Zou, Samir M. Perlaza, Iñaki Esnaola, Eitan Altman
- Abstract summary: The worst-case probability measure over the data is introduced as a tool for characterizing the generalization capabilities of machine learning algorithms.
Fundamental generalization metrics, such as the sensitivity of the expected loss, the sensitivity of the empirical risk, and the generalization gap, are shown to have closed-form expressions.
A novel parallel is established between the worst-case data-generating probability measure and the Gibbs algorithm.
- Score: 1.773764539873123
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, the worst-case probability measure over the data is introduced
as a tool for characterizing the generalization capabilities of machine
learning algorithms. More specifically, the worst-case probability measure is a
Gibbs probability measure and the unique solution to the maximization of the
expected loss under a relative entropy constraint with respect to a reference
probability measure. Fundamental generalization metrics, such as the
sensitivity of the expected loss, the sensitivity of the empirical risk, and
the generalization gap are shown to have closed-form expressions involving the
worst-case data-generating probability measure. Existing results for the Gibbs
algorithm, such as characterizing the generalization gap as a sum of mutual
information and lautum information, up to a constant factor, are recovered. A
novel parallel is established between the worst-case data-generating
probability measure and the Gibbs algorithm. Specifically, the Gibbs
probability measure is identified as a fundamental commonality of the model
space and the data space for machine learning algorithms.
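For intuition, a minimal sketch of the construction described in the abstract (notation here is assumed, not quoted from the paper): the worst-case measure maximizes the expected loss over all measures within a relative entropy budget of a reference measure $Q$, and the maximizer is an exponentially tilted (Gibbs) measure,
$$
\hat{P} \in \arg\max_{P \,:\, D(P\|Q)\le c} \ \mathbb{E}_{Z\sim P}\big[\ell(\theta,Z)\big],
\qquad
\frac{\mathrm{d}\hat{P}}{\mathrm{d}Q}(z) = \frac{\exp\big(\ell(\theta,z)/\lambda\big)}{\mathbb{E}_{Z\sim Q}\big[\exp\big(\ell(\theta,Z)/\lambda\big)\big]},
$$
where $\ell$ is the loss, $\theta$ the model, and $\lambda > 0$ is the multiplier matching the budget $c$. A minimal numerical sketch of this exponential tilting on a finite support follows; the function name and toy values are illustrative, not taken from the paper.
```python
import numpy as np

def worst_case_measure(losses, ref_probs, lam):
    """Tilt a reference measure Q by exp(loss / lam) and renormalize (a Gibbs measure)."""
    weights = np.asarray(ref_probs) * np.exp(np.asarray(losses) / lam)
    return weights / weights.sum()

# Toy usage: the worst-case measure up-weights high-loss outcomes.
losses = np.array([0.1, 0.5, 2.0, 0.3])  # per-outcome losses ell(theta, z_i)
q = np.full(4, 0.25)                     # uniform reference measure Q
p_worst = worst_case_measure(losses, q, lam=0.5)
```
Smaller values of lam concentrate the worst-case measure on the highest-loss outcomes, while larger values keep it close to the reference measure.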
Related papers
- The Generalization Error of Machine Learning Algorithms [0.0]
The method of gaps is a technique for deriving closed-form expressions, in terms of information measures, for the generalization error of machine learning algorithms.
All existing exact expressions for the generalization error of machine learning algorithms can be obtained with the proposed method.
arXiv Detail & Related papers (2024-11-18T20:05:51Z) - A unified framework for information-theoretic generalization bounds [8.04975023021212]
This paper presents a general methodology for deriving information-theoretic generalization bounds for learning algorithms.
The main technical tool is a probabilistic decorrelation lemma based on a change of measure and a relaxation of Young's inequality in $L_{\psi_p}$ Orlicz spaces.
arXiv Detail & Related papers (2023-05-18T15:36:20Z) - Learning to Bound Counterfactual Inference in Structural Causal Models
from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z) - On the Generalization for Transfer Learning: An Information-Theoretic Analysis [8.102199960821165]
We give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(\mu\|\mu')$ plays an important role in the characterizations.
We then generalize the mutual information bound with other divergences such as $\phi$-divergence and Wasserstein distance.
arXiv Detail & Related papers (2022-07-12T08:20:41Z) - MissDAG: Causal Discovery in the Presence of Missing Data with
Continuous Additive Noise Models [78.72682320019737]
We develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations.
MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization framework.
We demonstrate the flexibility of MissDAG for incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.
arXiv Detail & Related papers (2022-05-27T09:59:46Z) - Robust learning of data anomalies with analytically-solvable entropic
outlier sparsification [0.0]
Entropic Outlier Sparsification (EOS) is proposed as a robust computational strategy for the detection of data anomalies.
The performance of EOS is compared to a range of commonly-used tools on synthetic problems and on partially-mislabeled supervised classification problems from biomedicine.
arXiv Detail & Related papers (2021-12-22T10:13:29Z) - MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data.
MIRACLE iteratively refines the imputation of a baseline by simultaneously modeling the missingness-generating mechanism.
We conduct extensive experiments on synthetic and a variety of publicly available datasets to show that MIRACLE is able to consistently improve imputation.
arXiv Detail & Related papers (2021-11-04T22:38:18Z) - Characterizing the Generalization Error of Gibbs Algorithm with
Symmetrized KL information [18.92529916180208]
Bounding the generalization error of a supervised learning algorithm is one of the most important problems in learning theory.
Our main contribution is an exact characterization of the expected generalization error of the well-known Gibbs algorithm.
arXiv Detail & Related papers (2021-07-28T22:20:34Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the complexity of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Probabilistic Simplex Component Analysis [66.30587591100566]
PRISM is a probabilistic simplex component analysis approach to identifying the vertices of a data-circumscribing simplex from data.
The problem has a rich variety of applications, the most notable being hyperspectral unmixing in remote sensing and non-negative matrix factorization in machine learning.
arXiv Detail & Related papers (2021-03-18T05:39:00Z) - General stochastic separation theorems with optimal bounds [68.8204255655161]
The phenomenon of separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and to analyze AI instabilities.
Errors or clusters of errors can be separated from the rest of the data.
The ability to correct an AI system also opens up the possibility of an attack on it, and the high dimensionality induces vulnerabilities caused by the same separability.
arXiv Detail & Related papers (2020-10-11T13:12:41Z)