Tighter CMI-Based Generalization Bounds via Stochastic Projection and Quantization
- URL: http://arxiv.org/abs/2510.23485v1
- Date: Mon, 27 Oct 2025 16:17:09 GMT
- Title: Tighter CMI-Based Generalization Bounds via Stochastic Projection and Quantization
- Authors: Milad Sefidgaran, Kimia Nadjahi, Abdellatif Zaidi,
- Abstract summary: We leverage projection and lossy compression to establish new conditional mutual information (CMI) bounds on the generalization error of statistical learning algorithms.<n>Our bounds yield suitable generalization guarantees of the order of $mathcalO (1/sqrtn)$, where $n$ is the size of the training dataset.<n>We show that for every learning algorithm, there exists an auxiliary algorithm that does not memorize and which yields comparable generalization error for any data distribution.
- Score: 23.435054174825282
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we leverage stochastic projection and lossy compression to establish new conditional mutual information (CMI) bounds on the generalization error of statistical learning algorithms. It is shown that these bounds are generally tighter than the existing ones. In particular, we prove that for certain problem instances for which existing MI and CMI bounds were recently shown in Attias et al. [2024] and Livni [2023] to become vacuous or fail to describe the right generalization behavior, our bounds yield suitable generalization guarantees of the order of $\mathcal{O}(1/\sqrt{n})$, where $n$ is the size of the training dataset. Furthermore, we use our bounds to investigate the problem of data "memorization" raised in those works, and which asserts that there are learning problem instances for which any learning algorithm that has good prediction there exist distributions under which the algorithm must "memorize" a big fraction of the training dataset. We show that for every learning algorithm, there exists an auxiliary algorithm that does not memorize and which yields comparable generalization error for any data distribution. In part, this shows that memorization is not necessary for good generalization.
Related papers
- Leveraging Flatness to Improve Information-Theoretic Generalization Bounds for SGD [64.08556301183664]
Information-theoretic (IT) generalization bounds have been used to study the generalization of learning algorithms.<n>This paper derives a more flatness-leveraging IT bound for the flatness-favoring SGD.
arXiv Detail & Related papers (2026-01-04T10:17:50Z) - An Information-Theoretic Approach to Generalization Theory [27.87324770020133]
We analyze information-theoretic bounds that quantify the dependence between a learning algorithm and the training data.
We show that algorithms with a bounded maximal leakage guarantee generalization even with a constant privacy parameter.
arXiv Detail & Related papers (2024-08-20T10:08:21Z) - Slicing Mutual Information Generalization Bounds for Neural Networks [14.48773730230054]
We introduce new, tighter information-theoretic generalization bounds tailored for deep learning algorithms.
Our bounds offer significant computational and statistical advantages over standard MI bounds.
We extend our analysis to algorithms whose parameters do not need to exactly lie on random subspaces.
arXiv Detail & Related papers (2024-06-06T13:15:37Z) - Spectral Entry-wise Matrix Estimation for Low-Rank Reinforcement
Learning [53.445068584013896]
We study matrix estimation problems arising in reinforcement learning (RL) with low-rank structure.
In low-rank bandits, the matrix to be recovered specifies the expected arm rewards, and for low-rank Markov Decision Processes (MDPs), it may for example characterize the transition kernel of the MDP.
We show that simple spectral-based matrix estimation approaches efficiently recover the singular subspaces of the matrix and exhibit nearly-minimal entry-wise error.
arXiv Detail & Related papers (2023-10-10T17:06:41Z) - Information Theoretic Lower Bounds for Information Theoretic Upper
Bounds [14.268363583731848]
We examine the relationship between the output model and the empirical generalization and the algorithm in the context of convex optimization.
Our study reveals that, for true risk minimization, mutual information is necessary.
Existing information-theoretic generalization bounds fall short in capturing the capabilities of algorithms like SGD and regularized, which have-independent dimension sample complexity.
arXiv Detail & Related papers (2023-02-09T20:42:36Z) - On the Generalization for Transfer Learning: An Information-Theoretic Analysis [8.102199960821165]
We give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergenceD(mu|mu')$ plays an important role in the characterizations.
We then generalize the mutual information bound with other divergences such as $phi$-divergence and Wasserstein distance.
arXiv Detail & Related papers (2022-07-12T08:20:41Z) - On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI)
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z) - Dimension Free Generalization Bounds for Non Linear Metric Learning [61.193693608166114]
We provide uniform generalization bounds for two regimes -- the sparse regime, and a non-sparse regime.
We show that by relying on a different, new property of the solutions, it is still possible to provide dimension free generalization guarantees.
arXiv Detail & Related papers (2021-02-07T14:47:00Z) - Tighter Generalization Bounds for Iterative Differentially Private
Learning Algorithms [95.73230376153872]
This paper studies the relationship between generalization and privacy preservation in iterative learning algorithms by two sequential steps.
We prove that $(varepsilon, delta)$-differential privacy implies an on-average generalization bound for multi-Database learning algorithms.
We then investigate how the iterative nature shared by most learning algorithms influence privacy preservation and further generalization.
arXiv Detail & Related papers (2020-07-18T09:12:03Z) - Semi-Supervised Learning with Meta-Gradient [123.26748223837802]
We propose a simple yet effective meta-learning algorithm in semi-supervised learning.
We find that the proposed algorithm performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-07-08T08:48:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.