Information Complexity of Stochastic Convex Optimization: Applications to Generalization and Memorization
- URL: http://arxiv.org/abs/2402.09327v2
- Date: Thu, 18 Jul 2024 17:37:59 GMT
- Title: Information Complexity of Stochastic Convex Optimization: Applications to Generalization and Memorization
- Authors: Idan Attias, Gintare Karolina Dziugaite, Mahdi Haghifam, Roi Livni, Daniel M. Roy,
- Abstract summary: We investigate the interplay between memorization and learning in the context of \emph{stochastic convex optimization} (SCO).
We quantify memorization using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020).
We show that, in the $L^2$ Lipschitz--bounded setting and under strong convexity, every learner with an excess error $\varepsilon$ has CMI bounded below by $\Omega(1/\varepsilon^2)$ and $\Omega(1/\varepsilon)$, respectively.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we investigate the interplay between memorization and learning in the context of \emph{stochastic convex optimization} (SCO). We define memorization via the information a learning algorithm reveals about its training data points. We then quantify this information using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020). Our main result is a precise characterization of the tradeoff between the accuracy of a learning algorithm and its CMI, answering an open question posed by Livni (2023). We show that, in the $L^2$ Lipschitz--bounded setting and under strong convexity, every learner with an excess error $\varepsilon$ has CMI bounded below by $\Omega(1/\varepsilon^2)$ and $\Omega(1/\varepsilon)$, respectively. We further demonstrate the essential role of memorization in learning problems in SCO by designing an adversary capable of accurately identifying a significant fraction of the training samples in specific SCO problems. Finally, we enumerate several implications of our results, such as a limitation of generalization bounds based on CMI and the incompressibility of samples in SCO problems.
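For context, the sketch below restates the quantity bounded in the abstract, the conditional mutual information of a learner in the Steinke--Zakynthinou (2020) framework. The notation ($\tilde{Z}$, $U$, $S_U$) follows the standard presentation of that framework and is illustrative rather than this paper's exact symbols.

```latex
% CMI of a (possibly randomized) learner A over distribution D, following
% Steinke and Zakynthinou (2020): draw a supersample \tilde{Z} of n i.i.d.
% pairs of points from D and a uniform selector U \in \{0,1\}^n; the
% training set S_U keeps one point from each pair according to U.
\[
  \mathrm{CMI}_{\mathcal{D}}(A) \;=\; I\bigl(A(S_U);\, U \,\bigm|\, \tilde{Z}\bigr).
\]
% The abstract's lower bounds then state that any learner achieving excess
% error \varepsilon must have CMI = \Omega(1/\varepsilon^2) in the
% L^2 Lipschitz-bounded setting and CMI = \Omega(1/\varepsilon) under
% strong convexity.
```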
Related papers
- Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation [69.0695698566235]
We study reinforcement learning with linear function approximation and adversarially changing cost functions.
We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback.
arXiv Detail & Related papers (2023-01-30T17:26:39Z) - Evaluated CMI Bounds for Meta Learning: Tightness and Expressiveness [14.147617330278662]
We present novel generalization bounds for meta learning in terms of the evaluated CMI (e-CMI).
We show that the e-CMI framework yields a bound that scales as $\sqrt{\mathcal{C}(\mathcal{H})/(n\hat{n}) + \mathcal{C}(\mathcal{F})/n}$, where $\mathcal{C}(\cdot)$ denotes a complexity measure of the hypothesis class.
arXiv Detail & Related papers (2022-10-12T18:10:59Z) - On the Generalization for Transfer Learning: An Information-Theoretic Analysis [8.102199960821165]
We give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(\mu\|\mu')$ plays an important role in the characterizations.
We then generalize the mutual information bound with other divergences such as $\phi$-divergence and Wasserstein distance.
arXiv Detail & Related papers (2022-07-12T08:20:41Z) - On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z) - Provable Reinforcement Learning with a Short-Term Memory [68.00677878812908]
We study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length $m$.
In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially.
Our results show that a short-term memory suffices for reinforcement learning in these environments.
arXiv Detail & Related papers (2022-02-08T16:39:57Z) - Towards a Unified Information-Theoretic Framework for Generalization [43.23033319683591]
We first demonstrate that one can use this framework to express non-trivial (but sub-optimal) bounds for any learning algorithm.
We prove that the CMI framework yields the optimal bound on the expected risk of Support Vector Machines (SVMs) for learning halfspaces.
arXiv Detail & Related papers (2021-11-09T17:09:40Z) - Rate-Distortion Analysis of Minimum Excess Risk in Bayesian Learning [15.544041797200045]
Minimum Excess Risk (MER) in Bayesian learning is defined as the difference between the minimum expected loss achievable when learning from data and the minimum expected loss that could be achieved if the underlying parameter $W$ was observed.
We derive information-theoretic bounds on the difference between these upper and lower bounds and show that they can provide order-wise tight rates for MER.
arXiv Detail & Related papers (2021-05-10T08:14:10Z) - Learner-Private Online Convex Optimization [24.204781914017758]
We study how to optimally obfuscate the learner's queries in first-order online convex optimization.
Our results apply to the significantly richer family of general convex functions with full-gradient feedback.
arXiv Detail & Related papers (2021-02-23T23:00:44Z) - Sample-Efficient Reinforcement Learning of Undercomplete POMDPs [91.40308354344505]
This work shows that these hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of Partially Observable Markov Decision Processes (POMDPs).
We present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works.
arXiv Detail & Related papers (2020-06-22T17:58:54Z) - CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information [105.73798100327667]
We propose a novel Contrastive Log-ratio Upper Bound (CLUB) of mutual information.
We provide a theoretical analysis of the properties of CLUB and its variational approximation.
Based on this upper bound, we introduce a MI minimization training scheme and further accelerate it with a negative sampling strategy.
arXiv Detail & Related papers (2020-06-22T05:36:16Z)