Evaluated CMI Bounds for Meta Learning: Tightness and Expressiveness
- URL: http://arxiv.org/abs/2210.06511v1
- Date: Wed, 12 Oct 2022 18:10:59 GMT
- Title: Evaluated CMI Bounds for Meta Learning: Tightness and Expressiveness
- Authors: Fredrik Hellström and Giuseppe Durisi
- Abstract summary: We present novel generalization bounds for meta learning in terms of the evaluated CMI (e-CMI).
We show that the e-CMI framework yields a bound that scales as $\sqrt{\mathcal{C}(\mathcal{H})/(n\hat{n}) + \mathcal{C}(\mathcal{F})/n}$, where $\mathcal{C}(\cdot)$ denotes a complexity measure of the hypothesis class.
- Score: 14.147617330278662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has established that the conditional mutual information (CMI)
framework of Steinke and Zakynthinou (2020) is expressive enough to capture
generalization guarantees in terms of algorithmic stability, VC dimension, and
related complexity measures for conventional learning (Harutyunyan et al.,
2021, Haghifam et al., 2021). Hence, it provides a unified method for
establishing generalization bounds. In meta learning, there has so far been a
divide between information-theoretic results and results from classical
learning theory. In this work, we take a first step toward bridging this
divide. Specifically, we present novel generalization bounds for meta learning
in terms of the evaluated CMI (e-CMI). To demonstrate the expressiveness of the
e-CMI framework, we apply our bounds to a representation learning setting, with
$n$ samples from $\hat n$ tasks parameterized by functions of the form $f_i
\circ h$. Here, each $f_i \in \mathcal F$ is a task-specific function, and $h
\in \mathcal H$ is the shared representation. For this setup, we show that the
e-CMI framework yields a bound that scales as $\sqrt{ \mathcal C(\mathcal
H)/(n\hat n) + \mathcal C(\mathcal F)/n} $, where $\mathcal C(\cdot)$ denotes a
complexity measure of the hypothesis class. This scaling behavior coincides
with the one reported in Tripuraneni et al. (2020) using Gaussian complexity.
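To make the reported scaling behavior concrete, here is a minimal sketch (not taken from the paper): it simply evaluates the rate $\sqrt{\mathcal{C}(\mathcal{H})/(n\hat{n}) + \mathcal{C}(\mathcal{F})/n}$ from the abstract for hypothetical complexity values $\mathcal{C}(\mathcal{H})$ and $\mathcal{C}(\mathcal{F})$, ignoring constants and logarithmic factors. The function name `ecmi_rate` and all numerical values are illustrative assumptions.

```python
import math

def ecmi_rate(C_H: float, C_F: float, n: int, n_hat: int) -> float:
    """Scaling of the e-CMI meta-learning bound (constants and logs ignored).

    The shared-representation complexity C(H) is amortized over all
    n * n_hat samples, while the task-specific complexity C(F) is only
    amortized over the n samples available per task.
    """
    return math.sqrt(C_H / (n * n_hat) + C_F / n)

if __name__ == "__main__":
    C_H, C_F = 1000.0, 10.0  # hypothetical complexity measures of H and F
    n = 50                   # samples per task
    for n_hat in (1, 10, 100, 1000):  # number of tasks
        print(f"n_hat = {n_hat:4d}  rate ~ {ecmi_rate(C_H, C_F, n, n_hat):.3f}")
```

As $\hat{n}$ grows, the $\mathcal{C}(\mathcal{H})/(n\hat{n})$ term vanishes and the rate approaches $\sqrt{\mathcal{C}(\mathcal{F})/n}$: the cost of learning the shared representation is spread across tasks, which is the qualitative behavior the abstract attributes to both the e-CMI bound and the Gaussian-complexity bound of Tripuraneni et al. (2020).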
Related papers
- A Theory of Interpretable Approximations [61.90216959710842]
We study the idea of approximating a target concept $c$ by a small aggregation of concepts from some base class $\mathcal{H}$.
For any given pair of $\mathcal{H}$ and $c$, exactly one of these cases holds: (i) $c$ cannot be approximated by $\mathcal{H}$ with arbitrary accuracy.
We show that, in the case of interpretable approximations, even a slightly nontrivial a-priori guarantee on the complexity of approximations implies approximations with constant (distribution-free and accuracy-
arXiv Detail & Related papers (2024-06-15T06:43:45Z) - Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle\right)$ under isotropic Gaussian data.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ of arbitrary link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d\,\mathrm{polylog}\, d$.
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Information Complexity of Stochastic Convex Optimization: Applications to Generalization and Memorization [36.28082578445787]
We investigate the interplay between memorization and learning in the context of stochastic convex optimization (SCO).
We quantify memorization using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020).
We show that, in the $L^2$ Lipschitz-bounded setting and under strong convexity, every learner with excess error $\varepsilon$ has CMI bounded below by $\Omega(1/\varepsilon^2)$ and $\Omega(1/\varepsilon)$, respectively.
arXiv Detail & Related papers (2024-02-14T17:17:30Z) - Convergence of a Normal Map-based Prox-SGD Method under the KL Inequality [0.0]
We present a novel normal map-based algorithm ($\mathsf{norM}\text{-}\mathsf{SGD}$) for nonconvex composite optimization problems and analyze its convergence under the Kurdyka-Łojasiewicz (KL) inequality.
arXiv Detail & Related papers (2023-05-10T01:12:11Z) - HappyMap: A Generalized Multi-calibration Method [23.086009024383024]
Multi-calibration is a powerful and evolving concept originating in the field of algorithmic fairness.
In this work, we view the term $(f(x)-y)$ as just one specific mapping, and explore the power of an enriched class of mappings.
We propose HappyMap, a generalization of multi-calibration, which yields a wide range of new applications.
arXiv Detail & Related papers (2023-03-08T05:05:01Z) - Detection-Recovery Gap for Planted Dense Cycles [72.4451045270967]
We consider a model where a dense cycle with expected bandwidth $n\tau$ and edge density $p$ is planted in an Erdős-Rényi graph $G(n,q)$.
We characterize the computational thresholds for the associated detection and recovery problems for the class of low-degree algorithms.
arXiv Detail & Related papers (2023-02-13T22:51:07Z) - Multi-Task Imitation Learning for Linear Dynamical Systems [50.124394757116605]
We study representation learning for efficient imitation learning over linear systems.
We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left( \frac{k n_x H}{N_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}} \right)$.
arXiv Detail & Related papers (2022-12-01T00:14:35Z) - Towards a Unified Information-Theoretic Framework for Generalization [43.23033319683591]
We first demonstrate that one can use the CMI framework to express non-trivial (but sub-optimal) bounds for any learning algorithm.
We prove that the CMI framework yields the optimal bound on the expected risk of Support Vector Machines (SVMs) for learning halfspaces.
arXiv Detail & Related papers (2021-11-09T17:09:40Z) - On the Power of Multitask Representation Learning in Linear MDP [61.58929164172968]
This paper presents analyses for the statistical benefit of multitask representation learning in linear Markov Decision Processes (MDPs).
We first discover a Least-Activated-Feature-Abundance (LAFA) criterion, denoted as $\kappa$, with which we prove that a straightforward least-square algorithm learns a policy which is $\tilde{O}\left(H^2\sqrt{\frac{\kappa\,\mathcal{C}(\Phi)^2\,\kappa d}{NT}+\frac{\kappa d}{n}}\right)$ suboptimal.
arXiv Detail & Related papers (2021-06-15T11:21:06Z) - On the Theory of Transfer Learning: The Importance of Task Diversity [114.656572506859]
We consider $t+1$ tasks parameterized by functions of the form $f_j \circ h$ in a general function class $\mathcal{F} \circ \mathcal{H}$.
We show that for diverse training tasks the sample complexity needed to learn the shared representation across the first $t$ training tasks scales as $C(\mathcal{H}) + t\,C(\mathcal{F})$ (see the back-of-envelope calculation after this list).
arXiv Detail & Related papers (2020-06-20T20:33:59Z)
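As a back-of-envelope check (not from either paper), the sample-complexity scaling $C(\mathcal{H}) + t\,C(\mathcal{F})$ above is consistent with the rate quoted in the abstract: with $\hat{n}$ tasks and $n$ samples per task, converting a sample complexity into an excess-risk rate by dividing by the total sample count $n\hat{n}$ and taking a square root gives
\[
  \sqrt{\frac{C(\mathcal{H}) + \hat{n}\, C(\mathcal{F})}{n\hat{n}}}
  \;=\;
  \sqrt{\frac{C(\mathcal{H})}{n\hat{n}} + \frac{C(\mathcal{F})}{n}},
\]
which matches the $\sqrt{\mathcal{C}(\mathcal{H})/(n\hat{n}) + \mathcal{C}(\mathcal{F})/n}$ scaling reported for the e-CMI bound. This is only a heuristic consistency check; the precise conditions and constants are those of the respective papers.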