Few-Shot Learning via Learning the Representation, Provably
- URL: http://arxiv.org/abs/2002.09434v2
- Date: Tue, 30 Mar 2021 04:06:04 GMT
- Title: Few-Shot Learning via Learning the Representation, Provably
- Authors: Simon S. Du, Wei Hu, Sham M. Kakade, Jason D. Lee, Qi Lei
- Abstract summary: This paper studies few-shot learning via representation learning.
One uses $T$ source tasks with $n_1$ data per task to learn a representation in order to reduce the sample complexity of a target task.
- Score: 115.7367053639605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies few-shot learning via representation learning, where one
uses $T$ source tasks with $n_1$ data per task to learn a representation in
order to reduce the sample complexity of a target task for which there is only
$n_2 (\ll n_1)$ data. Specifically, we focus on the setting where there exists
a good \emph{common representation} between source and target, and our goal is
to understand how much of a sample size reduction is possible. First, we study
the setting where this common representation is low-dimensional and provide a
fast rate of $O\left(\frac{\mathcal{C}\left(\Phi\right)}{n_1T} +
\frac{k}{n_2}\right)$; here, $\Phi$ is the representation function class,
$\mathcal{C}\left(\Phi\right)$ is its complexity measure, and $k$ is the
dimension of the representation. When specialized to linear representation
functions, this rate becomes $O\left(\frac{dk}{n_1T} + \frac{k}{n_2}\right)$
where $d (\gg k)$ is the ambient input dimension, which is a substantial
improvement over the rate without using representation learning, i.e. over the
rate of $O\left(\frac{d}{n_2}\right)$. This result bypasses the
$\Omega(\frac{1}{T})$ barrier under the i.i.d. task assumption, and can capture
the desired property that all $n_1T$ samples from source tasks can be
\emph{pooled} together for representation learning. Next, we consider the
setting where the common representation may be high-dimensional but is
capacity-constrained (say in norm); here, we again demonstrate the advantage of
representation learning in both high-dimensional linear regression and neural
network learning. Our results demonstrate representation learning can fully
utilize all $n_1T$ samples from source tasks.
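To make the linear-representation rate concrete, the following sketch (numpy) contrasts few-shot estimation with and without a learned representation. It is only an illustration of the setting, not the estimator analyzed in the paper: the dimensions, the noise level, and the per-task least-squares-plus-SVD recovery of the shared subspace are all assumptions made for this toy example (and require $n_1 \ge d$), whereas the paper analyzes ERM over a general representation class $\Phi$.
```python
# Toy sketch: few-shot linear regression with a shared k-dimensional representation.
# NOT the paper's estimator; illustrative only. Here each source task's regressor is
# fit by ordinary least squares and the shared subspace is recovered by an SVD of the
# stacked estimates, which requires n1 >= d.
import numpy as np

rng = np.random.default_rng(0)
d, k, T, n1, n2, sigma = 50, 3, 40, 100, 10, 0.1  # illustrative sizes, n2 << n1

# Ground truth: shared representation B (d x k) and per-task heads w_t in R^k
B, _ = np.linalg.qr(rng.standard_normal((d, k)))
W_src = rng.standard_normal((k, T))

# --- Stage 1: learn the representation from T source tasks (n1 samples each) ---
theta_hat = np.zeros((d, T))
for t in range(T):
    X = rng.standard_normal((n1, d))
    y = X @ (B @ W_src[:, t]) + sigma * rng.standard_normal(n1)
    theta_hat[:, t], *_ = np.linalg.lstsq(X, y, rcond=None)

# Top-k left singular subspace of the stacked estimates approximates span(B)
U, _, _ = np.linalg.svd(theta_hat, full_matrices=False)
B_hat = U[:, :k]

# --- Stage 2: few-shot target task with only n2 samples ---
w_tgt = rng.standard_normal(k)
X2 = rng.standard_normal((n2, d))
y2 = X2 @ (B @ w_tgt) + sigma * rng.standard_normal(n2)

# Fit only k parameters on top of the learned representation
w_hat, *_ = np.linalg.lstsq(X2 @ B_hat, y2, rcond=None)
theta_rep = B_hat @ w_hat                              # predictor via representation
theta_dir, *_ = np.linalg.lstsq(X2, y2, rcond=None)    # baseline: fit all d parameters

theta_star = B @ w_tgt
print("error with representation:   ", np.linalg.norm(theta_rep - theta_star))
print("error without representation:", np.linalg.norm(theta_dir - theta_star))
```
In this toy setup the target task only has to fit $k = 3$ coefficients on top of the learned subspace instead of $d = 50$ raw coordinates, which is the $\frac{k}{n_2}$ versus $\frac{d}{n_2}$ gap the abstract quantifies.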
Related papers
- Learning Orthogonal Multi-Index Models: A Fine-Grained Information Exponent Analysis [45.05072391903122]
The information exponent plays an important role in predicting the sample complexity of online gradient descent.
For multi-index models, focusing solely on the lowest degree can miss key structural details.
We show that by considering both second- and higher-order terms, we can first learn the relevant space via the second-order terms.
arXiv Detail & Related papers (2024-10-13T00:14:08Z) - Metalearning with Very Few Samples Per Task [19.78398372660794]
We consider a binary classification setting where tasks are related by a shared representation.
Here, the amount of data is measured in terms of the number of tasks $t$ that we need to see and the number of samples $n$ per task.
Our work also yields a characterization of distribution-free multitask learning and reductions between meta and multitask learning.
arXiv Detail & Related papers (2023-12-21T16:06:44Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff [12.351756386062291]
We formalize a balance between learning low-dimensional representations and minimizing complexity/irregularity in the feature maps.
For large depths, almost all hidden representations are approximately $R^{(0)}(f)$-dimensional, and almost all weight matrices $W_\ell$ have $R^{(0)}(f)$ singular values close to 1.
Interestingly, the use of large learning rates is required to guarantee an order-$O(L)$ NTK, which in turn guarantees infinite-depth convergence of the representations of almost all layers.
arXiv Detail & Related papers (2023-05-30T13:06:26Z) - Multi-Task Imitation Learning for Linear Dynamical Systems [50.124394757116605]
We study representation learning for efficient imitation learning over linear systems.
We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left(\frac{k n_x H}{N_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}}\right)$.
arXiv Detail & Related papers (2022-12-01T00:14:35Z) - Neural Networks can Learn Representations with Gradient Descent [68.95262816363288]
In specific regimes, neural networks trained by gradient descent behave like kernel methods.
In practice, it is known that neural networks strongly outperform their associated kernels.
arXiv Detail & Related papers (2022-06-30T09:24:02Z) - High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation [89.21686761957383]
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer network.
Our results demonstrate that even one step can lead to a considerable advantage over random features.
arXiv Detail & Related papers (2022-05-03T12:09:59Z) - On the Power of Multitask Representation Learning in Linear MDP [61.58929164172968]
This paper presents analyses for the statistical benefit of multitask representation learning in linear Markov Decision Processes (MDPs).
We first discover a \emph{Least-Activated-Feature-Abundance} (LAFA) criterion, denoted as $\kappa$, with which we prove that a straightforward least-squares algorithm learns a policy which is $\tilde{O}\left(H^2\sqrt{\frac{\kappa\,\mathcal{C}(\Phi)^2\,\kappa d}{NT}+\frac{\kappa d}{n}}\right)$ sub-optimal.
arXiv Detail & Related papers (2021-06-15T11:21:06Z)