On the Theory of Transfer Learning: The Importance of Task Diversity
- URL: http://arxiv.org/abs/2006.11650v2
- Date: Thu, 22 Oct 2020 17:19:02 GMT
- Title: On the Theory of Transfer Learning: The Importance of Task Diversity
- Authors: Nilesh Tripuraneni, Michael I. Jordan, Chi Jin
- Abstract summary: We consider $t+1$ tasks parameterized by functions of the form $f_j \circ h$ in a general function class $\mathcal{F} \circ \mathcal{H}$.
We show that for diverse training tasks the sample complexity needed to learn the shared representation across the first $t$ training tasks scales as $C(\mathcal{H}) + t C(\mathcal{F})$.
- Score: 114.656572506859
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide new statistical guarantees for transfer learning via
representation learning--when transfer is achieved by learning a feature
representation shared across different tasks. This enables learning on new
tasks using far less data than is required to learn them in isolation.
Formally, we consider $t+1$ tasks parameterized by functions of the form $f_j
\circ h$ in a general function class $\mathcal{F} \circ \mathcal{H}$, where
each $f_j$ is a task-specific function in $\mathcal{F}$ and $h$ is the shared
representation in $\mathcal{H}$. Letting $C(\cdot)$ denote the complexity
measure of the function class, we show that for diverse training tasks (1) the
sample complexity needed to learn the shared representation across the first
$t$ training tasks scales as $C(\mathcal{H}) + t C(\mathcal{F})$, despite no
explicit access to a signal from the feature representation and (2) with an
accurate estimate of the representation, the sample complexity needed to learn
a new task scales only with $C(\mathcal{F})$. Our results depend upon a new
general notion of task diversity--applicable to models with general tasks,
features, and losses--as well as a novel chain rule for Gaussian complexities.
Finally, we exhibit the utility of our general framework in several models of
importance in the literature.
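As a concrete, non-authoritative illustration of the two-phase recipe described in the abstract, the sketch below instantiates the setting linearly: the shared representation $h \in \mathcal{H}$ is a matrix $B \in \mathbb{R}^{d \times k}$, each task-specific function $f_j \in \mathcal{F}$ is a linear head $w_j \in \mathbb{R}^k$, and the loss is squared error. Alternating least squares stands in for the empirical risk minimization analyzed in the paper, and all dimensions, sample sizes, and variable names are illustrative assumptions rather than quantities taken from the paper.

```python
# Minimal sketch of transfer via a shared representation (linear instantiation).
# Assumptions: h(x) = B^T x with B in R^{d x k}, f_j(z) = w_j^T z, squared loss.
# Alternating least squares is an illustrative stand-in for the paper's ERM.
import numpy as np

rng = np.random.default_rng(0)
d, k, t = 20, 3, 10          # ambient dim, representation dim, number of source tasks
n_src, n_tgt = 50, 15        # samples per source task, samples for the new task

B_star = np.linalg.qr(rng.normal(size=(d, k)))[0]   # ground-truth shared representation
W_star = rng.normal(size=(t + 1, k))                # task heads; last row is the new task

def make_task(j, n):
    """Draw n labelled examples from task j: y = f_j(h(x)) + noise."""
    X = rng.normal(size=(n, d))
    y = X @ B_star @ W_star[j] + 0.1 * rng.normal(size=n)
    return X, y

source = [make_task(j, n_src) for j in range(t)]

# Phase 1: learn the shared representation from the t source tasks
# by alternating minimization over (B, {w_j}).
B = rng.normal(size=(d, k))
W = np.zeros((t, k))
for _ in range(100):
    # Update each head with B fixed (ordinary least squares in R^k).
    for j, (X, y) in enumerate(source):
        W[j] = np.linalg.lstsq(X @ B, y, rcond=None)[0]
    # Update B with the heads fixed: one least-squares problem over vec(B).
    A = np.vstack([np.kron(X, W[j][None, :]) for j, (X, _) in enumerate(source)])
    b = np.concatenate([y for _, y in source])
    B = np.linalg.lstsq(A, b, rcond=None)[0].reshape(d, k)

# Phase 2 (transfer): freeze B and fit only a k-dimensional head for the new task,
# mirroring the claim that the new-task sample complexity scales only with C(F).
X_new, y_new = make_task(t, n_tgt)
w_new = np.linalg.lstsq(X_new @ B, y_new, rcond=None)[0]

X_test, y_test = make_task(t, 1000)
print("test MSE with transferred representation:",
      np.mean((X_test @ B @ w_new - y_test) ** 2))
```

The point of the sketch is the split of the work: phase 1 estimates the many parameters of the representation by pooling all $t$ source tasks, while phase 2 fits only $k$ head parameters from the small target sample, which is why the new-task data requirement depends on the complexity of $\mathcal{F}$ rather than of $\mathcal{H}$.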
Related papers
- Metalearning with Very Few Samples Per Task [19.78398372660794]
We consider a binary classification setting where tasks are related by a shared representation.
Here, the amount of data is measured in terms of the number of tasks $t$ that we need to see and the number of samples $n$ per task.
Our work also yields a characterization of distribution-free multitask learning and reductions between meta and multitask learning.
arXiv Detail & Related papers (2023-12-21T16:06:44Z) - Active Representation Learning for General Task Space with Applications
in Robotics [44.36398212117328]
We propose an algorithmic framework for active representation learning, where the learner optimally chooses which source tasks to sample from.
We provide several instantiations under this framework, from bilinear and feature-based nonlinear to general nonlinear cases.
Our algorithms outperform baselines by 20%-70% on average.
arXiv Detail & Related papers (2023-06-15T08:27:50Z) - Multi-Task Imitation Learning for Linear Dynamical Systems [50.124394757116605]
We study representation learning for efficient imitation learning over linear systems.
We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left( \frac{k n_x H}{N_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}} \right)$.
arXiv Detail & Related papers (2022-12-01T00:14:35Z) - On the Sample Complexity of Representation Learning in Multi-task
Bandits with Global and Local structure [77.60508571062958]
We investigate the sample complexity of learning the optimal arm for multi-task bandit problems.
Arms consist of two components: one that is shared across tasks (which we call the representation) and one that is task-specific (which we call the predictor).
We devise an algorithm, OSRL-SC, whose sample complexity approaches the lower bound and scales at most as $H(G\log(1/\delta_G) + X\log(1/\delta_H))$, with $X, G, H$ being, respectively, the number of tasks, representations, and predictors.
arXiv Detail & Related papers (2022-11-28T08:40:12Z) - Meta Learning for High-dimensional Ising Model Selection Using
$\ell_1$-regularized Logistic Regression [28.776950569604026]
We consider the meta learning problem for estimating the graphs associated with high-dimensional Ising models.
Our goal is to use information learned from the auxiliary tasks when learning the novel task, so as to reduce its sufficient sample complexity.
arXiv Detail & Related papers (2022-08-19T20:28:39Z) - On the Power of Multitask Representation Learning in Linear MDP [61.58929164172968]
This paper presents an analysis of the statistical benefit of multitask representation learning in linear Markov Decision Processes (MDPs).
We first discover a Least-Activated-Feature-Abundance (LAFA) criterion, denoted as $\kappa$, with which we prove that a straightforward least-square algorithm learns a policy which is $\tilde{O}(H^2\sqrt{\frac{\kappa \mathcal{C}(\Phi)^2 \kappa d}{NT} + \frac{\kappa d}{n}})$ suboptimal.
arXiv Detail & Related papers (2021-06-15T11:21:06Z) - Improving Robustness and Generality of NLP Models Using Disentangled
Representations [62.08794500431367]
Supervised neural networks first map an input $x$ to a single representation $z$, and then map $z$ to the output label $y$.
We present methods to improve robustness and generality of NLP models from the standpoint of disentangled representation learning.
We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
arXiv Detail & Related papers (2020-09-21T02:48:46Z) - Few-Shot Learning via Learning the Representation, Provably [115.7367053639605]
This paper studies few-shot learning via representation learning.
One uses $T$ source tasks with $n_1$ data per task to learn a representation in order to reduce the sample complexity of a target task.
arXiv Detail & Related papers (2020-02-21T17:30:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.