On the Theory of Transfer Learning: The Importance of Task Diversity
- URL: http://arxiv.org/abs/2006.11650v2
- Date: Thu, 22 Oct 2020 17:19:02 GMT
- Title: On the Theory of Transfer Learning: The Importance of Task Diversity
- Authors: Nilesh Tripuraneni, Michael I. Jordan, Chi Jin
- Abstract summary: We consider $t+1$ tasks parameterized by functions of the form $f_j \circ h$ in a general function class $\mathcal{F} \circ \mathcal{H}$.
We show that for diverse training tasks the sample complexity needed to learn the shared representation across the first $t$ training tasks scales as $C(\mathcal{H}) + t C(\mathcal{F})$.
- Score: 114.656572506859
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide new statistical guarantees for transfer learning via
representation learning--when transfer is achieved by learning a feature
representation shared across different tasks. This enables learning on new
tasks using far less data than is required to learn them in isolation.
Formally, we consider $t+1$ tasks parameterized by functions of the form $f_j
\circ h$ in a general function class $\mathcal{F} \circ \mathcal{H}$, where
each $f_j$ is a task-specific function in $\mathcal{F}$ and $h$ is the shared
representation in $\mathcal{H}$. Letting $C(\cdot)$ denote the complexity
measure of the function class, we show that for diverse training tasks (1) the
sample complexity needed to learn the shared representation across the first
$t$ training tasks scales as $C(\mathcal{H}) + t C(\mathcal{F})$, despite no
explicit access to a signal from the feature representation and (2) with an
accurate estimate of the representation, the sample complexity needed to learn
a new task scales only with $C(\mathcal{F})$. Our results depend upon a new
general notion of task diversity--applicable to models with general tasks,
features, and losses--as well as a novel chain rule for Gaussian complexities.
Finally, we exhibit the utility of our general framework in several models of
importance in the literature.
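As a concrete, non-authoritative illustration of the two-phase recipe described in the abstract, the sketch below instantiates the setting linearly: the shared representation $h \in \mathcal{H}$ is a matrix $B \in \mathbb{R}^{d \times k}$, each task-specific function $f_j \in \mathcal{F}$ is a linear head $w_j \in \mathbb{R}^k$, and the loss is squared error. Alternating least squares stands in for the empirical risk minimization analyzed in the paper, and all dimensions, sample sizes, and variable names are illustrative assumptions rather than quantities taken from the paper.

```python
# Minimal sketch of transfer via a shared representation (linear instantiation).
# Assumptions: h(x) = B^T x with B in R^{d x k}, f_j(z) = w_j^T z, squared loss.
# Alternating least squares is an illustrative stand-in for the paper's ERM.
import numpy as np

rng = np.random.default_rng(0)
d, k, t = 20, 3, 10          # ambient dim, representation dim, number of source tasks
n_src, n_tgt = 50, 15        # samples per source task, samples for the new task

B_star = np.linalg.qr(rng.normal(size=(d, k)))[0]   # ground-truth shared representation
W_star = rng.normal(size=(t + 1, k))                # task heads; last row is the new task

def make_task(j, n):
    """Draw n labelled examples from task j: y = f_j(h(x)) + noise."""
    X = rng.normal(size=(n, d))
    y = X @ B_star @ W_star[j] + 0.1 * rng.normal(size=n)
    return X, y

source = [make_task(j, n_src) for j in range(t)]

# Phase 1: learn the shared representation from the t source tasks
# by alternating minimization over (B, {w_j}).
B = rng.normal(size=(d, k))
W = np.zeros((t, k))
for _ in range(100):
    # Update each head with B fixed (ordinary least squares in R^k).
    for j, (X, y) in enumerate(source):
        W[j] = np.linalg.lstsq(X @ B, y, rcond=None)[0]
    # Update B with the heads fixed: one least-squares problem over vec(B).
    A = np.vstack([np.kron(X, W[j][None, :]) for j, (X, _) in enumerate(source)])
    b = np.concatenate([y for _, y in source])
    B = np.linalg.lstsq(A, b, rcond=None)[0].reshape(d, k)

# Phase 2 (transfer): freeze B and fit only a k-dimensional head for the new task,
# mirroring the claim that the new-task sample complexity scales only with C(F).
X_new, y_new = make_task(t, n_tgt)
w_new = np.linalg.lstsq(X_new @ B, y_new, rcond=None)[0]

X_test, y_test = make_task(t, 1000)
print("test MSE with transferred representation:",
      np.mean((X_test @ B @ w_new - y_test) ** 2))
```

The point of the sketch is the split of the work: phase 1 estimates the many parameters of the representation by pooling all $t$ source tasks, while phase 2 fits only $k$ head parameters from the small target sample, which is why the new-task data requirement depends on the complexity of $\mathcal{F}$ rather than of $\mathcal{H}$.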
Related papers
- Metalearning with Very Few Samples Per Task [19.78398372660794]
We consider a binary classification setting where tasks are related by a shared representation.
Here, the amount of data is measured in terms of the number of tasks $t$ that we need to see and the number of samples $n$ per task.
Our work also yields a characterization of distribution-free multitask learning and reductions between meta and multitask learning.
arXiv Detail & Related papers (2023-12-21T16:06:44Z) - Active Representation Learning for General Task Space with Applications
in Robotics [44.36398212117328]
We propose an algorithmic framework for active representation learning, where the learner optimally chooses which source tasks to sample from.
We provide several instantiations under this framework, from bilinear and feature-based nonlinear to general nonlinear cases.
Our algorithms outperform baselines by 20%-70% on average.
arXiv Detail & Related papers (2023-06-15T08:27:50Z) - Multi-Task Imitation Learning for Linear Dynamical Systems [50.124394757116605]
We study representation learning for efficient imitation learning over linear systems.
We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left( \frac{k n_x H}{N_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}} \right)$.
arXiv Detail & Related papers (2022-12-01T00:14:35Z) - On the Sample Complexity of Representation Learning in Multi-task
Bandits with Global and Local structure [77.60508571062958]
We investigate the sample complexity of learning the optimal arm for multi-task bandit problems.
Arms consist of two components: one that is shared across tasks (which we call the representation) and one that is task-specific (which we call the predictor).
We devise an algorithm, OSRL-SC, whose sample complexity approaches the lower bound and scales at most as $H(G\log(1/\delta_G) + X\log(1/\delta_H))$, with $X, G, H$ being, respectively, the number of tasks, representations, and predictors.
arXiv Detail & Related papers (2022-11-28T08:40:12Z) - Meta Learning for High-dimensional Ising Model Selection Using
$\ell_1$-regularized Logistic Regression [28.776950569604026]
We consider the meta learning problem for estimating the graphs associated with high-dimensional Ising models.
Our goal is to use information learned from the auxiliary tasks when learning the novel task, so as to reduce its sufficient sample complexity.
arXiv Detail & Related papers (2022-08-19T20:28:39Z) - On the Power of Multitask Representation Learning in Linear MDP [61.58929164172968]
This paper presents an analysis of the statistical benefit of multitask representation learning in linear Markov Decision Processes (MDPs).
We first discover a Least-Activated-Feature-Abundance (LAFA) criterion, denoted as $\kappa$, with which we prove that a straightforward least-square algorithm learns a policy which is $\tilde{O}(H^2\sqrt{\frac{\kappa \mathcal{C}(\Phi)^2 \kappa d}{NT} + \frac{\kappa d}{n}})$ suboptimal.
arXiv Detail & Related papers (2021-06-15T11:21:06Z) - Improving Robustness and Generality of NLP Models Using Disentangled
Representations [62.08794500431367]
Supervised neural networks first map an input $x$ to a single representation $z$, and then map $z$ to the output label $y$.
We present methods to improve robustness and generality of NLP models from the standpoint of disentangled representation learning.
We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
arXiv Detail & Related papers (2020-09-21T02:48:46Z) - Few-Shot Learning via Learning the Representation, Provably [115.7367053639605]
This paper studies few-shot learning via representation learning.
One uses $T$ source tasks with $n_1$ data per task to learn a representation in order to reduce the sample complexity of a target task.
arXiv Detail & Related papers (2020-02-21T17:30:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.