On the Power of Multitask Representation Learning in Linear MDP
- URL: http://arxiv.org/abs/2106.08053v1
- Date: Tue, 15 Jun 2021 11:21:06 GMT
- Title: On the Power of Multitask Representation Learning in Linear MDP
- Authors: Rui Lu, Gao Huang, Simon S. Du
- Abstract summary: This paper presents analyses for the statistical benefit of multitask representation learning in linear Markov Decision Process (MDP).
We first discover a \emph{Least-Activated-Feature-Abundance} (LAFA) criterion, denoted as $\kappa$, with which we prove that a straightforward least-square algorithm learns a policy which is $\tilde{O}(H^2\sqrt{\frac{\mathcal{C}(\Phi)^2 \kappa d}{NT}+\frac{\kappa d}{n}})$ sub-optimal.
- Score: 61.58929164172968
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While multitask representation learning has become a popular approach in
reinforcement learning (RL), theoretical understanding of why and when it works
remains limited. This paper presents analyses for the statistical benefit of
multitask representation learning in linear Markov Decision Process (MDP) under
a generative model. In this paper, we consider an agent that learns a
representation function $\phi$ out of a function class $\Phi$ from $T$ source
tasks with $N$ data points per task, and then uses the learned $\hat{\phi}$ to
reduce the number of samples required for a new task. We first discover a
\emph{Least-Activated-Feature-Abundance} (LAFA) criterion, denoted as $\kappa$,
with which we prove that a straightforward least-square algorithm learns a
policy which is $\tilde{O}(H^2\sqrt{\frac{\mathcal{C}(\Phi)^2 \kappa
d}{NT}+\frac{\kappa d}{n}})$ sub-optimal. Here $H$ is the planning horizon,
$\mathcal{C}(\Phi)$ is $\Phi$'s complexity measure, $d$ is the dimension of the
representation (usually $d\ll \mathcal{C}(\Phi)$) and $n$ is the number of
samples for the new task. Thus the required $n$ is $O(\kappa d H^4)$ for the
sub-optimality to be close to zero, which is much smaller than
$O(\mathcal{C}(\Phi)^2\kappa d H^4)$ in the setting without multitask
representation learning, whose sub-optimality gap is
$\tilde{O}(H^2\sqrt{\frac{\kappa \mathcal{C}(\Phi)^2d}{n}})$. This
theoretically explains the power of multitask representation learning in
reducing sample complexity. Further, we note that to ensure high sample
efficiency, the LAFA criterion $\kappa$ should be small. In fact, $\kappa$
varies widely in magnitude depending on the sampling distribution for the new
task. This indicates that an adaptive sampling technique is important to make
$\kappa$ depend solely on $d$. Finally, we provide empirical results on a noisy
grid-world environment to corroborate our theoretical findings.
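To make the two-stage recipe in the abstract concrete, below is a minimal synthetic sketch (not the authors' algorithm or code): a shared $d$-dimensional representation is estimated from $T$ source tasks with $N$ samples each, and a new task is then solved by plain least squares on only the $d$ learned features from $n$ samples. The per-task least-squares plus SVD subspace estimator, the regression-only setting (the MDP, planning horizon $H$, and generative-model components are omitted), and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
p, d, T, N, n = 50, 5, 20, 200, 40   # ambient dim, rep dim d, T source tasks,
                                     # N samples per source task, n target samples

B_star = np.linalg.qr(rng.normal(size=(p, d)))[0]  # ground-truth representation
W_star = rng.normal(size=(d, T))                   # per-task weights

# Source-task data: y = <phi(x), w_t> + noise, with phi(x) = B_star^T x.
X = rng.normal(size=(T, N, p))
Y = np.einsum('tnp,pd,dt->tn', X, B_star, W_star) + 0.1 * rng.normal(size=(T, N))

# --- Stage 1: learn a shared representation from the T source tasks ----------
# Per-task least squares, then an SVD across tasks recovers the shared
# d-dimensional subspace (a simple stand-in for ERM over the class Phi).
Theta = np.stack([np.linalg.lstsq(X[t], Y[t], rcond=None)[0] for t in range(T)],
                 axis=1)                                   # shape (p, T)
B_hat = np.linalg.svd(Theta, full_matrices=False)[0][:, :d]

# --- Stage 2: transfer to a new task using only n samples --------------------
w_new = rng.normal(size=d)
X_new = rng.normal(size=(n, p))
y_new = X_new @ B_star @ w_new + 0.1 * rng.normal(size=n)

# With the learned representation we only fit d parameters...
w_hat = np.linalg.lstsq(X_new @ B_hat, y_new, rcond=None)[0]
# ...whereas learning from scratch must fit all p > n parameters.
theta_scratch = np.linalg.lstsq(X_new, y_new, rcond=None)[0]

X_test = rng.normal(size=(2000, p))
f_true = X_test @ B_star @ w_new
mse_rep = np.mean((X_test @ B_hat @ w_hat - f_true) ** 2)
mse_scratch = np.mean((X_test @ theta_scratch - f_true) ** 2)
print(f"MSE with learned representation: {mse_rep:.4f}")
print(f"MSE learning from scratch:       {mse_scratch:.4f}")
```

In this toy setting the transfer error depends on fitting only $d$ parameters from the $n$ target samples, while the from-scratch baseline must estimate all $p$ parameters, mirroring the $O(\kappa d H^4)$ versus $O(\mathcal{C}(\Phi)^2\kappa d H^4)$ comparison in the abstract at a purely illustrative level.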
Related papers
- Guarantees for Nonlinear Representation Learning: Non-identical Covariates, Dependent Data, Fewer Samples [24.45016514352055]
We study the sample complexity of learning $T+1$ functions $f_\star^{(t)} \circ g_\star$ from a function class $\mathcal{F} \times \mathcal{G}$.
We show that as the number of tasks $T$ increases, both the sample requirement and risk bound converge to that of $r$-dimensional regression.
arXiv Detail & Related papers (2024-10-15T03:20:19Z) - Metalearning with Very Few Samples Per Task [19.78398372660794]
We consider a binary classification setting where tasks are related by a shared representation.
Here, the amount of data is measured in terms of the number of tasks $t$ that we need to see and the number of samples $n$ per task.
Our work also yields a characterization of distribution-free multitask learning and reductions between meta and multitask learning.
arXiv Detail & Related papers (2023-12-21T16:06:44Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - Adversarial Online Multi-Task Reinforcement Learning [12.421997449847153]
We consider the adversarial online multi-task reinforcement learning setting.
In each of $K$ episodes the learner is given an unknown task taken from a finite set of $M$ unknown finite-horizon MDP models.
The learner's objective is to minimize its regret with respect to the optimal policy for each task.
arXiv Detail & Related papers (2023-01-11T02:18:26Z) - Reward-Mixing MDPs with a Few Latent Contexts are Learnable [75.17357040707347]
We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs).
Our goal is to learn a near-optimal policy that nearly maximizes the $H$ time-step cumulative rewards in such a model.
arXiv Detail & Related papers (2022-10-05T22:52:00Z) - Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs
with a Generative Model [3.749193647980305]
This paper considers a Markov decision process (MDP) that admits a set of state-action features.
We show that a model-based approach (resp. $Q$-learning) provably learns an $\varepsilon$-optimal policy with high probability.
arXiv Detail & Related papers (2021-05-28T17:49:39Z) - On the Theory of Transfer Learning: The Importance of Task Diversity [114.656572506859]
We consider $t+1$ tasks parameterized by functions of the form $f_j \circ h$ in a general function class $\mathcal{F} \circ \mathcal{H}$.
We show that for diverse training tasks the sample complexity needed to learn the shared representation across the first $t$ training tasks scales as $C(\mathcal{H}) + t\,C(\mathcal{F})$.
arXiv Detail & Related papers (2020-06-20T20:33:59Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an algorithm with improved sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - Few-Shot Learning via Learning the Representation, Provably [115.7367053639605]
This paper studies few-shot learning via representation learning.
One uses $T$ source tasks with $n_1$ data per task to learn a representation in order to reduce the sample complexity of a target task.
arXiv Detail & Related papers (2020-02-21T17:30:00Z)