Scaling Distributed Multi-task Reinforcement Learning with Experience
Sharing
- URL: http://arxiv.org/abs/2307.05834v1
- Date: Tue, 11 Jul 2023 22:58:53 GMT
- Title: Scaling Distributed Multi-task Reinforcement Learning with Experience
Sharing
- Authors: Sanae Amani, Khushbu Pahwa, Vladimir Braverman, Lin F. Yang
- Abstract summary: DARPA launched the ShELL program, which aims to explore how experience sharing can benefit distributed lifelong learning agents.
We conduct both theoretical and empirical research on distributed multi-task reinforcement learning (RL), where a group of $N$ agents collaboratively solves $M$ tasks.
We propose an algorithm called DistMT-LSVI, in which the agents first identify the tasks and then exchange information through a central server to derive $\epsilon$-optimal policies for all $M$ tasks.
- Score: 38.883540444516605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, DARPA launched the ShELL program, which aims to explore how
experience sharing can benefit distributed lifelong learning agents in adapting
to new challenges. In this paper, we address this issue by conducting both
theoretical and empirical research on distributed multi-task reinforcement
learning (RL), where a group of $N$ agents collaboratively solves $M$ tasks
without prior knowledge of their identities. We approach the problem by
formulating it as linearly parameterized contextual Markov decision processes
(MDPs), where each task is represented by a context that specifies the
transition dynamics and rewards. To tackle this problem, we propose an
algorithm called DistMT-LSVI. First, the agents identify the tasks, and then
they exchange information through a central server to derive $\epsilon$-optimal
policies for the tasks. Our research demonstrates that to achieve
$\epsilon$-optimal policies for all $M$ tasks, a single agent using DistMT-LSVI
needs to run a total number of episodes that is at most
$\tilde{\mathcal{O}}({d^3H^6(\epsilon^{-2}+c_{\rm sep}^{-2})}\cdot M/N)$, where
$c_{\rm sep}>0$ is a constant representing task separability, $H$ is the
horizon of each episode, and $d$ is the feature dimension of the dynamics and
rewards. Notably, DistMT-LSVI improves the sample complexity of non-distributed
settings by a factor of $1/N$, as each agent independently learns
$\epsilon$-optimal policies for all $M$ tasks using
$\tilde{\mathcal{O}}(d^3H^6M\epsilon^{-2})$ episodes. Additionally, we provide
numerical experiments conducted on OpenAI Gym Atari environments that validate
our theoretical findings.
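As a rough illustration of the protocol described above (task identification followed by information exchange through a central server), the sketch below shows how per-task least-squares statistics could be pooled when the $M$ tasks are split across $N$ agents. It is a minimal schematic under our own assumptions: the helper names (CentralServer, run_episode, the task-assignment rule) are hypothetical and not the authors' DistMT-LSVI implementation, and a real LSVI procedure would build optimistic Q-estimates backward over the horizon rather than a single ridge regression.
```python
import numpy as np

# Illustrative sketch only: hypothetical helpers, not the authors' DistMT-LSVI code.
d, H, M, N = 8, 5, 6, 3            # feature dim, horizon, number of tasks, number of agents
rng = np.random.default_rng(0)

class CentralServer:
    """Pools per-task least-squares statistics uploaded by the agents."""
    def __init__(self, n_tasks, dim):
        self.cov = [np.eye(dim) for _ in range(n_tasks)]    # regularized Gram matrices
        self.mom = [np.zeros(dim) for _ in range(n_tasks)]  # feature-weighted regression targets

    def upload(self, task_id, features, targets):
        # Sufficient statistics add up, so the order of agent uploads does not matter.
        self.cov[task_id] += features.T @ features
        self.mom[task_id] += features.T @ targets

    def broadcast(self, task_id):
        # Ridge-regression estimate of the task's value parameters, shared with every agent.
        return np.linalg.solve(self.cov[task_id], self.mom[task_id])

def run_episode(task_id, horizon, dim):
    """Stand-in for real environment interaction: per-step features and regression targets."""
    phi = rng.normal(size=(horizon, dim))
    y = phi @ rng.normal(size=dim) + 0.1 * rng.normal(size=horizon)
    return phi, y

server = CentralServer(M, d)
for agent in range(N):
    # Each agent covers roughly M/N of the tasks, which is where the 1/N saving comes from.
    my_tasks = [t for t in range(M) if t % N == agent]
    for task in my_tasks:
        for _ in range(4):                      # per-task episode budget for this agent
            phi, y = run_episode(task, H, d)
            server.upload(task, phi, y)

theta_hat = [server.broadcast(t) for t in range(M)]   # one parameter estimate per task
print([round(float(np.linalg.norm(th)), 2) for th in theta_hat])
```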
Related papers
- Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task
Adaptation [45.90925587972781]
Large Language Models (LLMs) can solve a variety of tasks, such as text summarization and mathematical question answering.
Due to high computational costs, the current trend is to use prompt instruction tuning to better adjust monolithic, pretrained LLMs for new -- but often individual -- downstream tasks.
MoPs can simultaneously mitigate prompt training "interference" in multi-task, multi-source scenarios.
arXiv Detail & Related papers (2023-10-04T14:11:12Z) - Improved Active Multi-Task Representation Learning via Lasso [44.607652031235716]
In this paper, we show the dominance of the L1-regularized-relevance-based ($\nu^1$-based) strategy by giving a lower bound for the $\nu^2$-based strategy.
We also characterize the potential of our $\nu^1$-based strategy in sample-cost-sensitive settings.
arXiv Detail & Related papers (2023-06-05T03:08:29Z) - Multi-Task Imitation Learning for Linear Dynamical Systems [50.124394757116605]
We study representation learning for efficient imitation learning over linear systems.
We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left(\frac{k n_x H}{N_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}}\right)$.
arXiv Detail & Related papers (2022-12-01T00:14:35Z) - Reward-Mixing MDPs with a Few Latent Contexts are Learnable [75.17357040707347]
We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs).
Our goal is to learn a near-optimal policy that nearly maximizes the $H$ time-step cumulative rewards in such a model.
arXiv Detail & Related papers (2022-10-05T22:52:00Z) - Joint Representation Training in Sequential Tasks with Shared Structure [40.1056491921582]
We introduce the Shared-MatrixRL algorithm for the setting of Multitask MatrixRL.
We show that the regret on the $P$ tasks can be improved from $O(PHd\sqrt{NH})$ to $O((Hd\sqrt{rP} + HP\sqrt{rd})\sqrt{NH})$ over $N$ episodes of horizon $H$ (a numeric sanity check of these two bounds appears after this list).
arXiv Detail & Related papers (2022-06-24T18:10:00Z) - Reward-Free Model-Based Reinforcement Learning with Linear Function
Approximation [92.99933928528797]
We study model-based reward-free reinforcement learning with linear function approximation for episodic Markov decision processes (MDPs).
In the planning phase, the agent is given a specific reward function and uses samples collected from the exploration phase to learn a good policy.
We show that to obtain an $\epsilon$-optimal policy for an arbitrary reward function, UCRL-RFE needs to sample at most $\tilde{O}(H^4 d(H + d)\epsilon^{-2})$ episodes.
arXiv Detail & Related papers (2021-10-12T23:03:58Z) - On the Power of Multitask Representation Learning in Linear MDP [61.58929164172968]
This paper presents analyses of the statistical benefit of multitask representation learning in linear Markov Decision Processes (MDPs).
We first discover a Least-Activated-Feature-Abundance (LAFA) criterion, denoted as $\kappa$, with which we prove that a straightforward least-square algorithm learns a policy that is $\tilde{O}\big(H^2\sqrt{\frac{\kappa\,\mathcal{C}(\Phi)^2 \kappa d}{NT}+\frac{\kappa d}{n}}\big)$ suboptimal.
arXiv Detail & Related papers (2021-06-15T11:21:06Z) - Multitask Online Mirror Descent [35.93027027759005]
We analyze MT-OMD, a multitask generalization of Online Mirror Descent (OMD) which operates by sharing updates between tasks.
Our extensions of Online Gradient Descent and Exponentiated Gradient, two important instances of OMD, are shown to enjoy closed-form updates.
arXiv Detail & Related papers (2021-06-04T10:14:57Z) - On the Theory of Transfer Learning: The Importance of Task Diversity [114.656572506859]
We consider $t+1$ tasks parameterized by functions of the form $f_j \circ h$ in a general function class $\mathcal{F} \circ \mathcal{H}$.
We show that for diverse training tasks the sample complexity needed to learn the shared representation across the first $t$ training tasks scales as $C(\mathcal{H}) + t\,C(\mathcal{F})$.
arXiv Detail & Related papers (2020-06-20T20:33:59Z) - Task-agnostic Exploration in Reinforcement Learning [35.403304641170386]
We present an efficient task-agnostic reinforcement learning algorithm, UCBZero.
It finds $\epsilon$-optimal policies for $N$ arbitrary tasks after at most $\tilde{O}(\log(N)H^5SA/\epsilon^2)$ exploration episodes.
We also provide an $\Omega(\log(N)H^2SA/\epsilon^2)$ lower bound, showing that the $\log$ dependency on $N$ is unavoidable.
arXiv Detail & Related papers (2020-06-16T20:23:41Z)
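For the Shared-MatrixRL entry above, a quick numeric check clarifies when the shared-representation regret bound beats the independent-task one. The snippet below merely evaluates the leading terms quoted in that summary, with arbitrarily chosen values for $P$, $H$, $d$, $r$, and $N$ and with all constants and log factors dropped; it is an illustration, not a claim from the paper.
```python
import math

# Leading terms of the two regret bounds quoted above, constants and log factors dropped.
# Parameter values are arbitrary and only serve to illustrate the comparison.
P, H, d, r, N = 50, 10, 100, 5, 10_000   # tasks, horizon, ambient dim, shared rank, episodes

independent = P * H * d * math.sqrt(N * H)                                # O(P H d sqrt(NH))
shared = (H * d * math.sqrt(r * P) + H * P * math.sqrt(r * d)) * math.sqrt(N * H)

print(f"independent ~ {independent:.3e}, shared ~ {shared:.3e}")
# When r is much smaller than both d and P, the shared-representation bound is smaller.
```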