Provable Benefits of Multi-task RL under Non-Markovian Decision Making Processes
- URL: http://arxiv.org/abs/2310.13550v1
- Date: Fri, 20 Oct 2023 14:50:28 GMT
- Title: Provable Benefits of Multi-task RL under Non-Markovian Decision Making Processes
- Authors: Ruiquan Huang, Yuan Cheng, Jing Yang, Vincent Tan, Yingbin Liang
- Abstract summary: In multi-task reinforcement learning (RL) under Markov decision processes (MDPs), the presence of shared latent structures has been shown to yield significant benefits to sample efficiency compared to single-task RL.
We investigate whether such a benefit can extend to more general sequential decision making problems, such as partially observable MDPs (POMDPs) and more general predictive state representations (PSRs).
We propose a provably efficient algorithm UMT-PSR for finding near-optimal policies for all PSRs, and demonstrate that the advantage of multi-task learning manifests if the joint model class of PSRs has a smaller $\eta$-bracketing number than that of individual single-task learning.
- Score: 56.714690083118406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In multi-task reinforcement learning (RL) under Markov decision processes
(MDPs), the presence of shared latent structures among multiple MDPs has been
shown to yield significant benefits to the sample efficiency compared to
single-task RL. In this paper, we investigate whether such a benefit can extend
to more general sequential decision making problems, such as partially
observable MDPs (POMDPs) and more general predictive state representations
(PSRs). The main challenge here is that the large and complex model space makes
it hard to identify what types of common latent structure of multi-task PSRs
can reduce the model complexity and improve sample efficiency. To this end, we
posit a joint model class for tasks and use the notion of $\eta$-bracketing
number to quantify its complexity; this number also serves as a general metric
to capture the similarity of tasks and thus determines the benefit of
multi-task over single-task RL. We first study upstream multi-task learning
over PSRs, in which all tasks share the same observation and action spaces. We
propose a provably efficient algorithm UMT-PSR for finding near-optimal
policies for all PSRs, and demonstrate that the advantage of multi-task
learning manifests if the joint model class of PSRs has a smaller
$\eta$-bracketing number compared to that of individual single-task learning.
We also provide several example multi-task PSRs with small $\eta$-bracketing
numbers, which reap the benefits of multi-task learning. We further investigate
downstream learning, in which the agent needs to learn a new target task that
shares some commonalities with the upstream tasks via a similarity constraint.
By exploiting the learned PSRs from the upstream, we develop a sample-efficient
algorithm that provably finds a near-optimal policy.
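A generic illustration of the $\eta$-bracketing number may help fix intuition. The sketch below follows the standard statistical-learning convention; the paper's exact definition (over the distributions induced by PSR models) may differ in detail. For a model class $\mathcal{F}$ with norm $\|\cdot\|$, a pair $(l,u)$ is an $\eta$-bracket if $l \le u$ and $\|u-l\| \le \eta$, and
$$\mathcal{N}_{[\,]}(\eta,\mathcal{F},\|\cdot\|) \;=\; \min\Big\{ K : \exists\,(l_1,u_1),\dots,(l_K,u_K)\ \text{s.t.}\ \forall f\in\mathcal{F},\ \exists k,\ l_k \le f \le u_k \Big\}.$$
If $N$ tasks are learned separately over classes $\mathcal{F}_1,\dots,\mathcal{F}_N$, the relevant complexity scales roughly as $\sum_{n=1}^{N}\log\mathcal{N}_{[\,]}(\eta,\mathcal{F}_n)$, whereas a joint class $\mathcal{F}_{\mathrm{joint}}\subseteq\mathcal{F}_1\times\cdots\times\mathcal{F}_N$ that encodes shared latent structure can satisfy
$$\log\mathcal{N}_{[\,]}\big(\eta,\mathcal{F}_{\mathrm{joint}}\big) \;\ll\; \sum_{n=1}^{N}\log\mathcal{N}_{[\,]}(\eta,\mathcal{F}_n),$$
with the extreme case of all tasks sharing one model collapsing the left-hand side to a single-task quantity. This gap is, informally, the benefit of multi-task over single-task RL that the abstract quantifies via the $\eta$-bracketing number.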
Related papers
- Guiding Multi-agent Multi-task Reinforcement Learning by a Hierarchical Framework with Logical Reward Shaping [16.5526277899717]
This study aims to design a multi-agent cooperative algorithm with logic reward shaping.
Experiments have been conducted on various types of tasks in the Minecraft-like environment.
arXiv Detail & Related papers (2024-11-02T09:03:23Z)
- Learning Representation for Multitask learning through Self Supervised Auxiliary learning [3.236198583140341]
In the hard parameter sharing approach, an encoder shared through multiple tasks generates data representations passed to task-specific predictors.
We propose Dummy Gradient norm Regularization that aims to improve the universality of the representations generated by the shared encoder.
We show that DGR effectively improves the quality of the shared representations, leading to better multi-task prediction performances.
arXiv Detail & Related papers (2024-09-25T06:08:35Z)
- The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback [12.388205905012423]
Reinforcement learning from human feedback has contributed to performance improvements in large language models.
We formulate RLHF as the contextual dueling bandit problem and assume a common linear representation.
We prove that to achieve $\varepsilon$-optimality, the sample complexity of the source tasks can be significantly reduced.
arXiv Detail & Related papers (2024-05-18T08:29:15Z)
- Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks [53.44714413181162]
This paper shows that when an agent is trained on a sufficiently diverse set of tasks, a generic policy-sharing algorithm with myopic exploration design can be sample-efficient.
To the best of our knowledge, this is the first theoretical demonstration of the "exploration benefits" of MTRL.
arXiv Detail & Related papers (2024-03-03T22:57:44Z)
- Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework, where multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can be successful with classification tasks with little, or non-overlapping annotations.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z)
- Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL).
It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z)
- Provable Benefit of Multitask Representation Learning in Reinforcement Learning [46.11628795660159]
This paper theoretically characterizes the benefit of representation learning under the low-rank Markov decision process (MDP) model.
To the best of our knowledge, this is the first theoretical study that characterizes the benefit of representation learning in exploration-based reward-free multitask reinforcement learning.
arXiv Detail & Related papers (2022-06-13T04:29:02Z)
- LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning [122.47938710284784]
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
arXiv Detail & Related papers (2022-05-05T10:46:16Z)
- Learning Robust State Abstractions for Hidden-Parameter Block MDPs [55.31018404591743]
We leverage ideas of common structure from the HiP-MDP setting to enable robust state abstractions inspired by Block MDPs.
We derive instantiations of this new framework for both multi-task reinforcement learning (MTRL) and meta-reinforcement learning (Meta-RL) settings.
arXiv Detail & Related papers (2020-07-14T17:25:27Z)