Joint Representation Training in Sequential Tasks with Shared Structure
- URL: http://arxiv.org/abs/2206.12441v1
- Date: Fri, 24 Jun 2022 18:10:00 GMT
- Title: Joint Representation Training in Sequential Tasks with Shared Structure
- Authors: Aldo Pacchiano, Ofir Nachum, Nilesh Tripuraneni, Peter Bartlett
- Abstract summary: We introduce the Shared-MatrixRL algorithm for the setting of Multitask MatrixRL.
We show the regret on the $P$ tasks can be improved from $O(PHd\sqrt{NH})$ to $O((Hd\sqrt{rP} + HP\sqrt{rd})\sqrt{NH})$ over $N$ episodes of horizon $H$.
- Score: 40.1056491921582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classical theory in reinforcement learning (RL) predominantly focuses on the
single task setting, where an agent learns to solve a task through
trial-and-error experience, given access to data only from that task. However,
many recent empirical works have demonstrated the significant practical
benefits of leveraging a joint representation trained across multiple, related
tasks. In this work we theoretically analyze such a setting, formalizing the
concept of task relatedness as a shared state-action representation that admits
linear dynamics in all the tasks. We introduce the Shared-MatrixRL algorithm
for the setting of Multitask MatrixRL. In the presence of $P$ episodic tasks of
dimension $d$ sharing a joint $r \ll d$ low-dimensional representation, we show
the regret on the $P$ tasks can be improved from $O(PHd\sqrt{NH})$ to
$O((Hd\sqrt{rP} + HP\sqrt{rd})\sqrt{NH})$ over $N$ episodes of horizon $H$.
These gains coincide with those observed in other linear models in contextual
bandits and RL. In contrast with previous works that have studied multi-task RL
in other function approximation models, we show that in the presence of a
bilinear optimization oracle and finite state-action spaces there exists a
computationally efficient algorithm for multitask MatrixRL via a reduction to
quadratic programming. We also develop a simple technique to shave off a
$\sqrt{H}$ factor from the regret upper bounds of some episodic linear
problems.
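For intuition, below is a minimal sketch of the shared-structure assumption together with a back-of-the-envelope comparison of the two regret bounds quoted in the abstract. It is not the paper's code: the factorization $M_p = U A_p$, the names `U` and `cores`, and all numeric values are illustrative assumptions.

```python
import numpy as np

# Minimal sketch, assuming MatrixRL-style dynamics P_p(s'|s,a) ∝ phi(s,a)^T M_p psi(s')
# and reading the shared r-dimensional representation as every task's transition core M_p
# lying in a common rank-r column space, M_p = U @ A_p. Names and sizes are illustrative.
rng = np.random.default_rng(0)
d, d_out, r, P = 100, 100, 5, 50      # feature dims, shared rank r << d, number of tasks P

U = np.linalg.qr(rng.standard_normal((d, r)))[0]                 # shared d x r basis
cores = [U @ rng.standard_normal((r, d_out)) for _ in range(P)]  # per-task cores M_p, rank <= r

# Back-of-the-envelope comparison of the regret bounds from the abstract
# (constants and log factors dropped):
#   independent single-task MatrixRL:  P * H * d * sqrt(N * H)
#   Shared-MatrixRL:                  (H * d * sqrt(r * P) + H * P * sqrt(r * d)) * sqrt(N * H)
H, N = 10, 10_000
naive  = P * H * d * np.sqrt(N * H)
shared = (H * d * np.sqrt(r * P) + H * P * np.sqrt(r * d)) * np.sqrt(N * H)
print(f"naive bound ~ {naive:.2e}   shared bound ~ {shared:.2e}")
```

With these (assumed) values the shared bound is the smaller of the two, because $r$ is small relative to both $d$ and $P$; that is the regime in which joint representation training pays off.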
Related papers
- Provable Benefits of Multi-task RL under Non-Markovian Decision Making
Processes [56.714690083118406]
In multi-task reinforcement learning (RL) under Markov decision processes (MDPs), the presence of shared latent structures has been shown to yield significant benefits to sample efficiency compared to single-task RL.
We investigate whether such a benefit can extend to more general sequential decision making problems, such as partially observable MDPs (POMDPs) and more general predictive state representations (PSRs).
We propose a provably efficient algorithm UMT-PSR for finding near-optimal policies for all PSRs, and demonstrate that the advantage of multi-task learning manifests if the joint model class of PSR
arXiv Detail & Related papers (2023-10-20T14:50:28Z) - Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z) - Scaling Distributed Multi-task Reinforcement Learning with Experience
Sharing [38.883540444516605]
DARPA launched the ShELL program, which aims to explore how experience sharing can benefit distributed lifelong learning agents.
We conduct both theoretical and empirical research on distributed multi-task reinforcement learning (RL), where a group of $N$ agents collaboratively solves $M$ tasks.
We propose an algorithm called DistMT-LSVI, where each agent independently learns $\epsilon$-optimal policies for all $M$ tasks.
arXiv Detail & Related papers (2023-07-11T22:58:53Z) - Provably Efficient Offline Reinforcement Learning with Trajectory-Wise
Reward [66.81579829897392]
We propose a novel offline reinforcement learning algorithm called Pessimistic vAlue iteRaTion with rEward Decomposition (PARTED).
PARTED decomposes the trajectory return into per-step proxy rewards via least-squares-based reward redistribution, and then performs pessimistic value iteration based on the learned proxy reward.
To the best of our knowledge, PARTED is the first offline RL algorithm that is provably efficient in general MDP with trajectory-wise reward.
arXiv Detail & Related papers (2022-06-13T19:11:22Z) - Nearly Minimax Algorithms for Linear Bandits with Shared Representation [86.79657561369397]
We consider the setting where we play $M$ linear bandits with dimension $d$, each for $T$ rounds, and these $M$ bandit tasks share a common $k$ ($\ll d$)-dimensional linear representation (a toy sketch of this shared-representation model appears after this list).
We come up with novel algorithms that achieve $\widetilde{O}\left(d\sqrt{kMT} + kM\sqrt{T}\right)$ regret bounds, which match the known minimax regret lower bound up to logarithmic factors.
arXiv Detail & Related papers (2022-03-29T15:27:13Z) - Multi-task Representation Learning with Stochastic Linear Bandits [29.8208189270894]
We study the problem of transfer-learning in the setting of linear bandit tasks.
We consider that a low dimensional linear representation is shared across the tasks, and study the benefit of learning this representation in the multi-task learning setting.
arXiv Detail & Related papers (2022-02-21T09:26:34Z) - Near-optimal Representation Learning for Linear Bandits and Linear RL [41.33483293243257]
We first consider the setting where we play $M$ linear bandits with dimension $d$ concurrently.
These bandits share a common $k$-dimensional linear representation so that $k \ll d$ and $k \ll M$.
We propose a sample-efficient algorithm, MTLR-OFUL, which leverages the shared representation to achieve $\tilde{O}(M\sqrt{dkT} + d\sqrt{kMT})$ regret.
arXiv Detail & Related papers (2021-02-08T11:11:53Z) - On Function Approximation in Reinforcement Learning: Optimism in the
Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the problem.
Our regret bounds are independent of the number of states.
arXiv Detail & Related papers (2020-11-09T18:32:22Z)
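The two linear-bandit entries above rest on the same shared-representation model, sketched minimally here under stated assumptions: the factorization $\theta_m = B w_m$, the names `B`, `thetas`, and `expected_reward`, and all sizes are illustrative, not taken from either paper's code.

```python
import numpy as np

# Minimal sketch of M linear bandit tasks whose unknown d-dimensional reward parameters
# all lie in a shared k-dimensional subspace (k << d), written as theta_m = B @ w_m.
rng = np.random.default_rng(1)
d, k, M = 30, 3, 20

B = np.linalg.qr(rng.standard_normal((d, k)))[0]          # shared d x k representation
thetas = [B @ rng.standard_normal(k) for _ in range(M)]   # per-task parameters theta_m

def expected_reward(x: np.ndarray, m: int) -> float:
    """Mean reward of an action with feature vector x in task m."""
    return float(x @ thetas[m])

# Pooling data across tasks to estimate the shared B leaves roughly d*k + M*k free
# parameters instead of M*d, which is the intuition behind regret bounds of the
# O(d*sqrt(kMT) + kM*sqrt(T)) form quoted above.
```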
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.