Generalized Hidden Parameter MDPs: Transferable Model-based RL in a Handful of Trials
- URL: http://arxiv.org/abs/2002.03072v1
- Date: Sat, 8 Feb 2020 02:49:33 GMT
- Title: Generalized Hidden Parameter MDPs: Transferable Model-based RL in a Handful of Trials
- Authors: Christian F. Perez, Felipe Petroski Such, Theofanis Karaletsos
- Abstract summary: Generalized Hidden Parameter MDPs (GHP-MDPs) describe a family of MDPs where both dynamics and reward can change as a function of hidden parameters that vary across tasks.
We experimentally demonstrate state-of-the-art performance and sample-efficiency on a new challenging MuJoCo task using reward and dynamics latent spaces.
- Score: 13.051708608864539
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is broad interest in creating RL agents that can solve many (related)
tasks and adapt to new tasks and environments after initial training.
Model-based RL leverages learned surrogate models that describe dynamics and
rewards of individual tasks, such that planning in a good surrogate can lead to
good control of the true system. Rather than solving each task individually
from scratch, hierarchical models can exploit the fact that tasks are often
related by (unobserved) causal factors of variation in order to achieve
efficient generalization; for example, learning how the mass of an item affects
the force required to lift it generalizes to previously unobserved masses. We
propose Generalized Hidden Parameter MDPs (GHP-MDPs) that describe a family of
MDPs where both dynamics and reward can change as a function of hidden
parameters that vary across tasks. The GHP-MDP augments model-based RL with
latent variables that capture these hidden parameters, facilitating transfer
across tasks. We also explore a variant of the model that incorporates explicit
latent structure mirroring the causal factors of variation across tasks (for
instance: agent properties, environmental factors, and goals). We
experimentally demonstrate state-of-the-art performance and sample-efficiency
on a new challenging MuJoCo task using reward and dynamics latent spaces, while
beating a previous state-of-the-art baseline with $>10\times$ less data. Using
test-time inference of the latent variables, our approach generalizes in a
single episode to novel combinations of dynamics and reward, and to novel
rewards.
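To make the modeling idea concrete, below is a minimal, hypothetical PyTorch sketch of a latent-conditioned dynamics-and-reward model with test-time inference of the task latents. The names (`LatentTaskModel`, `infer_latents`), the network sizes, and the point-estimate gradient-descent adaptation are illustrative assumptions, not the paper's implementation, which learns probabilistic models and infers distributions over the latent variables.

```python
# Hypothetical sketch of a GHP-MDP-style surrogate model: next state and reward
# are predicted from (state, action) plus low-dimensional latent variables that
# capture the hidden task parameters (separate latents for dynamics and reward).
import torch
import torch.nn as nn

class LatentTaskModel(nn.Module):
    def __init__(self, state_dim, action_dim, dyn_latent_dim=2, rew_latent_dim=2, hidden=128):
        super().__init__()
        in_dim = state_dim + action_dim + dyn_latent_dim + rew_latent_dim
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim + 1),  # state delta + scalar reward
        )

    def forward(self, state, action, z_dyn, z_rew):
        out = self.net(torch.cat([state, action, z_dyn, z_rew], dim=-1))
        next_state = state + out[..., :-1]  # predict the change in state
        reward = out[..., -1]
        return next_state, reward

def infer_latents(model, transitions, dyn_latent_dim=2, rew_latent_dim=2, steps=200, lr=1e-2):
    """Test-time adaptation: fit the task latents to a few observed transitions.
    Only the latents are passed to the optimizer, so the shared weights stay fixed."""
    s, a, s_next, r = transitions  # tensors collected during the current episode
    z_dyn = torch.zeros(1, dyn_latent_dim, requires_grad=True)
    z_rew = torch.zeros(1, rew_latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z_dyn, z_rew], lr=lr)
    for _ in range(steps):
        pred_next, pred_r = model(
            s, a, z_dyn.expand(s.shape[0], -1), z_rew.expand(s.shape[0], -1)
        )
        loss = ((pred_next - s_next) ** 2).mean() + ((pred_r - r) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z_dyn.detach(), z_rew.detach()
```

In this sketch, a handful of transitions from the current episode is enough to fit the low-dimensional latents while the shared network stays fixed, mirroring how the abstract describes single-episode generalization to new combinations of dynamics and reward.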
Related papers
- Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models [83.02797560769285]
Data-Free Meta-Learning (DFML) aims to derive knowledge from a collection of pre-trained models without accessing their original data.
Current methods often overlook the heterogeneity among pre-trained models, which leads to performance degradation due to task conflicts.
We propose Task Groupings Regularization, a novel approach that benefits from model heterogeneity by grouping and aligning conflicting tasks.
arXiv Detail & Related papers (2024-05-26T13:11:55Z)
- Merging Multi-Task Models via Weight-Ensembling Mixture of Experts [64.94129594112557]
Merging Transformer-based models trained on different tasks yields a single unified model that can execute all the tasks concurrently.
Previous methods, exemplified by task arithmetic, have been proven to be both effective and scalable.
We propose to merge most of the parameters while upscaling the Transformer layers to a weight-ensembling mixture of experts (MoE) module.
arXiv Detail & Related papers (2024-02-01T08:58:57Z)
- Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
- Self-Supervised Reinforcement Learning that Transfers using Random Features [41.00256493388967]
We propose a self-supervised reinforcement learning method that enables the transfer of behaviors across tasks with different rewards.
Our method is self-supervised in that it can be trained on offline datasets without reward labels, but can then be quickly deployed on new tasks.
arXiv Detail & Related papers (2023-05-26T20:37:06Z)
- Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs [63.936622239286685]
We find that interference among different tasks and modalities is the main factor behind this phenomenon.
We introduce the Conditional Mixture-of-Experts (Conditional MoEs) to generalist models.
Code and pre-trained generalist models shall be released.
arXiv Detail & Related papers (2022-06-09T17:59:59Z)
- A Dirichlet Process Mixture of Robust Task Models for Scalable Lifelong Reinforcement Learning [11.076005074172516]
Reinforcement learning algorithms can easily encounter catastrophic forgetting or interference when faced with lifelong streaming information.
We propose a scalable lifelong RL method that dynamically expands the network capacity to accommodate new knowledge.
We show that our method successfully facilitates scalable lifelong RL and outperforms relevant existing methods.
arXiv Detail & Related papers (2022-05-22T09:48:41Z)
- SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities [76.97949110580703]
We introduce SUPERB-SG, a new benchmark to evaluate pre-trained models across various speech tasks.
We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain.
We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.
arXiv Detail & Related papers (2022-03-14T04:26:40Z)
- Rethinking Hard-Parameter Sharing in Multi-Task Learning [20.792654758645302]
Hard parameter sharing in multi-task learning (MTL) allows tasks to share some of the model parameters, reducing storage cost and improving prediction accuracy.
The common sharing practice is to share bottom layers of a deep neural network among tasks while using separate top layers for each task.
Using separate bottom-layer parameters could achieve significantly better performance than the common practice.
arXiv Detail & Related papers (2021-07-23T17:26:40Z)
- Model-Invariant State Abstractions for Model-Based Reinforcement Learning [54.616645151708994]
We introduce a new type of state abstraction called model-invariance.
This allows for generalization to novel combinations of unseen values of state variables.
We prove that an optimal policy can be learned over this model-invariance state abstraction.
arXiv Detail & Related papers (2021-02-19T10:37:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.