Linear Representation Meta-Reinforcement Learning for Instant Adaptation
- URL: http://arxiv.org/abs/2101.04750v1
- Date: Tue, 12 Jan 2021 20:56:34 GMT
- Title: Linear Representation Meta-Reinforcement Learning for Instant Adaptation
- Authors: Matt Peng, Banghua Zhu, Jiantao Jiao
- Abstract summary: This paper introduces Fast Linearized Adaptive Policy (FLAP)
FLAP is a new meta-reinforcement learning (meta-RL) method that is able to extrapolate well to out-of-distribution tasks.
- Score: 20.711877803169134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces Fast Linearized Adaptive Policy (FLAP), a new
meta-reinforcement learning (meta-RL) method that extrapolates well to
out-of-distribution tasks without reusing data from training and adapts almost
instantaneously, requiring only a few samples during testing. FLAP builds on
the idea of learning a shared linear representation of the policy, so that
adapting to a new task only requires predicting a set of linear weights. A
separate adapter network is trained simultaneously with the policy so that,
during adaptation, the adapter network can directly predict these linear
weights to obtain the new policy, instead of updating a meta-policy via
gradient descent as in prior meta-RL methods such as MAML. Using a separate
feed-forward network not only speeds up adaptation run-time significantly, but
also generalizes extremely well to very different tasks that prior meta-RL
methods fail to generalize to. Experiments on standard continuous-control
meta-RL benchmarks show that FLAP achieves significantly stronger performance
on out-of-distribution tasks, with up to double the average return and up to
8X faster adaptation run-time compared to prior methods.
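To make the shared-linear-representation idea concrete, here is a minimal PyTorch sketch under stated assumptions: the class names, layer sizes, and the encoding of the few-shot context (flattened transitions averaged over samples) are illustrative choices, not the authors' implementation. The policy's action is a linear function of a shared feature vector phi(s), and a separate adapter network maps a handful of test-time transitions directly to the task-specific linear weights, so adaptation is a single forward pass rather than gradient descent.

```python
# Hypothetical sketch of the FLAP idea: a shared feature extractor phi(s) with a
# task-specific linear head w, plus an adapter network that predicts w from a
# short context of transitions. Names and shapes are illustrative assumptions.
import torch
import torch.nn as nn


class SharedLinearPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, feat_dim=64):
        super().__init__()
        # Shared nonlinear representation phi(s), learned across training tasks.
        self.phi = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, feat_dim), nn.ReLU(),
        )

    def forward(self, obs, w):
        # w has shape (act_dim, feat_dim): the per-task linear weights.
        # The (mean) action is a linear function of the shared features.
        return self.phi(obs) @ w.t()


class Adapter(nn.Module):
    """Predicts the task-specific linear weights w from a few (s, a, r, s')
    transitions collected in the new task, replacing gradient-based updates."""

    def __init__(self, transition_dim, act_dim, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim * feat_dim),
        )
        self.act_dim, self.feat_dim = act_dim, feat_dim

    def forward(self, context):
        # context: (num_transitions, transition_dim); averaging over transitions
        # means a handful of test-time samples suffices.
        w_flat = self.net(context).mean(dim=0)
        return w_flat.view(self.act_dim, self.feat_dim)


# At test time, adaptation is one forward pass and no gradient steps:
#   w_new = adapter(few_shot_context)
#   action = policy(obs, w_new)
```

In this sketch the speed-up comes purely from replacing inner-loop gradient descent with a single adapter forward pass; how the context is collected and how the adapter and policy are jointly trained follow the paper.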
Related papers
- Test-Time Model Adaptation with Only Forward Passes [68.11784295706995]
Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts.
We propose a test-time Forward-Optimization Adaptation (FOA) method.
FOA runs on a quantized 8-bit ViT, outperforms gradient-based TENT on a full-precision 32-bit ViT, and achieves up to a 24-fold memory reduction on ImageNet-C.
arXiv Detail & Related papers (2024-04-02T05:34:33Z) - Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning [58.575939354953526]
PACOH-RL is a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics.
Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics.
Our experiment results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions.
arXiv Detail & Related papers (2023-11-13T18:51:57Z) - Offline Meta Reinforcement Learning with In-Distribution Online Adaptation [38.35415999829767]
We first characterize a unique challenge in offline meta-RL: transition-reward distribution shift between offline datasets and online adaptation.
We propose a novel adaptation framework, called In-Distribution online Adaptation with uncertainty Quantification (IDAQ).
IDAQ generates in-distribution context using a given uncertainty quantification and performs effective task belief inference to address new tasks.
arXiv Detail & Related papers (2023-05-31T03:34:39Z) - Meta Reinforcement Learning with Successor Feature Based Context [51.35452583759734]
We propose a novel meta-RL approach that achieves competitive performance compared to existing meta-RL algorithms.
Our method not only learns high-quality policies for multiple tasks simultaneously but also adapts quickly to new tasks with a small amount of training.
arXiv Detail & Related papers (2022-07-29T14:52:47Z) - Offline Meta-Reinforcement Learning with Online Self-Supervision [66.42016534065276]
We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy.
Our method uses the offline data to learn the distribution of reward functions, which is then sampled to self-supervise reward labels for the additional online data.
We find that using additional data and self-generated rewards significantly improves an agent's ability to generalize.
arXiv Detail & Related papers (2021-07-08T17:01:32Z) - Transfer Bayesian Meta-learning via Weighted Free Energy Minimization [37.51664463278401]
A key assumption is that the auxiliary tasks, known as meta-training tasks, share the same generating distribution as the tasks to be encountered at deployment time.
This paper introduces weighted free energy minimization (WFEM) for transfer meta-learning.
arXiv Detail & Related papers (2021-06-20T15:17:51Z) - Off-Policy Meta-Reinforcement Learning Based on Feature Embedding Spaces [14.029933823101084]
We propose a novel off-policy meta-RL method, embedding learning and evaluation of uncertainty (ELUE).
ELUE learns a belief model over the embedding space and a belief-conditional policy and Q-function.
We demonstrate that ELUE outperforms state-of-the-art meta-RL methods through experiments on meta-RL benchmarks.
arXiv Detail & Related papers (2021-01-06T05:51:38Z) - Meta-Learning with Adaptive Hyperparameters [55.182841228303225]
We focus on a complementary factor in the MAML framework: inner-loop optimization (or fast adaptation).
We propose a new weight update rule that greatly enhances the fast adaptation process.
arXiv Detail & Related papers (2020-10-31T08:05:34Z) - Offline Meta-Reinforcement Learning with Advantage Weighting [125.21298190780259]
This paper introduces the offline meta-reinforcement learning (offline meta-RL) problem setting and proposes an algorithm that performs well in this setting.
Offline meta-RL is analogous to the widely successful supervised learning strategy of pre-training a model on a large batch of fixed, pre-collected data.
We propose Meta-Actor Critic with Advantage Weighting (MACAW), an optimization-based meta-learning algorithm that uses simple, supervised regression objectives for both the inner and outer loop of meta-training.
arXiv Detail & Related papers (2020-08-13T17:57:14Z)
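Since the MACAW entry above describes its inner and outer loops as plain supervised regressions, the snippet below gives a rough feel for an advantage-weighted regression style policy objective of the kind that name suggests. This is a hedged sketch: the function name, the clamp value, and the advantage estimate are assumptions for illustration, not the paper's exact objective.

```python
# Hypothetical sketch of an advantage-weighted regression (AWR) style policy
# loss: a supervised regression toward offline actions, weighted more heavily
# where the estimated advantage is positive.
import torch


def awr_policy_loss(log_prob, q_values, values, temperature=1.0):
    """log_prob:  log pi(a|s) for actions taken in the offline batch
    q_values:  return estimates for (s, a), e.g. Monte-Carlo or from a critic
    values:    baseline V(s) estimates
    Returns a scalar loss whose minimization regresses the policy toward
    actions with high advantage, weighted by exp(advantage / temperature)."""
    advantages = q_values - values
    weights = torch.clamp(torch.exp(advantages / temperature), max=20.0)
    return -(weights.detach() * log_prob).mean()
```

Because the weights are detached, gradients flow only through log pi(a|s), which is what makes both the inner and outer loops behave like ordinary supervised regression.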