Importance Weighted Policy Learning and Adaptation
- URL: http://arxiv.org/abs/2009.04875v2
- Date: Fri, 4 Jun 2021 13:21:40 GMT
- Title: Importance Weighted Policy Learning and Adaptation
- Authors: Alexandre Galashov, Jakub Sygnowski, Guillaume Desjardins, Jan Humplik, Leonard Hasenclever, Rae Jeong, Yee Whye Teh, Nicolas Heess
- Abstract summary: We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
- Score: 89.46467771037054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to exploit prior experience to solve novel problems rapidly is a
hallmark of biological learning systems and of great practical importance for
artificial ones. In the meta reinforcement learning literature much recent work
has focused on the problem of optimizing the learning process itself. In this
paper we study a complementary approach which is conceptually simple, general,
modular and built on top of recent improvements in off-policy learning. The
framework is inspired by ideas from the probabilistic inference literature and
combines robust off-policy learning with a behavior prior, or default behavior
that constrains the space of solutions and serves as a bias for exploration; as
well as a representation for the value function, both of which are easily
learned from a number of training tasks in a multi-task scenario. Our approach
achieves competitive adaptation performance on hold-out tasks compared to meta
reinforcement learning baselines and can scale to complex sparse-reward
scenarios.
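The abstract does not spell out the update rule, so the following is only a generic sketch of how a behavior prior and importance weighting can interact, not the authors' implementation. It assumes the standard KL-regularized setting, where the improved policy is proportional to the prior times exp(Q / temperature): actions proposed by the prior are reweighted by their exponentiated value estimates and one is resampled. The function names, the three example actions, and the value estimates are all hypothetical.

```python
import math
import random


def importance_weights(q_values, temperature):
    """Self-normalized weights proportional to exp(Q_i / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    unnormalized = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def resample_action(prior_actions, q_values, temperature, rng=random):
    """Pick one prior-proposed action with probability equal to its weight."""
    weights = importance_weights(q_values, temperature)
    return rng.choices(prior_actions, weights=weights, k=1)[0]


# Hypothetical example: three actions proposed by a behavior prior, with
# made-up value estimates for the current state.
actions = ["left", "stay", "right"]
q_estimates = [0.0, 1.0, 2.0]
weights = importance_weights(q_estimates, temperature=1.0)
chosen = resample_action(actions, q_estimates, temperature=1.0)
```

A small temperature concentrates the weights on the highest-valued proposals, while a large one keeps the resampled behavior close to the prior.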
Related papers
- I Know How: Combining Prior Policies to Solve New Tasks [17.214443593424498]
Multi-Task Reinforcement Learning aims at developing agents that are able to continually evolve and adapt to new scenarios.
Learning from scratch for each new task is not a viable or sustainable option.
We propose a new framework, I Know How, which provides a common formalization.
arXiv Detail & Related papers (2024-06-14T08:44:51Z)
- Hierarchically Structured Task-Agnostic Continual Learning [0.0]
We take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle.
We propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths.
Our approach can operate in a task-agnostic way, i.e., it does not require task-specific knowledge, as is the case with many existing continual learning algorithms.
arXiv Detail & Related papers (2022-11-14T19:53:15Z)
- On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning [71.55412580325743]
We show that multi-task pretraining with fine-tuning on new tasks performs as well as, or better than, meta-pretraining with meta test-time adaptation.
This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL.
arXiv Detail & Related papers (2022-06-07T13:24:00Z)
- Adaptive Policy Transfer in Reinforcement Learning [9.594432031144715]
We introduce a principled mechanism that can "Adapt-to-Learn", that is, adapt the source policy to learn to solve a target task.
We show that the presented method learns to seamlessly combine learning from adaptation and exploration and leads to a robust policy transfer algorithm.
arXiv Detail & Related papers (2021-05-10T22:42:03Z)
- Behavior Priors for Efficient Reinforcement Learning [97.81587970962232]
We consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors.
We discuss how such latent variable formulations connect to related work on hierarchical reinforcement learning (HRL) and mutual information and curiosity based objectives.
We demonstrate the effectiveness of our framework by applying it to a range of simulated continuous control domains.
arXiv Detail & Related papers (2020-10-27T13:17:18Z)
- Concept Learners for Few-Shot Learning [76.08585517480807]
We propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions.
We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization and cell type annotation.
arXiv Detail & Related papers (2020-07-14T22:04:17Z)
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)
- Revisiting Meta-Learning as Supervised Learning [69.2067288158133]
We aim to provide a principled, unifying framework by revisiting and strengthening the connection between meta-learning and traditional supervised learning.
By treating pairs of task-specific data sets and target models as (feature, label) samples, we can reduce many meta-learning algorithms to instances of supervised learning.
This view not only unifies meta-learning into an intuitive and practical framework but also allows us to transfer insights from supervised learning directly to improve meta-learning.
arXiv Detail & Related papers (2020-02-03T06:13:01Z)