Characterizing Policy Divergence for Personalized Meta-Reinforcement
Learning
- URL: http://arxiv.org/abs/2010.04816v1
- Date: Fri, 9 Oct 2020 21:31:53 GMT
- Title: Characterizing Policy Divergence for Personalized Meta-Reinforcement
Learning
- Authors: Michael Zhang
- Abstract summary: We consider the problem of recommending optimal policies to a set of multiple entities each with potentially different characteristics.
Inspired by existing literature in meta-learning, we propose a model-free meta-learning algorithm that prioritizes past experiences by relevance during gradient-based adaptation.
Our algorithm involves characterizing past policy divergence through methods in inverse reinforcement learning, and we illustrate how such metrics are able to effectively distinguish past policy parameters by the environment they were deployed in.
- Score: 4.716565301427257
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite ample motivation from costly exploration and limited trajectory data,
rapidly adapting to new environments with few-shot reinforcement learning (RL)
can remain a challenging task, especially with respect to personalized
settings. Here, we consider the problem of recommending optimal policies to a
set of multiple entities each with potentially different characteristics, such
that individual entities may parameterize distinct environments with unique
transition dynamics. Inspired by existing literature in meta-learning, we
extend previous work by focusing on the notion that certain environments are
more similar to each other than others in personalized settings, and propose a
model-free meta-learning algorithm that prioritizes past experiences by
relevance during gradient-based adaptation. Our algorithm involves
characterizing past policy divergence through methods in inverse reinforcement
learning, and we illustrate how such metrics are able to effectively
distinguish past policy parameters by the environment they were deployed in,
leading to more effective fast adaptation during test time. To study
personalization more effectively, we introduce a navigation testbed to
specifically incorporate environment diversity across training episodes, and
demonstrate that our approach outperforms meta-learning alternatives with
respect to few-shot reinforcement learning in personalized settings.
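The abstract describes the core mechanism: past policy parameters are scored by a divergence metric (characterized with inverse-RL-style comparisons), and those scores prioritize past experience during gradient-based adaptation. The snippet below is a minimal sketch of that idea, not the authors' implementation: the precomputed divergence scores, the softmax weighting, and the `policy_loss` interface are all illustrative assumptions.

```python
import torch

def relevance_weights(divergences, temperature=1.0):
    """Turn per-policy divergence scores into softmax relevance weights.

    Lower divergence from the new environment -> higher weight.
    The softmax form and temperature are illustrative assumptions.
    """
    d = torch.as_tensor(divergences, dtype=torch.float32)
    return torch.softmax(-d / temperature, dim=0)

def adapt(past_params, divergences, trajectories, policy_loss, lr=0.1, steps=1):
    """Sketch of relevance-weighted few-shot adaptation.

    past_params : list of flat parameter tensors from previously deployed policies
    divergences : divergence of each past policy from the new environment, e.g.
                  estimated by comparing inverse-RL reward recoveries (assumed here)
    trajectories: the few trajectories collected in the new environment
    policy_loss : callable (params, trajectories) -> scalar loss (assumed interface)
    """
    w = relevance_weights(divergences)
    # Start adaptation from a relevance-weighted combination of past policies.
    theta = sum(wi * p for wi, p in zip(w, past_params)).clone().requires_grad_(True)
    for _ in range(steps):
        loss = policy_loss(theta, trajectories)
        (grad,) = torch.autograd.grad(loss, theta)
        theta = (theta - lr * grad).detach().requires_grad_(True)
    return theta
```

The softmax temperature controls how sharply adaptation favors the most similar past environments; it is a design choice for this sketch, not a detail specified in the abstract.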
Related papers
- C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front [9.04360155372014]
Constrained MORL (C-MORL) serves as a seamless bridge between constrained policy optimization and MORL.
Our algorithm achieves more consistent and superior performance in terms of hypervolume, expected utility, and sparsity on both discrete and continuous control tasks.
arXiv Detail & Related papers (2024-10-03T06:13:56Z) - Invariant Causal Imitation Learning for Generalizable Policies [87.51882102248395]
We propose Invariant Causal Imitation Learning (ICIL) to learn an imitation policy.
ICIL learns a representation of causal features that is disentangled from the specific representations of noise variables.
We show that ICIL is effective in learning imitation policies capable of generalizing to unseen environments.
arXiv Detail & Related papers (2023-11-02T16:52:36Z) - MetaModulation: Learning Variational Feature Hierarchies for Few-Shot
Learning with Fewer Tasks [63.016244188951696]
We propose a method for few-shot learning with fewer tasks based on meta-modulation.
We modify parameters at various batch levels to increase the number of meta-training tasks.
We also introduce variational feature hierarchies by incorporating variational modulation.
arXiv Detail & Related papers (2023-05-17T15:47:47Z) - Invariant Meta Learning for Out-of-Distribution Generalization [1.1718589131017048]
In this paper, we propose invariant meta learning for out-of-distribution tasks.
Specifically, it learns an invariant optimal meta-initialization and adapts quickly to out-of-distribution tasks with a regularization penalty.
arXiv Detail & Related papers (2023-01-26T12:53:21Z) - Efficient Meta Reinforcement Learning for Preference-based Fast
Adaptation [17.165083095799712]
We study the problem of few-shot adaptation in the context of human-in-the-loop reinforcement learning.
We develop a meta-RL algorithm that enables fast policy adaptation with preference-based feedback.
arXiv Detail & Related papers (2022-11-20T03:55:09Z) - Dynamic Regret Analysis for Online Meta-Learning [0.0]
The online meta-learning framework has arisen as a powerful tool for the continual lifelong learning setting.
This formulation involves two levels: outer level which learns meta-learners and inner level which learns task-specific models.
We establish performance in terms of dynamic regret, which handles changing environments from a global perspective.
Our analysis proves, in expectation, a logarithmic local dynamic regret that depends explicitly on the total number of iterations.
arXiv Detail & Related papers (2021-09-29T12:12:59Z) - Meta Navigator: Search for a Good Adaptation Policy for Few-shot
Learning [113.05118113697111]
Few-shot learning aims to adapt knowledge learned from previous tasks to novel tasks with only a limited amount of labeled data.
Research literature on few-shot learning exhibits great diversity, while different algorithms often excel at different few-shot learning scenarios.
We present Meta Navigator, a framework that attempts to solve the limitation in few-shot learning by seeking a higher-level strategy.
arXiv Detail & Related papers (2021-09-13T07:20:01Z) - Learning to Continuously Optimize Wireless Resource in a Dynamic
Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z) - Meta-learning the Learning Trends Shared Across Tasks [123.10294801296926]
Gradient-based meta-learning algorithms excel at quick adaptation to new tasks with limited data.
Existing meta-learning approaches only depend on the current task information during the adaptation.
We propose a 'Path-aware' model-agnostic meta-learning approach.
arXiv Detail & Related papers (2020-10-19T08:06:47Z) - Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z) - Provably Efficient Model-based Policy Adaptation [22.752774605277555]
A promising approach is to quickly adapt pre-trained policies to new environments.
Existing methods for this policy adaptation problem typically rely on domain randomization and meta-learning.
We propose new model-based mechanisms that are able to make online adaptation in unseen target environments.
arXiv Detail & Related papers (2020-06-14T23:16:20Z)