Enhanced Meta Reinforcement Learning using Demonstrations in Sparse Reward Environments
- URL: http://arxiv.org/abs/2209.13048v1
- Date: Mon, 26 Sep 2022 22:01:12 GMT
- Title: Enhanced Meta Reinforcement Learning using Demonstrations in Sparse Reward Environments
- Authors: Desik Rengarajan, Sapana Chaudhary, Jaewon Kim, Dileep Kalathil, Srinivas Shakkottai
- Abstract summary: We develop a class of algorithms called Enhanced Meta-RL using Demonstrations (EMRLD).
We show how EMRLD jointly utilizes RL and supervised learning over the offline data to generate a meta-policy.
We also show that our EMRLD algorithms significantly outperform existing approaches in a variety of sparse reward environments.
- Score: 10.360491332190433
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Meta reinforcement learning (Meta-RL) is an approach wherein the experience
gained from solving a variety of tasks is distilled into a meta-policy. The
meta-policy, when adapted over only a small number of steps (or even a single
step), is able to perform near-optimally on a new, related task. However, a
major challenge in adopting this approach is that real-world problems
are often associated with sparse reward functions that only indicate whether a
task is completed partially or fully. We consider the situation where some
data, possibly generated by a sub-optimal agent, is available for each task. We
then develop a class of algorithms, called Enhanced Meta-RL using
Demonstrations (EMRLD), that exploit this information, even when it is
sub-optimal, to
obtain guidance during training. We show how EMRLD jointly utilizes RL and
supervised learning over the offline data to generate a meta-policy that
demonstrates monotone performance improvements. We also develop a warm-started
variant, EMRLD-WS, that is particularly efficient for sub-optimal
demonstration data. Finally, we show that our EMRLD algorithms significantly
outperform existing approaches in a variety of sparse reward environments,
including that of a mobile robot.
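As a rough, illustrative sketch of the idea of combining RL with supervised learning over per-task demonstration data, the Python snippet below mixes a policy-gradient term with a behavior-cloning term. The CategoricalPolicy class, demo_guided_loss function, and bc_weight coefficient are assumptions made for illustration; they are not EMRLD's actual objective, advantage estimation, or weighting.

import torch
import torch.nn as nn

class CategoricalPolicy(nn.Module):
    # Tiny discrete policy, included only to keep the sketch self-contained.
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, n_actions))

    def log_prob(self, obs, acts):
        logits = self.net(obs)
        return torch.distributions.Categorical(logits=logits).log_prob(acts)

def demo_guided_loss(policy, rollout, demos, bc_weight=1.0):
    # Policy-gradient (REINFORCE-style) term on the task's sparse rewards.
    obs, acts, returns = rollout
    pg_loss = -(policy.log_prob(obs, acts) * returns).mean()
    # Supervised behavior-cloning term on (possibly sub-optimal) demonstrations.
    demo_obs, demo_acts = demos
    bc_loss = -policy.log_prob(demo_obs, demo_acts).mean()
    # bc_weight trades off reward-driven and demonstration-driven guidance.
    return pg_loss + bc_weight * bc_loss

In a meta-RL setting, a loss of this form would drive the task-specific inner-loop adaptation, with the meta-policy updated in the outer loop across tasks.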
Related papers
- MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning [18.82398325614491]
We propose a new model-based approach to meta-RL, based on elements from existing state-of-the-art model-based and meta-RL methods.
We demonstrate the effectiveness of our approach on common meta-RL benchmark domains, attaining greater return with better sample efficiency.
In addition, we validate our approach on a slate of more challenging, higher-dimensional domains, taking a step towards real-world generalizing agents.
arXiv Detail & Related papers (2024-03-14T20:40:36Z)
- Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning [58.575939354953526]
PACOH-RL is a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics.
Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics.
Our experimental results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions.
arXiv Detail & Related papers (2023-11-13T18:51:57Z)
- Meta-Learning with Self-Improving Momentum Target [72.98879709228981]
We propose Self-improving Momentum Target (SiMT) to improve the performance of a meta-learner.
SiMT generates the target model by adapting from the temporal ensemble of the meta-learner.
We show that SiMT brings a significant performance gain when combined with a wide range of meta-learning methods.
arXiv Detail & Related papers (2022-10-11T06:45:15Z)
- On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning [71.55412580325743]
We show that multi-task pretraining with fine-tuning on new tasks performs as well as, or better than, meta-pretraining with meta test-time adaptation.
This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL.
arXiv Detail & Related papers (2022-06-07T13:24:00Z)
- Dynamic Channel Access via Meta-Reinforcement Learning [0.8223798883838329]
We propose a meta-DRL framework that incorporates Model-Agnostic Meta-Learning (MAML).
We show that only a few gradient descent steps are required to adapt to different tasks drawn from the same distribution (a generic sketch of this inner loop appears after this list).
arXiv Detail & Related papers (2021-12-24T15:04:43Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting, from the storage, promising trajectories that solve prior tasks.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- Energy-Efficient and Federated Meta-Learning via Projected Stochastic Gradient Ascent [79.58680275615752]
We propose an energy-efficient federated meta-learning framework.
We assume each task is owned by a separate agent, so a limited number of tasks is used to train a meta-model.
arXiv Detail & Related papers (2021-05-31T08:15:44Z)
- Curriculum in Gradient-Based Meta-Reinforcement Learning [10.447238563837173]
We show that gradient-based meta-learners are sensitive to task distributions.
With the wrong curriculum, agents suffer the effects of meta-overfitting, shallow adaptation, and adaptation instability.
arXiv Detail & Related papers (2020-02-19T01:40:45Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
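For reference, the MAML-style adaptation mentioned in the Dynamic Channel Access entry above amounts to taking a few gradient steps on task-specific data, starting from the meta-learned initialization. The Python sketch below shows this generic inner loop; maml_adapt, loss_fn, and inner_lr are illustrative names, not the channel-access implementation.

import torch

def maml_adapt(meta_params, loss_fn, task_batches, inner_lr=0.1):
    # Start the task-specific parameters from the meta-learned initialization.
    params = [p.clone() for p in meta_params]
    for batch in task_batches:  # typically only a handful of gradient steps
        loss = loss_fn(params, batch)
        # create_graph=True retains the graph so the outer (meta) objective
        # can differentiate through these inner updates during meta-training.
        grads = torch.autograd.grad(loss, params, create_graph=True)
        params = [p - inner_lr * g for p, g in zip(params, grads)]
    return params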