Action Advising with Advice Imitation in Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2104.08441v1
- Date: Sat, 17 Apr 2021 04:24:04 GMT
- Title: Action Advising with Advice Imitation in Deep Reinforcement Learning
- Authors: Ercument Ilhan, Jeremy Gow and Diego Perez-Liebana
- Abstract summary: Action advising is a peer-to-peer knowledge exchange technique built on the teacher-student paradigm.
We present an approach that enables the student agent to imitate previously acquired advice and reuse it directly in its exploration policy.
- Score: 0.5185131234265025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Action advising is a peer-to-peer knowledge exchange technique built on the
teacher-student paradigm to alleviate the sample inefficiency problem in deep
reinforcement learning. Recently proposed student-initiated approaches have
obtained promising results. However, being at an early stage of development,
these approaches also have some substantial shortcomings. One ability absent
from current methods is the further utilisation of advice through reuse, which
is especially crucial in practical settings given the budget and cost
constraints of peer-to-peer advising. In this study, we present an approach
that enables the student agent to imitate previously acquired advice and reuse
it directly in its exploration policy, without any intervention in the learning
mechanism itself. In particular, we employ a behavioural cloning module to
imitate the teacher policy and use dropout regularisation to obtain a notion of
epistemic uncertainty that keeps track of which state-advice pairs have
actually been collected. As our experiments in three Atari games show, advice
reuse via generalisation is indeed a feasible option in deep RL, and our
approach can achieve this while significantly improving learning performance,
even when paired with a simple early advising heuristic.
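The following is a minimal sketch, not the authors' implementation, of how a behavioural cloning module with dropout-based epistemic uncertainty could support advice reuse as described in the abstract. The network sizes, the Monte Carlo dropout agreement test, and the `reuse_advice` interface are illustrative assumptions; only the overall idea (imitate collected state-advice pairs, reuse the imitated action when the module is confident it has seen similar states) comes from the abstract.

```python
# Hypothetical sketch of advice imitation with MC-dropout uncertainty.
# All hyperparameters and interfaces below are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F


class AdviceImitationModule(nn.Module):
    """Behavioural cloning of teacher advice with dropout-based uncertainty."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 256,
                 dropout_p: float = 0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Dropout(dropout_p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(dropout_p),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

    def bc_loss(self, obs: torch.Tensor, advice: torch.Tensor) -> torch.Tensor:
        """Cross-entropy between predicted logits and the teacher's actions."""
        return F.cross_entropy(self.forward(obs), advice)

    @torch.no_grad()
    def reuse_advice(self, obs: torch.Tensor, n_samples: int = 20,
                     threshold: float = 0.9):
        """Return (action, certain) for a single observation.

        Dropout is kept active at inference time; if the sampled predictions
        agree often enough, the state is treated as one for which advice has
        effectively been collected and the imitated action can be reused.
        """
        self.train()  # keep dropout stochastic for Monte Carlo sampling
        logits = torch.stack([self.forward(obs.unsqueeze(0)).squeeze(0)
                              for _ in range(n_samples)])
        actions = logits.argmax(dim=-1)            # shape: (n_samples,)
        mode = actions.mode().values               # most frequent action
        agreement = (actions == mode).float().mean().item()
        return mode.item(), agreement >= threshold
```

During exploration, the student would act on the imitated advice only when the agreement flag is true (for instance in place of a random epsilon-greedy action), and fall back to its usual exploration policy otherwise.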
Related papers
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - Explainable Action Advising for Multi-Agent Reinforcement Learning [32.49380192781649]
Action advising is a knowledge transfer technique for reinforcement learning based on the teacher-student paradigm.
We introduce Explainable Action Advising, in which the teacher provides action advice as well as associated explanations indicating why the action was chosen.
This allows the student to self-reflect on what it has learned, enabling generalization of the advice and leading to improved sample efficiency and learning performance.
arXiv Detail & Related papers (2022-11-15T04:15:03Z) - Soft Action Priors: Towards Robust Policy Transfer [9.860944032009847]
We use the action prior from the Reinforcement Learning as Inference framework to recover state-of-the-art policy distillation techniques.
Then, we propose a class of adaptive methods that can robustly exploit action priors by combining reward shaping and auxiliary regularization losses.
We show that the proposed methods match state-of-the-art performance and surpass it when learning from suboptimal priors.
arXiv Detail & Related papers (2022-09-20T17:36:28Z) - Imitating Past Successes can be Very Suboptimal [145.70788608016755]
We show that existing outcome-conditioned imitation learning methods do not necessarily improve the policy.
We show that a simple modification results in a method that does guarantee policy improvement.
Our aim is not to develop an entirely new method, but rather to explain how a variant of outcome-conditioned imitation learning can be used to maximize rewards.
arXiv Detail & Related papers (2022-06-07T15:13:43Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - Learning on a Budget via Teacher Imitation [0.5185131234265025]
Action advising is a framework that provides a flexible way to transfer such knowledge in the form of actions between teacher-student peers.
We extend the idea of advice reusing via teacher imitation to construct a unified approach that addresses both advice collection and advice utilisation problems.
arXiv Detail & Related papers (2021-04-17T04:15:00Z) - Off-Policy Imitation Learning from Observations [78.30794935265425]
Learning from Observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit.
We propose a sample-efficient LfO approach that enables off-policy optimization in a principled manner.
Our approach is comparable with the state of the art on locomotion tasks in terms of both sample efficiency and performance.
arXiv Detail & Related papers (2021-02-25T21:33:47Z) - Self-Imitation Advantage Learning [43.8107780378031]
Self-imitation learning is a Reinforcement Learning method that encourages actions whose returns were higher than expected.
We propose a novel generalization of self-imitation learning for off-policy RL, based on a modification of the Bellman optimality operator.
arXiv Detail & Related papers (2020-12-22T13:21:50Z) - Student-Initiated Action Advising via Advice Novelty [0.14323566945483493]
Student-initiated techniques that utilise state novelty and uncertainty estimations have obtained promising results.
We propose a student-initiated algorithm that addresses these shortcomings by employing Random Network Distillation (RND) to measure the novelty of a piece of advice (a minimal sketch of RND-based novelty follows after this list).
arXiv Detail & Related papers (2020-10-01T13:20:28Z) - Transfer Heterogeneous Knowledge Among Peer-to-Peer Teammates: A Model
Distillation Approach [55.83558520598304]
We propose a brand new solution to reuse experiences and transfer value functions among multiple students via model distillation.
We also describe how to design an efficient communication protocol to exploit heterogeneous knowledge.
Our proposed framework, namely Learning and Teaching Categorical Reinforcement, shows promising performance in stabilizing and accelerating learning progress.
arXiv Detail & Related papers (2020-02-06T11:31:04Z) - Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
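As referenced in the Student-Initiated Action Advising via Advice Novelty entry above, the following is a minimal sketch, under stated assumptions rather than that paper's actual code, of how Random Network Distillation can score how novel a state is relative to states for which advice has already been collected: a fixed random target network, a trained predictor, and the prediction error as the novelty signal.

```python
# Hypothetical RND-based novelty sketch; architecture and interface are
# assumptions for illustration, not the referenced paper's implementation.

import torch
import torch.nn as nn


class RNDNovelty(nn.Module):
    def __init__(self, obs_dim: int, embed_dim: int = 64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                    nn.Linear(128, embed_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                       nn.Linear(128, embed_dim))
        for p in self.target.parameters():   # target stays fixed and random
            p.requires_grad_(False)

    def novelty(self, obs: torch.Tensor) -> torch.Tensor:
        """Per-sample prediction error; high values mean unfamiliar states."""
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)

    def update(self, obs: torch.Tensor, optimiser: torch.optim.Optimizer):
        """Fit the predictor on states where advice was just collected."""
        loss = self.novelty(obs).mean()
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
        return loss.item()
```

At decision time, one would query `novelty` under `torch.no_grad()` and request advice only when the score exceeds a threshold and the advising budget has not been exhausted.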