Learning on a Budget via Teacher Imitation
- URL: http://arxiv.org/abs/2104.08440v1
- Date: Sat, 17 Apr 2021 04:15:00 GMT
- Title: Learning on a Budget via Teacher Imitation
- Authors: Ercument Ilhan, Jeremy Gow and Diego Perez-Liebana
- Abstract summary: Action advising is a framework that provides a flexible way to transfer knowledge in the form of actions between teacher-student peers.
We extend the idea of advice reusing via teacher imitation to construct a unified approach that addresses both advice collection and advice utilisation problems.
- Score: 0.5185131234265025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Reinforcement Learning (RL) techniques can benefit greatly from
leveraging prior experience, which can be either self-generated or acquired
from other entities. Action advising is a framework that provides a flexible
way to transfer such knowledge in the form of actions between teacher-student
peers. However, due to realistic concerns, the number of these interactions
is limited by a budget; therefore, it is crucial to perform them at the most
appropriate moments. There have been several promising studies recently that
address this problem setting especially from the student's perspective. Despite
their success, they have some shortcomings when it comes to practical
applicability and integrity as an overall solution to the learning from advice
challenge. In this paper, we extend the idea of advice reusing via teacher
imitation to construct a unified approach that addresses both advice collection
and advice utilisation problems. Furthermore, we propose a method to
automatically determine the relevant hyperparameters of these components
on-the-fly so that the approach can adapt to any task with minimal human
intervention. The experiments we performed in 5 different Atari games verify
that our algorithm can outperform its competitors by achieving state-of-the-art
performance, and that its components also provide significant advantages
individually.
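The budgeted advising loop summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `BudgetedAdvisingStudent`, its tabular imitation model (per-state counts of advised actions), the uncertainty measure (one minus the majority-vote fraction) and the fixed `reuse_threshold` are hypothetical stand-ins for the deep RL components and automatically tuned hyperparameters described in the paper. The sketch shows how a single imitation model can drive both problems. When it is confident, the student reuses imitated advice (utilisation). When it is not, the student spends budget on a teacher query (collection). Once the budget runs out, it falls back to its own policy, here a random placeholder.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, Tuple


class BudgetedAdvisingStudent:
    """Hypothetical toy student: queries a teacher under a fixed advice budget
    and reuses collected advice through a tabular imitation model."""

    def __init__(self, teacher: Callable[[Hashable], int], budget: int,
                 n_actions: int, reuse_threshold: float = 0.2, seed: int = 0):
        self.teacher = teacher                  # teacher policy: state -> action
        self.budget = budget                    # remaining number of teacher queries
        self.n_actions = n_actions
        self.reuse_threshold = reuse_threshold  # fixed here (assumption); the paper tunes such values on the fly
        # Imitation model (assumption): per-state counts of the actions the teacher advised.
        self.advice: Dict[Hashable, Counter] = defaultdict(Counter)
        self.rng = random.Random(seed)

    def imitation_uncertainty(self, state: Hashable) -> float:
        """1 - majority-vote fraction; 1.0 for states that were never advised."""
        counts = self.advice.get(state)  # .get() does not create a defaultdict entry
        if not counts:
            return 1.0
        top_count = counts.most_common(1)[0][1]
        return 1.0 - top_count / sum(counts.values())

    def act(self, state: Hashable) -> Tuple[int, str]:
        """Return (action, source), where source is 'reused', 'advised' or 'own'."""
        # 1) Advice utilisation: reuse imitated advice when the imitation model is confident.
        if state in self.advice and self.imitation_uncertainty(state) <= self.reuse_threshold:
            return self.advice[state].most_common(1)[0][0], "reused"
        # 2) Advice collection: spend budget on the teacher where imitation is uncertain.
        if self.budget > 0:
            self.budget -= 1
            action = self.teacher(state)
            self.advice[state][action] += 1
            return action, "advised"
        # 3) Budget exhausted: fall back to the student's own policy (random placeholder).
        return self.rng.randrange(self.n_actions), "own"


if __name__ == "__main__":
    student = BudgetedAdvisingStudent(teacher=lambda s: s % 3, budget=5, n_actions=3)
    print([student.act(s) for s in [0, 1, 0, 2, 1]])
    # → [(0, 'advised'), (1, 'advised'), (0, 'reused'), (2, 'advised'), (1, 'reused')]
    print(student.budget)  # → 2
```

In this toy version one advised action is enough to make a state reusable, so every query is spent on a new state. A learned imitation network would presumably also generalise advice to similar, unseen states, which is where an automatically tuned confidence threshold starts to matter.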
Related papers
- Learning to Assist Humans without Inferring Rewards [65.28156318196397]
We build upon prior work that studies assistance through the lens of empowerment.
An assistive agent aims to maximize the influence of the human's actions.
We prove that these representations estimate a similar notion of empowerment to that studied by prior work.
Date: 2024-11-04T21:31:04Z
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert.
Date: 2023-11-21T21:05:21Z
- Optimising Human-AI Collaboration by Learning Convincing Explanations [62.81395661556852]
We propose a method for a collaborative system that remains safe by having a human make the decisions.
Ardent enables efficient and effective decision-making by adapting to individual preferences for explanations.
Date: 2023-11-13T16:00:16Z
- Resilient Constrained Learning [94.27081585149836]
This paper presents a constrained learning approach that adapts the requirements while simultaneously solving the learning task.
We call this approach resilient constrained learning after the term used to describe ecological systems that adapt to disruptions by modifying their operation.
Date: 2023-06-04T18:14:18Z
- Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.
The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of an entropy-regularized policy gradient formulation.
Date: 2022-01-27T19:51:09Z
- Action Advising with Advice Imitation in Deep Reinforcement Learning [0.5185131234265025]
Action advising is a peer-to-peer knowledge exchange technique built on the teacher-student paradigm.
We present an approach to enable the student agent to imitate previously acquired advice to reuse them directly in its exploration policy.
Date: 2021-04-17T04:24:04Z
- Dual Policy Distillation [58.43610940026261]
Policy distillation, which transfers a teacher policy to a student policy, has achieved great success in challenging tasks of deep reinforcement learning.
In this work, we introduce dual policy distillation (DPD), a student-student framework in which two learners operate on the same environment to explore different perspectives of the environment.
The key challenge in developing this dual learning framework is to identify the beneficial knowledge from the peer learner for contemporary learning-based reinforcement learning algorithms.
Date: 2020-06-07T06:49:47Z
- Human AI interaction loop training: New approach for interactive reinforcement learning [0.0]
In various decision-making tasks, Reinforcement Learning (RL) provides effective results with an agent learning from a stand-alone reward function.
However, RL presents unique challenges in environments with large state and action spaces, as well as in determining rewards.
Imitation Learning (IL) offers a promising solution for those challenges using a teacher.
Date: 2020-03-09T15:27:48Z