Teachable Reinforcement Learning via Advice Distillation
- URL: http://arxiv.org/abs/2203.11197v1
- Date: Sat, 19 Mar 2022 03:22:57 GMT
- Title: Teachable Reinforcement Learning via Advice Distillation
- Authors: Olivia Watkins, Trevor Darrell, Pieter Abbeel, Jacob Andreas, Abhishek Gupta
- Abstract summary: We propose a new supervision paradigm for interactive learning based on "teachable" decision-making systems that learn from structured advice provided by an external teacher.
We show that agents that learn from advice can acquire new skills with significantly less human supervision than standard reinforcement learning algorithms.
- Score: 161.43457947665073
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training automated agents to complete complex tasks in interactive
environments is challenging: reinforcement learning requires careful
hand-engineering of reward functions, imitation learning requires specialized
infrastructure and access to a human expert, and learning from intermediate
forms of supervision (like binary preferences) is time-consuming and extracts
little information from each human intervention. Can we overcome these
challenges by building agents that learn from rich, interactive feedback
instead? We propose a new supervision paradigm for interactive learning based
on "teachable" decision-making systems that learn from structured advice
provided by an external teacher. We begin by formalizing a class of
human-in-the-loop decision-making problems in which multiple forms of
teacher-provided advice are available to a learner. We then describe a simple
learning algorithm for these problems that first learns to interpret advice,
then learns from advice to complete tasks even in the absence of human
supervision. In puzzle-solving, navigation, and locomotion domains, we show
that agents that learn from advice can acquire new skills with significantly
less human supervision than standard reinforcement learning algorithms and
often less than imitation learning.
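To make the two-phase algorithm concrete, below is a minimal runnable sketch of grounding followed by distillation on a toy line-world. The task, `teacher_advice`, `advice_policy`, and the majority-vote student are illustrative assumptions, not the authors' implementation.
```python
# Hedged sketch of advice distillation. Phase 1 ("grounding") would train an
# advice-conditioned policy with RL; here we take its end state as given.
# Phase 2 ("distillation") rolls that policy out with teacher advice and
# clones the behaviour into a student that needs no advice at test time.

GOAL = 5  # toy task: reach position 5 on a line

def teacher_advice(pos):
    """Structured advice: which direction the goal lies in."""
    return 1 if pos < GOAL else -1

def advice_policy(pos, advice):
    """Stand-in for the grounding-phase result: a policy that has learned
    to interpret and follow the teacher's advice."""
    return advice

def distill(starts, steps=10):
    """Collect (observation, action) pairs from advice-conditioned rollouts,
    dropping the advice, then fit an advice-free student by imitation."""
    data = []
    for pos in starts:
        for _ in range(steps):
            action = advice_policy(pos, teacher_advice(pos))
            data.append((pos, action))  # note: the advice is NOT stored
            pos += action
    # Student: majority action per observation (a stand-in for supervised
    # learning on the distillation dataset).
    by_obs = {}
    for obs, action in data:
        by_obs.setdefault(obs, []).append(action)
    return {obs: max(set(acts), key=acts.count) for obs, acts in by_obs.items()}

student = distill(starts=range(11))
print(student[2], student[8])  # 1 (move right toward the goal), -1 (move left)
```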
Related papers
- Multi-agent cooperation through learning-aware policy gradients [53.63948041506278]
Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning.
We present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning.
We derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.
arXiv Detail & Related papers (2024-10-24T10:48:42Z)
- LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers [59.69716962256727]
We propose a framework that guides a reinforcement learning agent to acquire semantically meaningful behavior without human feedback.
In our framework, the agent receives task instructions grounded in a training environment from large language models.
We demonstrate that our method can learn semantically meaningful skills in a challenging open-ended MineDojo environment.
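A minimal sketch of this loop is below; the language model and the vision-language reward are replaced by illustrative stubs, so `propose_instruction` and `alignment_score` are assumptions, not LiFT's actual interface.
```python
# Hedged sketch of an LLM-guided skill-learning loop: a language model
# proposes a task instruction for the agent's current context, and the
# agent's behaviour is scored by how well it matches that instruction
# (LiFT uses a vision-language model for this; both models are stubs here).

def propose_instruction(context):
    # Stand-in for an LLM query; a real system would call a language model.
    table = {"forest": "chop a tree", "plains": "collect dirt"}
    return table.get(context, "explore")

def alignment_score(trajectory, instruction):
    # Stand-in for a VLM reward: the fraction of steps matching the task.
    return sum(step == instruction for step in trajectory) / len(trajectory)

trajectory = ["chop a tree", "explore", "chop a tree", "chop a tree"]
instruction = propose_instruction("forest")
print(instruction, alignment_score(trajectory, instruction))  # chop a tree 0.75
```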
arXiv Detail & Related papers (2023-12-14T14:07:41Z)
- Human Decision Makings on Curriculum Reinforcement Learning with Difficulty Adjustment [52.07473934146584]
We guide curriculum reinforcement learning towards a preferred difficulty level, neither too hard nor too easy, by learning from the human decision process.
Our system is highly parallelizable, making it possible for a human to train large-scale reinforcement learning applications.
We show that reinforcement learning performance can successfully adjust in sync with the human-desired difficulty level.
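As a rough illustration of steering difficulty toward a preferred level, the sketch below uses a simple proportional update; the paper learns this from the human decision process, so the rule here is only an assumed stand-in.
```python
# Hedged sketch of difficulty adjustment: raise task difficulty when the
# agent succeeds too often, lower it when it fails too often, so training
# stays near a level that is neither too hard nor too easy.

def adjust_difficulty(difficulty, success_rate, target=0.6, gain=0.5):
    """Proportional update toward a target success rate; clipped to [0, 1]."""
    return max(0.0, min(1.0, difficulty + gain * (success_rate - target)))

difficulty = 0.3
for success_rate in [0.9, 0.8, 0.7, 0.6]:  # success falls as difficulty rises
    difficulty = adjust_difficulty(difficulty, success_rate)
    print(round(difficulty, 2))  # 0.45, 0.55, 0.6, 0.6
```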
arXiv Detail & Related papers (2022-08-04T23:53:51Z)
- Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.
The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of an entropy-regularized policy gradient formulation.
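The entropy-regularized policy gradient can be illustrated on a toy bandit: the sketch below ascends E[r] + beta * H(pi) over softmax logits. This shows only the regularizer the summary mentions, not the paper's adversarial training regime or skill embeddings.
```python
import numpy as np

# Hedged sketch: REINFORCE with an entropy bonus on a 3-armed bandit.
rng = np.random.default_rng(0)
true_rewards = np.array([1.0, 2.0, 0.5])
logits = np.zeros(3)
beta, lr = 0.5, 0.2  # entropy weight and learning rate

for _ in range(2000):
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()
    a = rng.choice(3, p=pi)
    r = true_rewards[a] + rng.normal(scale=0.1)
    grad_logp = np.eye(3)[a] - pi              # d log pi(a) / d logits
    H = -(pi * np.log(pi)).sum()               # policy entropy
    grad_entropy = -pi * (np.log(pi) + H)      # d H / d logits (softmax params)
    logits += lr * (r * grad_logp + beta * grad_entropy)

pi = np.exp(logits - logits.max())
print(np.round(pi / pi.sum(), 2))  # favours arm 1, but stays stochastic
```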
arXiv Detail & Related papers (2022-01-27T19:51:09Z)
- Systematic human learning and generalization from a brief tutorial with explanatory feedback [3.7826494079172557]
We investigate human adults' ability to learn an abstract reasoning task based on Sudoku.
We find that participants who master the task do so within a small number of trials and generalize well to puzzles outside of the training range.
We also find that most of those who master the task can describe a valid solution strategy, and such participants perform better on transfer puzzles than those whose strategy descriptions are vague or incomplete.
arXiv Detail & Related papers (2021-07-10T00:14:41Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
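A key ingredient in this feedback efficiency is relabeling: whenever the reward model learned from human preferences is updated, stored transitions are re-scored so off-policy updates use current reward estimates. The buffer and the linear stand-in reward model below are illustrative assumptions, not PEBBLE's implementation.
```python
# Hedged sketch of replay relabeling in preference-based RL.

class ReplayBuffer:
    def __init__(self):
        self.obs, self.act, self.rew = [], [], []

    def add(self, obs, act, rew):
        self.obs.append(obs); self.act.append(act); self.rew.append(rew)

    def relabel(self, reward_model):
        """Overwrite stored rewards with the current reward model's output."""
        self.rew = [reward_model(o, a) for o, a in zip(self.obs, self.act)]

def reward_model(obs, act):
    # Stand-in for a reward network trained on human preferences.
    return 1.0 * obs - 0.5 * act

buf = ReplayBuffer()
for obs, act in [(0.2, 1.0), (0.5, 0.0), (0.9, 1.0)]:
    buf.add(obs, act, rew=0.0)  # true reward unknown when collected
buf.relabel(reward_model)       # re-scored once the model is (re)trained
print([round(r, 2) for r in buf.rew])  # [-0.3, 0.5, 0.4]
```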
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Deep Reinforcement Learning with Interactive Feedback in a Human-Robot Environment [1.2998475032187096]
We propose a deep reinforcement learning approach with interactive feedback to learn a domestic task in a human-robot scenario.
We compare three different learning methods using a simulated robotic arm for the task of organizing different objects.
The obtained results show that a learner agent, using either agent-IDeepRL or human-IDeepRL, completes the given task sooner and makes fewer mistakes than the autonomous DeepRL approach.
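A minimal sketch of the interactive-feedback idea: with some probability, an external advisor (a human, or a previously trained agent, mirroring the human-IDeepRL/agent-IDeepRL distinction) overrides the learner's exploratory action. The advisor, learner, and feedback probability below are illustrative assumptions.
```python
import random

# Hedged sketch: an advisor occasionally replaces the learner's action.

def advisor_action(state):
    return state % 2               # stand-in for human or trained-agent advice

def learner_action(state):
    return random.randint(0, 1)    # untrained learner explores randomly

def act(state, feedback_prob=0.3):
    """Return (action, advised): with probability feedback_prob the advisor's
    corrected action is used instead of the learner's own choice."""
    if random.random() < feedback_prob:
        return advisor_action(state), True
    return learner_action(state), False

random.seed(0)
steps = [act(state) for state in range(10)]
print(sum(advised for _, advised in steps), "of 10 steps were advised")
```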
arXiv Detail & Related papers (2020-07-07T11:55:27Z)
- Learning Transferable Concepts in Deep Reinforcement Learning [0.7161783472741748]
We show that learning discrete representations of sensory inputs can provide a high-level abstraction that is common across multiple tasks.
In particular, we show that it is possible to learn such representations by self-supervision, following an information theoretic approach.
Our method is able to learn concepts in locomotion and optimal control tasks that increase sample efficiency in both known and unknown tasks.
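As a simplified stand-in for the information-theoretic objective, the sketch below learns discrete codes over observations with k-means; the resulting assignments play the role of the discrete, task-agnostic concepts the summary describes.
```python
import numpy as np

def learn_concepts(obs, n_codes=4, iters=50, seed=0):
    """K-means codebook over observations: an illustrative simplification of
    learning discrete sensory representations by self-supervision."""
    rng = np.random.default_rng(seed)
    codebook = obs[rng.choice(len(obs), size=n_codes, replace=False)].copy()
    assign = np.zeros(len(obs), dtype=int)
    for _ in range(iters):
        # Assign each observation to its nearest code (its "concept").
        dists = ((obs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        # Move each code to the mean of the observations assigned to it.
        for k in range(n_codes):
            if (assign == k).any():
                codebook[k] = obs[assign == k].mean(axis=0)
    return codebook, assign

# Toy observations drawn from four well-separated clusters.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
obs = np.concatenate([c + rng.normal(scale=0.3, size=(100, 2)) for c in centers])
codebook, concepts = learn_concepts(obs)
print(np.bincount(concepts, minlength=4))  # observations assigned to each code
```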
arXiv Detail & Related papers (2020-05-16T04:45:51Z)
- KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge [40.343858932413376]
We propose the knowledge-guided policy network (KoGuN), a novel framework that combines human prior suboptimal knowledge with reinforcement learning.
Our framework consists of a fuzzy rule controller to represent human knowledge and a refine module to fine-tune suboptimal prior knowledge.
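A minimal sketch of that combination: a fuzzy rule encodes the human prior as a soft action preference, and a learned refinement corrects it in logit space. The balancing rule, the additive combination, and all parameters are illustrative assumptions, not KoGuN's architecture.
```python
import numpy as np

# Hedged sketch: fuzzy-rule prior + learned refinement for a 2-action task.

def fuzzy_rule_prior(pole_angle):
    """Human rule for a CartPole-style task: 'if the pole leans right, push
    right', with a soft (fuzzy) membership instead of a hard threshold."""
    lean_right = 1.0 / (1.0 + np.exp(-10.0 * pole_angle))  # membership in [0, 1]
    return np.array([1.0 - lean_right, lean_right])        # P(left), P(right)

def policy(pole_angle, refine_logits):
    """Combine the suboptimal prior with a learned refinement additively in
    logit space, then renormalize to a proper distribution."""
    logits = np.log(fuzzy_rule_prior(pole_angle) + 1e-8) + refine_logits
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

print(np.round(policy(0.05, np.zeros(2)), 2))            # prior alone
print(np.round(policy(0.05, np.array([1.0, -1.0])), 2))  # refined policy
```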
arXiv Detail & Related papers (2020-02-18T07:58:27Z)