Teachable Reinforcement Learning via Advice Distillation
- URL: http://arxiv.org/abs/2203.11197v1
- Date: Sat, 19 Mar 2022 03:22:57 GMT
- Title: Teachable Reinforcement Learning via Advice Distillation
- Authors: Olivia Watkins, Trevor Darrell, Pieter Abbeel, Jacob Andreas, Abhishek Gupta
- Abstract summary: We propose a new supervision paradigm for interactive learning based on "teachable" decision-making systems that learn from structured advice provided by an external teacher.
We show that agents that learn from advice can acquire new skills with significantly less human supervision than standard reinforcement learning algorithms.
- Score: 161.43457947665073
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training automated agents to complete complex tasks in interactive
environments is challenging: reinforcement learning requires careful
hand-engineering of reward functions, imitation learning requires specialized
infrastructure and access to a human expert, and learning from intermediate
forms of supervision (like binary preferences) is time-consuming and extracts
little information from each human intervention. Can we overcome these
challenges by building agents that learn from rich, interactive feedback
instead? We propose a new supervision paradigm for interactive learning based
on "teachable" decision-making systems that learn from structured advice
provided by an external teacher. We begin by formalizing a class of
human-in-the-loop decision-making problems in which multiple forms of
teacher-provided advice are available to a learner. We then describe a simple
learning algorithm for these problems that first learns to interpret advice,
then learns from advice to complete tasks even in the absence of human
supervision. In puzzle-solving, navigation, and locomotion domains, we show
that agents that learn from advice can acquire new skills with significantly
less human supervision than standard reinforcement learning algorithms and
often less than imitation learning.
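To make the two-phase algorithm concrete, below is a minimal runnable sketch of grounding followed by distillation on a toy line-world. The task, `teacher_advice`, `advice_policy`, and the majority-vote student are illustrative assumptions, not the authors' implementation.
```python
# Hedged sketch of advice distillation. Phase 1 ("grounding") would train an
# advice-conditioned policy with RL; here we take its end state as given.
# Phase 2 ("distillation") rolls that policy out with teacher advice and
# clones the behaviour into a student that needs no advice at test time.

GOAL = 5  # toy task: reach position 5 on a line

def teacher_advice(pos):
    """Structured advice: which direction the goal lies in."""
    return 1 if pos < GOAL else -1

def advice_policy(pos, advice):
    """Stand-in for the grounding-phase result: a policy that has learned
    to interpret and follow the teacher's advice."""
    return advice

def distill(starts, steps=10):
    """Collect (observation, action) pairs from advice-conditioned rollouts,
    dropping the advice, then fit an advice-free student by imitation."""
    data = []
    for pos in starts:
        for _ in range(steps):
            action = advice_policy(pos, teacher_advice(pos))
            data.append((pos, action))  # note: the advice is NOT stored
            pos += action
    # Student: majority action per observation (a stand-in for supervised
    # learning on the distillation dataset).
    by_obs = {}
    for obs, action in data:
        by_obs.setdefault(obs, []).append(action)
    return {obs: max(set(acts), key=acts.count) for obs, acts in by_obs.items()}

student = distill(starts=range(11))
print(student[2], student[8])  # 1 (move right toward the goal), -1 (move left)
```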
Related papers
- Multi-agent cooperation through learning-aware policy gradients [53.63948041506278]
Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning.
We present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning.
We derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.
arXiv Detail & Related papers (2024-10-24T10:48:42Z)
- LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers [59.69716962256727]
We propose a framework that guides a reinforcement learning agent to acquire semantically meaningful behavior without human feedback.
In our framework, the agent receives task instructions grounded in a training environment from large language models.
We demonstrate that our method can learn semantically meaningful skills in a challenging open-ended MineDojo environment.
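A minimal sketch of this loop is below; the language model and the vision-language reward are replaced by illustrative stubs, so `propose_instruction` and `alignment_score` are assumptions, not LiFT's actual interface.
```python
# Hedged sketch of an LLM-guided skill-learning loop: a language model
# proposes a task instruction for the agent's current context, and the
# agent's behaviour is scored by how well it matches that instruction
# (LiFT uses a vision-language model for this; both models are stubs here).

def propose_instruction(context):
    # Stand-in for an LLM query; a real system would call a language model.
    table = {"forest": "chop a tree", "plains": "collect dirt"}
    return table.get(context, "explore")

def alignment_score(trajectory, instruction):
    # Stand-in for a VLM reward: the fraction of steps matching the task.
    return sum(step == instruction for step in trajectory) / len(trajectory)

trajectory = ["chop a tree", "explore", "chop a tree", "chop a tree"]
instruction = propose_instruction("forest")
print(instruction, alignment_score(trajectory, instruction))  # chop a tree 0.75
```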
arXiv Detail & Related papers (2023-12-14T14:07:41Z)
- Human Decision Makings on Curriculum Reinforcement Learning with Difficulty Adjustment [52.07473934146584]
We guide curriculum reinforcement learning towards a preferred difficulty level, neither too hard nor too easy, by learning from the human decision process.
Our system is highly parallelizable, making it possible for a human to train large-scale reinforcement learning applications.
We show that reinforcement learning performance can successfully adjust in sync with the human-desired difficulty level.
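As a rough illustration of steering difficulty toward a preferred level, the sketch below uses a simple proportional update; the paper learns this from the human decision process, so the rule here is only an assumed stand-in.
```python
# Hedged sketch of difficulty adjustment: raise task difficulty when the
# agent succeeds too often, lower it when it fails too often, so training
# stays near a level that is neither too hard nor too easy.

def adjust_difficulty(difficulty, success_rate, target=0.6, gain=0.5):
    """Proportional update toward a target success rate; clipped to [0, 1]."""
    return max(0.0, min(1.0, difficulty + gain * (success_rate - target)))

difficulty = 0.3
for success_rate in [0.9, 0.8, 0.7, 0.6]:  # success falls as difficulty rises
    difficulty = adjust_difficulty(difficulty, success_rate)
    print(round(difficulty, 2))  # 0.45, 0.55, 0.6, 0.6
```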
arXiv Detail & Related papers (2022-08-04T23:53:51Z)
- Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.
The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of an entropy-regularized policy gradient formulation.
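The entropy-regularized policy gradient can be illustrated on a toy bandit: the sketch below ascends E[r] + beta * H(pi) over softmax logits. This shows only the regularizer the summary mentions, not the paper's adversarial training regime or skill embeddings.
```python
import numpy as np

# Hedged sketch: REINFORCE with an entropy bonus on a 3-armed bandit.
rng = np.random.default_rng(0)
true_rewards = np.array([1.0, 2.0, 0.5])
logits = np.zeros(3)
beta, lr = 0.5, 0.2  # entropy weight and learning rate

for _ in range(2000):
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()
    a = rng.choice(3, p=pi)
    r = true_rewards[a] + rng.normal(scale=0.1)
    grad_logp = np.eye(3)[a] - pi              # d log pi(a) / d logits
    H = -(pi * np.log(pi)).sum()               # policy entropy
    grad_entropy = -pi * (np.log(pi) + H)      # d H / d logits (softmax params)
    logits += lr * (r * grad_logp + beta * grad_entropy)

pi = np.exp(logits - logits.max())
print(np.round(pi / pi.sum(), 2))  # favours arm 1, but stays stochastic
```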
arXiv Detail & Related papers (2022-01-27T19:51:09Z)
- Systematic human learning and generalization from a brief tutorial with explanatory feedback [3.7826494079172557]
We investigate human adults' ability to learn an abstract reasoning task based on Sudoku.
We find that participants who master the task do so within a small number of trials and generalize well to puzzles outside of the training range.
We also find that most of those who master the task can describe a valid solution strategy, and such participants perform better on transfer puzzles than those whose strategy descriptions are vague or incomplete.
arXiv Detail & Related papers (2021-07-10T00:14:41Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
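A key ingredient in this feedback efficiency is relabeling: whenever the reward model learned from human preferences is updated, stored transitions are re-scored so off-policy updates use current reward estimates. The buffer and the linear stand-in reward model below are illustrative assumptions, not PEBBLE's implementation.
```python
# Hedged sketch of replay relabeling in preference-based RL.

class ReplayBuffer:
    def __init__(self):
        self.obs, self.act, self.rew = [], [], []

    def add(self, obs, act, rew):
        self.obs.append(obs); self.act.append(act); self.rew.append(rew)

    def relabel(self, reward_model):
        """Overwrite stored rewards with the current reward model's output."""
        self.rew = [reward_model(o, a) for o, a in zip(self.obs, self.act)]

def reward_model(obs, act):
    # Stand-in for a reward network trained on human preferences.
    return 1.0 * obs - 0.5 * act

buf = ReplayBuffer()
for obs, act in [(0.2, 1.0), (0.5, 0.0), (0.9, 1.0)]:
    buf.add(obs, act, rew=0.0)  # true reward unknown when collected
buf.relabel(reward_model)       # re-scored once the model is (re)trained
print([round(r, 2) for r in buf.rew])  # [-0.3, 0.5, 0.4]
```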
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Deep Reinforcement Learning with Interactive Feedback in a Human-Robot Environment [1.2998475032187096]
We propose a deep reinforcement learning approach with interactive feedback to learn a domestic task in a human-robot scenario.
We compare three different learning methods using a simulated robotic arm for the task of organizing different objects.
The obtained results show that a learner agent, using either agent-IDeepRL or human-IDeepRL, completes the given task sooner and makes fewer mistakes than the autonomous DeepRL approach.
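A minimal sketch of the interactive-feedback idea: with some probability, an external advisor (a human, or a previously trained agent, mirroring the human-IDeepRL/agent-IDeepRL distinction) overrides the learner's exploratory action. The advisor, learner, and feedback probability below are illustrative assumptions.
```python
import random

# Hedged sketch: an advisor occasionally replaces the learner's action.

def advisor_action(state):
    return state % 2               # stand-in for human or trained-agent advice

def learner_action(state):
    return random.randint(0, 1)    # untrained learner explores randomly

def act(state, feedback_prob=0.3):
    """Return (action, advised): with probability feedback_prob the advisor's
    corrected action is used instead of the learner's own choice."""
    if random.random() < feedback_prob:
        return advisor_action(state), True
    return learner_action(state), False

random.seed(0)
steps = [act(state) for state in range(10)]
print(sum(advised for _, advised in steps), "of 10 steps were advised")
```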
arXiv Detail & Related papers (2020-07-07T11:55:27Z)
- Learning Transferable Concepts in Deep Reinforcement Learning [0.7161783472741748]
We show that learning discrete representations of sensory inputs can provide a high-level abstraction that is common across multiple tasks.
In particular, we show that it is possible to learn such representations by self-supervision, following an information theoretic approach.
Our method is able to learn concepts in locomotion and optimal control tasks that increase sample efficiency in both known and unknown tasks.
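As a simplified stand-in for the information-theoretic objective, the sketch below learns discrete codes over observations with k-means; the resulting assignments play the role of the discrete, task-agnostic concepts the summary describes.
```python
import numpy as np

def learn_concepts(obs, n_codes=4, iters=50, seed=0):
    """K-means codebook over observations: an illustrative simplification of
    learning discrete sensory representations by self-supervision."""
    rng = np.random.default_rng(seed)
    codebook = obs[rng.choice(len(obs), size=n_codes, replace=False)].copy()
    assign = np.zeros(len(obs), dtype=int)
    for _ in range(iters):
        # Assign each observation to its nearest code (its "concept").
        dists = ((obs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        # Move each code to the mean of the observations assigned to it.
        for k in range(n_codes):
            if (assign == k).any():
                codebook[k] = obs[assign == k].mean(axis=0)
    return codebook, assign

# Toy observations drawn from four well-separated clusters.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
obs = np.concatenate([c + rng.normal(scale=0.3, size=(100, 2)) for c in centers])
codebook, concepts = learn_concepts(obs)
print(np.bincount(concepts, minlength=4))  # observations assigned to each code
```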
arXiv Detail & Related papers (2020-05-16T04:45:51Z)
- KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge [40.343858932413376]
We propose the knowledge-guided policy network (KoGuN), a novel framework that combines human prior suboptimal knowledge with reinforcement learning.
Our framework consists of a fuzzy rule controller to represent human knowledge and a refine module to fine-tune suboptimal prior knowledge.
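A minimal sketch of that combination: a fuzzy rule encodes the human prior as a soft action preference, and a learned refinement corrects it in logit space. The balancing rule, the additive combination, and all parameters are illustrative assumptions, not KoGuN's architecture.
```python
import numpy as np

# Hedged sketch: fuzzy-rule prior + learned refinement for a 2-action task.

def fuzzy_rule_prior(pole_angle):
    """Human rule for a CartPole-style task: 'if the pole leans right, push
    right', with a soft (fuzzy) membership instead of a hard threshold."""
    lean_right = 1.0 / (1.0 + np.exp(-10.0 * pole_angle))  # membership in [0, 1]
    return np.array([1.0 - lean_right, lean_right])        # P(left), P(right)

def policy(pole_angle, refine_logits):
    """Combine the suboptimal prior with a learned refinement additively in
    logit space, then renormalize to a proper distribution."""
    logits = np.log(fuzzy_rule_prior(pole_angle) + 1e-8) + refine_logits
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

print(np.round(policy(0.05, np.zeros(2)), 2))            # prior alone
print(np.round(policy(0.05, np.array([1.0, -1.0])), 2))  # refined policy
```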
arXiv Detail & Related papers (2020-02-18T07:58:27Z)