Joint Goal and Strategy Inference across Heterogeneous Demonstrators via
Reward Network Distillation
- URL: http://arxiv.org/abs/2001.00503v3
- Date: Mon, 23 Nov 2020 16:04:47 GMT
- Title: Joint Goal and Strategy Inference across Heterogeneous Demonstrators via
Reward Network Distillation
- Authors: Letian Chen, Rohan Paleja, Muyleng Ghuy, Matthew Gombolay
- Abstract summary: Inverse reinforcement learning (IRL) seeks to learn a reward function from readily-obtained human demonstrations.
We propose a method to jointly infer a task goal and humans' strategic preferences via network distillation.
We demonstrate that our algorithm can better recover the task and strategy rewards and imitate the demonstrated strategies in two simulated tasks and a real-world table tennis task.
- Score: 1.1470070927586016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) has achieved tremendous success as a general
framework for learning how to make decisions. However, this success relies on
the interactive hand-tuning of a reward function by RL experts. On the other
hand, inverse reinforcement learning (IRL) seeks to learn a reward function
from readily-obtained human demonstrations. Yet, IRL suffers from two major
limitations: 1) reward ambiguity - there are an infinite number of possible
reward functions that could explain an expert's demonstration and 2)
heterogeneity - human experts adopt varying strategies and preferences, which
makes learning from multiple demonstrators difficult due to the common
assumption that demonstrators seek to maximize the same reward. In this work,
we propose a method to jointly infer a task goal and humans' strategic
preferences via network distillation. This approach enables us to distill a
robust task reward (addressing reward ambiguity) and to model each strategy's
objective (handling heterogeneity). We demonstrate that our algorithm can better
recover the task and strategy rewards and better imitate each demonstrated
strategy in two simulated tasks and a real-world table tennis task.
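The decomposition described above can be illustrated with a short sketch. The snippet below is a minimal illustration, not the authors' released code: each demonstrator's reward is modeled as a shared task-reward network plus a per-strategy reward network, and a penalty on the strategy-reward output pushes structure common to all demonstrators into the shared network. The class names, network sizes, per-strategy IRL objective, and the distill_weight coefficient are assumptions made for illustration.

    import torch
    import torch.nn as nn

    class RewardNet(nn.Module):
        """Small MLP mapping a state-action pair to a scalar reward."""
        def __init__(self, obs_dim, act_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, hidden), nn.Tanh(),
                nn.Linear(hidden, 1),
            )

        def forward(self, obs, act):
            return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

    class DistilledReward(nn.Module):
        """Shared task reward plus one strategy-specific reward per demonstrator."""
        def __init__(self, obs_dim, act_dim, n_strategies):
            super().__init__()
            self.task_reward = RewardNet(obs_dim, act_dim)
            self.strategy_rewards = nn.ModuleList(
                [RewardNet(obs_dim, act_dim) for _ in range(n_strategies)]
            )

        def forward(self, obs, act, strategy_id):
            r_task = self.task_reward(obs, act)
            r_strategy = self.strategy_rewards[strategy_id](obs, act)
            # Reward explaining demonstrator `strategy_id`: shared goal + personal preference.
            return r_task + r_strategy, r_strategy

    def distilled_irl_loss(irl_loss, r_strategy, distill_weight=0.01):
        # Penalizing the strategy-only reward keeps behavior shared by all
        # demonstrators inside the task reward (the distillation step);
        # irl_loss stands in for any per-strategy IRL objective (e.g., AIRL).
        return irl_loss + distill_weight * (r_strategy ** 2).mean()

Under this split, the task-reward network can be read out on its own as an estimate of the shared goal, while each strategy network captures one demonstrator's preferences.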
Related papers
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a novel trainer-student system that learns a dynamic reward function based on the student's performance and alignment with expert demonstrations.
RILe enables better performance in complex settings where traditional methods falter, outperforming existing methods by 2x in complex simulated robot-locomotion tasks.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- Towards Effective Multiple-in-One Image Restoration: A Sequential and Prompt Learning Strategy [19.73705830803488]
We investigate in depth the multiple-in-one (MiO) IR problem, which comprises seven popular IR tasks.
To tackle these challenges, we present two simple yet effective strategies.
By evaluating on 19 test sets, we demonstrate that the sequential and prompt learning strategies can significantly enhance the MiO performance.
arXiv Detail & Related papers (2024-01-07T03:35:04Z)
- Curricular Subgoals for Inverse Reinforcement Learning [21.038691420095525]
Inverse Reinforcement Learning (IRL) aims to reconstruct the reward function from expert demonstrations to facilitate policy learning.
Existing IRL methods mainly focus on learning global reward functions to minimize the trajectory difference between the imitator and the expert.
We propose a novel Curricular Subgoal-based Inverse Reinforcement Learning framework that explicitly disentangles one task into several local subgoals to guide agent imitation.
arXiv Detail & Related papers (2023-06-14T04:06:41Z)
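As a rough, generic illustration of the subgoal idea in the Curricular Subgoals entry above (this is not the authors' algorithm), each expert trajectory can be cut into segments at chosen subgoal states, with a separate local reward fit to each segment in order. The is_subgoal predicate and the fit_local_reward routine below are placeholders for task-specific choices.

    from typing import Callable, List, Sequence

    def split_by_subgoals(trajectory: Sequence[tuple],
                          is_subgoal: Callable) -> List[list]:
        """Cut one expert trajectory of (state, action, next_state) transitions
        into consecutive segments, ending a segment whenever the user-chosen
        subgoal predicate fires on the reached state."""
        segments, current = [], []
        for (state, action, next_state) in trajectory:
            current.append((state, action, next_state))
            if is_subgoal(next_state):
                segments.append(current)
                current = []
        if current:
            segments.append(current)
        return segments

    def curricular_rewards(trajectory, is_subgoal, fit_local_reward):
        """Learn one local reward per segment, in trajectory order, giving an
        easy-to-hard curriculum over subgoals; fit_local_reward stands in for
        any single-segment IRL subroutine."""
        return [fit_local_reward(seg) for seg in
                split_by_subgoals(trajectory, is_subgoal)]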
- Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-oriented Dialogue Systems [111.80916118530398]
Reinforcement learning (RL) techniques can naturally be utilized to train dialogue strategies to achieve user-specific goals.
This paper aims at answering the question of how to efficiently learn and leverage a reward function for training end-to-end (E2E) ToD agents.
arXiv Detail & Related papers (2023-02-20T22:10:04Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions [124.11520774395748]
Reinforcement learning practitioners often utilize complex reward functions that encourage physically plausible behaviors.
We propose substituting complex reward functions with "style rewards" learned from a dataset of motion capture demonstrations.
A learned style reward can be combined with an arbitrary task reward to train policies that perform tasks using naturalistic strategies.
arXiv Detail & Related papers (2022-03-28T21:17:36Z)
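The key mechanism in the Adversarial Motion Priors entry above, combining a learned style reward with an arbitrary task reward, can be sketched in a few lines. The discriminator interface, the bounded style-reward mapping, and the 0.5/0.5 weighting below are illustrative assumptions rather than the paper's implementation.

    import torch

    def style_reward(style_discriminator, obs, next_obs):
        # A discriminator trained to distinguish policy transitions from
        # motion-capture transitions; one common choice maps its output to a
        # bounded reward that is high when transitions look like the dataset.
        d = style_discriminator(torch.cat([obs, next_obs], dim=-1))
        return torch.clamp(1.0 - 0.25 * (d - 1.0) ** 2, min=0.0)

    def combined_reward(task_r, style_r, w_task=0.5, w_style=0.5):
        # Any task reward (e.g., target-velocity tracking) plus the learned
        # style reward; the weights are hyperparameters, 0.5/0.5 is an example.
        return w_task * task_r + w_style * style_r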
- MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning [14.06682547001011]
State-of-the-art methods typically focus on learning a single reward model.
We propose Multi-Objective Reinforced Active Learning (MORAL), a novel method for combining diverse demonstrations of social norms.
Our approach is able to interactively tune a deep RL agent towards a variety of preferences, while eliminating the need for computing multiple policies.
arXiv Detail & Related papers (2021-12-30T19:21:03Z)
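A minimal sketch of the multi-objective idea in the MORAL entry above: several learned reward models are combined with a weight vector that is tuned from human preference feedback. The linear scalarization and the toy preference_update step below are illustrative assumptions; the actual method elicits preferences interactively while training a single policy.

    import numpy as np

    def scalarized_reward(reward_models, weights, state, action):
        # One learned reward per demonstrated norm/objective, combined linearly.
        values = np.array([r(state, action) for r in reward_models])
        return float(np.dot(weights, values))

    def preference_update(weights, preferred_idx, lr=0.1):
        # Toy stand-in for active preference learning: move weight mass toward
        # the objective the human preferred in the last query, then renormalize.
        weights = np.asarray(weights, dtype=float).copy()
        weights[preferred_idx] += lr
        return weights / weights.sum()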
- Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning [81.12201426668894]
We develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks.
We show that our proposed algorithm provides substantial improvement in learning efficiency, making reward-free real-world training feasible.
We also demonstrate that the learned skills can be composed using model predictive control for goal-oriented navigation, without any additional training.
arXiv Detail & Related papers (2020-04-27T17:38:53Z)
- Self Punishment and Reward Backfill for Deep Q-Learning [6.572828651397661]
Reinforcement learning agents learn by reinforcing behaviours that maximize their total reward, usually provided by the environment.
In many environments, the reward is provided only after a sequence of actions rather than after each individual action, leaving the agent uncertain about which of those actions were effective.
We propose two strategies inspired by behavioural psychology to enable the agent to intrinsically estimate more informative reward values for actions with no reward.
arXiv Detail & Related papers (2020-04-10T11:53:11Z)
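The reward-backfill idea in the entry above can be sketched generically: a delayed reward is propagated backward over the preceding zero-reward steps so that earlier actions receive an informative signal. The geometric back-distribution below is an illustrative choice, not necessarily the authors' exact scheme.

    def backfill_rewards(rewards, decay=0.9):
        """Propagate each delayed reward backward over the preceding
        zero-reward steps with geometric decay."""
        filled = list(rewards)
        carry = 0.0
        for t in range(len(filled) - 1, -1, -1):
            if filled[t] != 0.0:
                carry = filled[t]
            else:
                carry *= decay
                filled[t] = carry
        return filled

    # Example: a single terminal reward of 1.0 after four unrewarded actions
    # becomes approximately [0.66, 0.73, 0.81, 0.9, 1.0].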
- Intrinsic Motivation for Encouraging Synergistic Behavior [55.10275467562764]
We study the role of intrinsic motivation as an exploration bias for reinforcement learning in sparse-reward synergistic tasks.
Our key idea is that a good guiding principle for intrinsic motivation in synergistic tasks is to take actions which affect the world in ways that would not be achieved if the agents were acting on their own.
arXiv Detail & Related papers (2020-02-12T19:34:51Z)
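The guiding principle in the Intrinsic Motivation entry above can be written as a small sketch: give an intrinsic bonus when the joint action changes the world in a way that the composed effects of each agent acting alone would not. The predict_effect forward model below is a placeholder for whatever prediction model is available.

    import numpy as np

    def synergy_bonus(predict_effect, state, action_a, action_b, no_op):
        """Intrinsic reward: how much the jointly predicted next state differs
        from composing the changes each agent would cause acting alone."""
        joint = predict_effect(state, action_a, action_b)
        alone_a = predict_effect(state, action_a, no_op)
        alone_b = predict_effect(state, no_op, action_b)
        composed = state + (alone_a - state) + (alone_b - state)
        return float(np.linalg.norm(joint - composed))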
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.