Skill-Based Reinforcement Learning with Intrinsic Reward Matching
- URL: http://arxiv.org/abs/2210.07426v4
- Date: Thu, 25 May 2023 22:30:29 GMT
- Title: Skill-Based Reinforcement Learning with Intrinsic Reward Matching
- Authors: Ademi Adeniji, Amber Xie, Pieter Abbeel
- Abstract summary: We present Intrinsic Reward Matching (IRM), which unifies task-agnostic skill pretraining and task-aware finetuning.
IRM enables us to utilize pretrained skills far more effectively than previous skill selection methods.
- Score: 77.34726150561087
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While unsupervised skill discovery has shown promise in autonomously
acquiring behavioral primitives, there is still a large methodological
disconnect between task-agnostic skill pretraining and downstream, task-aware
finetuning. We present Intrinsic Reward Matching (IRM), which unifies these two
phases of learning via the $\textit{skill discriminator}$, a pretraining model
component often discarded during finetuning. Conventional approaches finetune
pretrained agents directly at the policy level, often relying on expensive
environment rollouts to empirically determine the optimal skill. However, often
the most concise yet complete description of a task is the reward function
itself, and skill learning methods learn an $\textit{intrinsic}$ reward
function via the discriminator that corresponds to the skill policy. We propose
to leverage the skill discriminator to $\textit{match}$ the intrinsic and
downstream task rewards and determine the optimal skill for an unseen task
without environment samples, consequently finetuning with greater
sample-efficiency. Furthermore, we generalize IRM to sequence skills for
complex, long-horizon tasks and demonstrate that IRM enables us to utilize
pretrained skills far more effectively than previous skill selection methods on
both the Fetch tabletop and Franka Kitchen robot manipulation benchmarks.
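The skill-selection step described above can be pictured with a short sketch. Assuming a pretrained discriminator that exposes a per-skill intrinsic reward (the `discriminator.intrinsic_reward` interface here is hypothetical), IRM-style selection reduces to scoring each skill's intrinsic reward against the task reward on stored pretraining transitions, with no new environment rollouts; a plain mean-squared discrepancy stands in for the paper's reward-matching metric.
```python
import torch

def select_skill(discriminator, task_reward_fn, states, next_states, skills):
    """Score each pretrained skill by how well its intrinsic reward matches
    the downstream task reward on stored pretraining transitions, so no new
    environment rollouts are needed. All interfaces are hypothetical; a
    mean-squared discrepancy stands in for the paper's matching metric."""
    task_r = task_reward_fn(states, next_states)              # (B,)
    best_skill, best_loss = None, float("inf")
    for z in skills:
        # Intrinsic reward implied by the discriminator for skill z,
        # e.g. log q(z | s, s') in DIAYN-style skill discovery.
        intrinsic_r = discriminator.intrinsic_reward(states, next_states, z)
        loss = torch.mean((intrinsic_r - task_r) ** 2).item()
        if loss < best_loss:
            best_skill, best_loss = z, loss
    return best_skill

def select_skill_sequence(discriminator, subtask_reward_fns,
                          states, next_states, skills):
    """Long-horizon variant: pick one skill per subtask reward, yielding a
    skill sequence to execute in order (again, a hypothetical interface)."""
    return [select_skill(discriminator, r_fn, states, next_states, skills)
            for r_fn in subtask_reward_fns]
```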
Related papers
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a novel trainer-student system that learns a dynamic reward function based on the student's performance and alignment with expert demonstrations.
RILe enables better performance in complex settings where traditional methods falter, outperforming existing methods by 2x in complex simulated robot-locomotion tasks.
arXiv Detail & Related papers (2024-06-12T17:56:31Z) - Learning Reward for Robot Skills Using Large Language Models via Self-Alignment [11.639973274337274]
Large Language Models (LLMs) contain valuable task-related knowledge that can potentially aid in the learning of reward functions.
We propose a method to learn rewards more efficiently in the absence of humans.
arXiv Detail & Related papers (2024-05-12T04:57:43Z) - Learning to Schedule Online Tasks with Bandit Feedback [7.671139712158846]
Online task scheduling plays an integral role in task-intensive applications such as cloud computing and crowdsourcing.
We propose a double-optimistic learning based Robbins-Monro (DOL-RM) algorithm.
DOL-RM integrates a learning module, which forms optimistic estimates of the reward-to-cost ratio, with a decision module.
arXiv Detail & Related papers (2024-02-26T10:11:28Z) - APART: Diverse Skill Discovery using All Pairs with Ascending Reward and
DropouT [16.75358022780262]
We study diverse skill discovery in reward-free environments, aiming to discover all possible skills in simple grid-world environments.
This problem is formulated as mutual training of skills using an intrinsic reward and a discriminator trained to predict a skill given its trajectory.
Our initial solution replaces the standard one-vs-all (softmax) discriminator with a one-vs-one (all pairs) discriminator and combines it with a novel intrinsic reward function and a dropout regularization technique; a toy sketch of the discriminator follows this entry.
arXiv Detail & Related papers (2023-08-24T08:46:43Z) - Behavior Contrastive Learning for Unsupervised Skill Discovery [75.6190748711826]
- Behavior Contrastive Learning for Unsupervised Skill Discovery [75.6190748711826]
We propose a novel unsupervised skill discovery method through contrastive learning among behaviors.
Under mild assumptions, our objective maximizes the mutual information between different behaviors generated by the same skill; a contrastive sketch follows this entry.
Our method implicitly increases the state entropy to obtain better state coverage.
arXiv Detail & Related papers (2023-05-08T06:02:11Z) - Basis for Intentions: Efficient Inverse Reinforcement Learning using
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z) - Hierarchical Kickstarting for Skill Transfer in Reinforcement Learning [27.69559938165733]
Practising and honing skills forms a fundamental component of how humans learn, yet artificial agents are rarely specifically trained to perform them.
We investigate how skills can be incorporated into the training of reinforcement learning (RL) agents in complex environments.
Our experiments show that learning with a prior knowledge of useful skills can significantly improve the performance of agents on complex problems.
arXiv Detail & Related papers (2022-07-23T19:23:29Z) - Hierarchical Skills for Efficient Exploration [70.62309286348057]
In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration.
Prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control) and specificity (faster learning) in skill design.
We propose a hierarchical skill learning framework that acquires skills of varying complexity in an unsupervised manner.
arXiv Detail & Related papers (2021-10-20T22:29:32Z) - Hierarchical Reinforcement Learning as a Model of Human Task
Interleaving [60.95424607008241]
We develop a hierarchical model of supervisory control driven by reinforcement learning.
The model reproduces known empirical effects of task interleaving.
The results support hierarchical RL as a plausible model of task interleaving.
arXiv Detail & Related papers (2020-01-04T17:53:28Z)