Adversarial Motion Priors Make Good Substitutes for Complex Reward
Functions
- URL: http://arxiv.org/abs/2203.15103v1
- Date: Mon, 28 Mar 2022 21:17:36 GMT
- Title: Adversarial Motion Priors Make Good Substitutes for Complex Reward
Functions
- Authors: Alejandro Escontrela, Xue Bin Peng, Wenhao Yu, Tingnan Zhang, Atil
Iscen, Ken Goldberg, and Pieter Abbeel
- Abstract summary: Reinforcement learning practitioners often utilize complex reward functions that encourage physically plausible behaviors.
We propose substituting complex reward functions with "style rewards" learned from a dataset of motion capture demonstrations.
A learned style reward can be combined with an arbitrary task reward to train policies that perform tasks using naturalistic strategies.
- Score: 124.11520774395748
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training a high-dimensional simulated agent with an under-specified reward
function often leads the agent to learn physically infeasible strategies that
are ineffective when deployed in the real world. To mitigate these unnatural
behaviors, reinforcement learning practitioners often utilize complex reward
functions that encourage physically plausible behaviors. However, a tedious,
labor-intensive tuning process is often required to create hand-designed
rewards, which might not easily generalize across platforms and tasks. We
propose substituting complex reward functions with "style rewards" learned from
a dataset of motion capture demonstrations. A learned style reward can be
combined with an arbitrary task reward to train policies that perform tasks
using naturalistic strategies. These natural strategies can also facilitate
transfer to the real world. We build upon Adversarial Motion Priors -- an
approach from the computer graphics domain that encodes a style reward from a
dataset of reference motions -- to demonstrate that an adversarial approach to
training policies can produce behaviors that transfer to a real quadrupedal
robot without requiring complex reward functions. We also demonstrate that an
effective style reward can be learned from a few seconds of motion capture data
gathered from a German Shepherd, and that it leads to energy-efficient
locomotion strategies with natural gait transitions.
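The central mechanism can be sketched compactly. The code below is a minimal illustration (not the authors' released implementation, and written assuming a PyTorch setup): a discriminator scores state transitions against the motion-capture dataset, its output is mapped to a bounded "style" reward, and that reward is summed with an arbitrary task reward. Names such as obs_dim, hidden, w_task, w_style, and the gradient-penalty coefficient are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AMPDiscriminator(nn.Module):
    """Scores a state transition (s, s'); higher = more like the reference motions."""
    def __init__(self, obs_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, s_next], dim=-1)).squeeze(-1)

@torch.no_grad()
def style_reward(disc: AMPDiscriminator, s, s_next) -> torch.Tensor:
    # One common choice with a least-squares discriminator:
    # r_style = max(0, 1 - 0.25 * (d - 1)^2), which keeps r_style in [0, 1].
    d = disc(s, s_next)
    return torch.clamp(1.0 - 0.25 * (d - 1.0) ** 2, min=0.0)

def combined_reward(r_task, r_style, w_task: float = 0.5, w_style: float = 0.5):
    # The learned style reward replaces hand-designed "naturalness" terms;
    # the task reward (e.g. velocity tracking) specifies what the robot should do.
    return w_task * r_task + w_style * r_style

def discriminator_loss(disc, demo_s, demo_s_next, policy_s, policy_s_next,
                       grad_penalty_coef: float = 10.0):
    # Least-squares targets: +1 for reference-motion transitions, -1 for policy
    # transitions, plus a gradient penalty on the demonstrations for stability.
    d_demo = disc(demo_s, demo_s_next)
    d_policy = disc(policy_s, policy_s_next)
    loss = ((d_demo - 1.0) ** 2).mean() + ((d_policy + 1.0) ** 2).mean()

    demo_in = torch.cat([demo_s, demo_s_next], dim=-1).detach().requires_grad_(True)
    grad = torch.autograd.grad(disc.net(demo_in).sum(), demo_in, create_graph=True)[0]
    return loss + grad_penalty_coef * (grad.norm(dim=-1) ** 2).mean()
```

In this sketch the discriminator is refit at each policy-update step on fresh policy rollouts and sampled reference transitions; the clamped least-squares mapping keeps the style reward bounded so the weighted sum with the task reward stays well scaled.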
Related papers
- Infer and Adapt: Bipedal Locomotion Reward Learning from Demonstrations
via Inverse Reinforcement Learning [5.246548532908499]
This paper brings state-of-the-art Inverse Reinforcement Learning (IRL) techniques to solving bipedal locomotion problems over complex terrains.
We propose algorithms for learning expert reward functions, and we subsequently analyze the learned functions.
We empirically demonstrate that training a bipedal locomotion policy with the inferred reward functions enhances its walking performance on unseen terrains.
arXiv Detail & Related papers (2023-09-28T00:11:06Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using
Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observations of its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Transferable Reward Learning by Dynamics-Agnostic Discriminator Ensemble [8.857776147129464]
Recovering a reward function from expert demonstrations is a fundamental problem in reinforcement learning.
We present a dynamics-agnostic discriminator-ensemble reward learning method capable of learning both state-action and state-only reward functions.
arXiv Detail & Related papers (2022-06-01T05:16:39Z)
- ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically
Simulated Characters [123.88692739360457]
General-purpose motor skills enable humans to perform complex tasks.
These skills also provide powerful priors for guiding their behaviors when learning new tasks.
We present a framework for learning versatile and reusable skill embeddings for physically simulated characters.
arXiv Detail & Related papers (2022-05-04T06:13:28Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Generative Adversarial Reward Learning for Generalized Behavior Tendency
Inference [71.11416263370823]
We propose a generative inverse reinforcement learning approach for modelling user behavioral preferences.
Our model automatically learns rewards from a user's actions using a discriminative actor-critic network and a Wasserstein GAN.
arXiv Detail & Related papers (2021-05-03T13:14:25Z)
- Curious Exploration and Return-based Memory Restoration for Deep
Reinforcement Learning [2.3226893628361682]
In this paper, we focus on training a single agent to score goals with a binary success/failure reward function.
The proposed method can be utilized to train agents in environments with fairly complex state and action spaces.
arXiv Detail & Related papers (2021-05-02T16:01:34Z)
- Emergent Real-World Robotic Skills via Unsupervised Off-Policy
Reinforcement Learning [81.12201426668894]
We develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks.
We show that our proposed algorithm provides substantial improvement in learning efficiency, making reward-free real-world training feasible.
We also demonstrate that the learned skills can be composed using model predictive control for goal-oriented navigation, without any additional training.
arXiv Detail & Related papers (2020-04-27T17:38:53Z)
- oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally
Extended Actions [37.66289166905027]
Explicit engineering of reward functions for given environments has been a major hindrance to reinforcement learning methods.
We propose an algorithm that learns hierarchical disentangled rewards with a policy over options.
arXiv Detail & Related papers (2020-02-20T22:21:41Z)