Reward function shape exploration in adversarial imitation learning: an empirical study
- URL: http://arxiv.org/abs/2104.06687v1
- Date: Wed, 14 Apr 2021 08:21:49 GMT
- Title: Reward function shape exploration in adversarial imitation learning: an empirical study
- Authors: Yawei Wang and Xiu Li
- Abstract summary: In adversarial imitation learning algorithms (AILs), no true rewards are obtained from the environment for learning the policy.
We design several representative reward function shapes and compare their performance in large-scale experiments.
- Score: 9.817069267241575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In adversarial imitation learning algorithms (AILs), no true rewards are
obtained from the environment for learning the policy; instead, pseudo rewards
derived from the output of the discriminator are required. Given the implicit
reward bias problem in AILs, we design several representative reward function
shapes and compare their performance in large-scale experiments. To ensure the
reliability of our results, we conduct the experiments on a series of MuJoCo
and Box2D continuous control tasks with four different AILs. We also compare
the performance of the various reward function shapes under varying numbers of
expert trajectories. The empirical results reveal that the positive logarithmic
reward function works well in typical continuous control tasks, whereas the
so-called unbiased reward function is limited to specific kinds of tasks.
Furthermore, several of our designed reward functions also perform well in
these environments.
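The abstract does not list the exact reward shapes that were compared, but the shapes commonly studied in the AIL literature are simple transformations of the discriminator output D(s, a). The sketch below is only an illustration of such pseudo-reward shapes, including a strictly positive logarithmic shape and the log-odds shape often called "unbiased"; it is not taken from the paper's code.

```python
import numpy as np

def ail_pseudo_rewards(d, eps=1e-8):
    """Common pseudo-reward shapes derived from a discriminator output
    d = D(s, a) in (0, 1), where D is trained to score expert-like
    transitions close to 1. These are standard shapes from the AIL
    literature, not necessarily the exact set compared in the paper."""
    d = np.clip(d, eps, 1.0 - eps)
    return {
        # Strictly positive ("survival bias"): the shape used by the original GAIL.
        "positive_log": -np.log(1.0 - d),
        # Strictly negative ("termination bias").
        "negative_log": np.log(d),
        # AIRL-style log-odds shape, often described as "unbiased": can take either sign.
        "log_odds": np.log(d) - np.log(1.0 - d),
    }

if __name__ == "__main__":
    d = np.array([0.1, 0.5, 0.9])  # example discriminator outputs
    for name, r in ail_pseudo_rewards(d).items():
        print(name, np.round(r, 3))
```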
Related papers
- DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing [60.21269454707625]
DreamSmooth learns to predict a temporally-smoothed reward, instead of the exact reward at the given timestep.
We show that DreamSmooth achieves state-of-the-art performance on long-horizon sparse-reward tasks.
arXiv Detail & Related papers (2023-11-02T17:57:38Z)
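DreamSmooth's exact smoothing scheme is not given in this summary; the snippet below is a minimal sketch of the general idea of replacing the per-step reward with a temporally smoothed target, here using a Gaussian kernel over a trajectory's reward sequence.

```python
import numpy as np

def smooth_rewards(rewards, sigma=2.0, radius=5):
    """Temporally smooth a trajectory's reward sequence with a Gaussian
    kernel. A sparse spike of reward is spread over neighbouring steps,
    which can make the reward easier for a learned model to predict.
    Illustration of the idea only, not DreamSmooth's exact scheme."""
    rewards = np.asarray(rewards, dtype=float)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    # mode="same" keeps the smoothed sequence aligned with the original steps.
    return np.convolve(rewards, kernel, mode="same")

if __name__ == "__main__":
    sparse = np.zeros(20)
    sparse[15] = 1.0  # single sparse success reward
    print(np.round(smooth_rewards(sparse), 3))
```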
- Can Differentiable Decision Trees Enable Interpretable Reward Learning from Human Feedback? [10.968490626773564]
We propose and evaluate a novel approach for learning expressive and interpretable reward functions from preferences using Differentiable Decision Trees (DDTs).
Our experiments across several domains, including CartPole, Visual Gridworld environments and Atari games, provide evidence that the tree structure of our learned reward function is useful in determining the extent to which the reward function is aligned with human preferences.
arXiv Detail & Related papers (2023-06-22T16:04:16Z)
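The summary does not spell out the preference-learning objective. A common choice in this line of work is a Bradley-Terry model over segment returns; the sketch below computes that loss for a generic reward model (the tree-structured reward function itself is not reproduced here, and the helper names are hypothetical).

```python
import numpy as np

def bradley_terry_loss(reward_fn, seg_a, seg_b, prefer_a):
    """Preference-based reward learning loss (Bradley-Terry model): the
    probability that segment A is preferred over segment B is the softmax
    of their predicted returns. reward_fn maps a (T, obs_dim) segment to
    per-step rewards; prefer_a is 1.0 if the human preferred segment A."""
    ret_a = reward_fn(seg_a).sum()
    ret_b = reward_fn(seg_b).sum()
    # log P(A preferred) and log P(B preferred), computed stably.
    log_p_a = -np.logaddexp(0.0, ret_b - ret_a)
    log_p_b = -np.logaddexp(0.0, ret_a - ret_b)
    return -(prefer_a * log_p_a + (1.0 - prefer_a) * log_p_b)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    linear_reward = lambda seg: seg @ np.array([0.5, -0.2])  # toy reward model
    seg_a, seg_b = rng.normal(size=(10, 2)), rng.normal(size=(10, 2))
    print(bradley_terry_loss(linear_reward, seg_a, seg_b, prefer_a=1.0))
```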
- Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity [114.88145406445483]
Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications.
In practice, the choice of reward function can be crucial for good results.
arXiv Detail & Related papers (2022-10-18T04:21:25Z)
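The summary does not detail which shaping schemes are analysed; the canonical construction in the reward-shaping literature is potential-based shaping, sketched below with a hypothetical distance-to-goal potential.

```python
import numpy as np

def potential_shaped_reward(r, s, s_next, potential, gamma=0.99, done=False):
    """Potential-based reward shaping: adding F(s, s') = gamma * phi(s') - phi(s)
    to the environment reward leaves the optimal policy unchanged while
    densifying the learning signal. The potential of a terminal state is
    conventionally taken to be zero."""
    phi_next = 0.0 if done else potential(s_next)
    return r + gamma * phi_next - potential(s)

if __name__ == "__main__":
    goal = np.array([1.0, 1.0])
    # Hypothetical potential: negative distance to a goal state.
    potential = lambda s: -np.linalg.norm(np.asarray(s) - goal)
    print(potential_shaped_reward(0.0, [0.0, 0.0], [0.5, 0.5], potential))
```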
- Identifiability and generalizability from multiple experts in Inverse Reinforcement Learning [39.632717308147825]
Reinforcement Learning (RL) aims to train an agent from a reward function in a given environment.
Inverse Reinforcement Learning (IRL) seeks to recover the reward function from observing an expert's behavior.
arXiv Detail & Related papers (2022-09-22T12:50:00Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observations of its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Invariance in Policy Optimisation and Partial Identifiability in Reward Learning [67.4640841144101]
We characterise the partial identifiability of the reward function given popular reward learning data sources.
We also analyse the impact of this partial identifiability on several downstream tasks, such as policy optimisation.
arXiv Detail & Related papers (2022-03-14T20:19:15Z)
- Dynamics-Aware Comparison of Learned Reward Functions [21.159457412742356]
The ability to learn reward functions plays an important role in enabling the deployment of intelligent agents in the real world.
Reward functions are typically compared by considering the behavior of optimized policies, but this approach conflates deficiencies in the reward function with those of the policy search algorithm used to optimize it.
We propose the Dynamics-Aware Reward Distance (DARD), a new reward pseudometric.
arXiv Detail & Related papers (2022-01-25T03:48:00Z)
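DARD's exact construction is not given here. As a rough illustration of comparing reward functions directly on transitions rather than through optimized policies, the sketch below computes a Pearson-correlation distance between two reward functions on sampled transitions; it is a simplified stand-in for reward pseudometrics of this kind, not DARD's actual definition.

```python
import numpy as np

def pearson_reward_distance(reward_a, reward_b, transitions):
    """Compare two reward functions directly on a batch of transitions
    (s, a, s') instead of through the policies they induce. Using the
    Pearson correlation of their values makes the distance invariant to
    positive affine rescaling of either reward."""
    ra = np.array([reward_a(*t) for t in transitions])
    rb = np.array([reward_b(*t) for t in transitions])
    rho = np.corrcoef(ra, rb)[0, 1]
    return np.sqrt(max(0.0, 1.0 - rho) / 2.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    transitions = [(rng.normal(size=2), rng.normal(size=1), rng.normal(size=2))
                   for _ in range(256)]
    r1 = lambda s, a, s_next: -np.linalg.norm(s_next)            # toy distance-based reward
    r2 = lambda s, a, s_next: -2.0 * np.linalg.norm(s_next) + 3  # affine transform of r1
    print(pearson_reward_distance(r1, r2, transitions))  # ~0: equivalent rewards
```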
- Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification [133.20816939521941]
In the standard Markov decision process formalism, users specify tasks by writing down a reward function.
In many scenarios, the user is unable to describe the task in words or numbers, but can readily provide examples of what the world would look like if the task were solved.
Motivated by this observation, we derive a control algorithm that aims to visit states that have a high probability of leading to successful outcomes, given only examples of successful outcome states.
arXiv Detail & Related papers (2021-03-23T16:19:55Z)
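The paper's recursive classification algorithm is not reproduced here. The snippet below only sketches the simpler underlying intuition of turning success-example states into a learning signal: fit a classifier that separates success examples from other visited states and use its log-odds as a proxy reward. This is a simplified stand-in, not the method proposed in the paper.

```python
import numpy as np

def success_classifier_reward(success_states, other_states, lr=0.1, steps=500):
    """Fit a logistic classifier that separates user-provided success-example
    states from other visited states, and return its log-odds as a proxy
    reward for 'how success-like is this state'."""
    x = np.vstack([success_states, other_states])
    y = np.concatenate([np.ones(len(success_states)), np.zeros(len(other_states))])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):  # plain gradient descent on the logistic loss
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = p - y
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return lambda s: np.asarray(s) @ w + b  # log-odds of being a success state

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    success = rng.normal(loc=2.0, size=(100, 2))  # example "solved" states
    other = rng.normal(loc=0.0, size=(100, 2))    # states visited by the policy
    reward = success_classifier_reward(success, other)
    print(reward([2.0, 2.0]), reward([0.0, 0.0]))  # higher for success-like states
```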
- Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping [71.214923471669]
Reward shaping is an effective technique for incorporating domain knowledge into reinforcement learning (RL).
In this paper, we consider the problem of adaptively utilizing a given shaping reward function.
Experiments in sparse-reward CartPole and MuJoCo environments show that our algorithms can fully exploit beneficial shaping rewards.
arXiv Detail & Related papers (2020-11-05T05:34:14Z)
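The paper's adaptive algorithm is not reproduced here; as a minimal sketch of the interface being adapted, the snippet combines the environment reward with a given shaping reward through a utilization weight. In adaptive reward-shaping methods this weight is a learned (possibly state-dependent) quantity; here it is just a fixed scalar.

```python
import numpy as np

def combined_reward(env_reward, shaping_reward, weight):
    """Weighted combination of the true environment reward and a given
    shaping reward. weight=0 ignores the shaping entirely; weight=1 uses
    it fully. Adaptive shaping methods learn where in this range to sit."""
    return env_reward + weight * shaping_reward

if __name__ == "__main__":
    env_r = np.array([0.0, 0.0, 1.0])      # sparse task reward
    shaping_r = np.array([0.2, 0.5, 0.1])  # hypothetical domain-knowledge shaping term
    for w in (0.0, 0.5, 1.0):              # candidate utilization weights
        print(w, combined_reward(env_r, shaping_r, w))
```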
- Addressing reward bias in Adversarial Imitation Learning with neutral reward functions [1.7188280334580197]
Adversarial imitation learning suffers from a fundamental reward bias problem stemming from the choice of reward function used in the algorithm.
We provide a theoretical sketch of why existing reward functions would fail in imitation learning scenarios in task-based environments with multiple terminal states.
We propose a new reward function for GAIL which outperforms existing GAIL methods on task-based environments with single and multiple terminal states.
arXiv Detail & Related papers (2020-09-20T16:24:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.