Curious Exploration and Return-based Memory Restoration for Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2105.00499v1
- Date: Sun, 2 May 2021 16:01:34 GMT
- Title: Curious Exploration and Return-based Memory Restoration for Deep
Reinforcement Learning
- Authors: Saeed Tafazzol, Erfan Fathi, Mahdi Rezaei, Ehsan Asali
- Abstract summary: In this paper, we focus on training a single agent to score goals with a binary success/failure reward function.
The proposed method can be utilized to train agents in environments with fairly complex state and action spaces.
- Score: 2.3226893628361682
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reward engineering and designing an incentive reward function are
non-trivial tasks when training agents in complex environments. Furthermore, an
inaccurate reward function may lead to biased behaviour that is far from
efficient or optimal. In this paper, we focus on training a single agent to
score goals with a binary success/failure reward function in the Half Field
Offense domain. A major advantage of this research is that the agent makes no
presumptions about the environment, meaning it follows only the original
formulation of reinforcement learning. The main challenge of using such a
reward function is the high sparsity of positive reward signals. To address
this problem, we use a simple prediction-based exploration strategy (called
Curious Exploration) along with a Return-based Memory Restoration (RMR)
technique that tends to remember more valuable memories. The proposed method
can be utilized to train agents in environments with fairly complex state and
action spaces. Our experimental results show that many recent solutions,
including our baseline method, fail to learn and perform in the complex soccer
domain, whereas the proposed method converges easily to nearly optimal
behaviour. A video presenting the performance of our trained agent is
available at http://bit.ly/HFO_Binary_Reward.
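The abstract only names the two components, so below is a minimal, hedged
sketch of how such a pipeline could be wired up. This is not the authors'
implementation: the forward-model architecture, the prediction-error bonus, and
the return-weighted replay sampling rule are all illustrative assumptions.
Curious Exploration is rendered here as an intrinsic reward equal to a forward
model's prediction error, and Return-based Memory Restoration as a replay
buffer that samples transitions in proportion to the return of the episode they
belong to.

```python
# Illustrative sketch only -- not the authors' code. Network sizes, the
# prediction-error bonus, and the return-weighted sampling are assumptions.
import random
import numpy as np
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Predicts the next state from (state, action); its error is the curiosity bonus."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def curiosity_bonus(model, state, action, next_state, scale=1.0):
    """Intrinsic reward = scaled prediction error of the forward model."""
    with torch.no_grad():
        pred = model(state, action)
        return scale * torch.mean((pred - next_state) ** 2, dim=-1)

class ReturnBasedReplayBuffer:
    """Replay buffer that stores whole episodes and samples transitions with
    probability proportional to the (shifted) return of their episode, so that
    rare successful episodes are revisited more often."""
    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.episodes = []   # list of lists of transitions
        self.returns = []    # episodic return per stored episode

    def add_episode(self, transitions, episodic_return):
        self.episodes.append(transitions)
        self.returns.append(episodic_return)
        while sum(len(e) for e in self.episodes) > self.capacity:
            self.episodes.pop(0)
            self.returns.pop(0)

    def sample(self, batch_size):
        # Shift returns to be positive, then sample episodes proportionally.
        r = np.asarray(self.returns, dtype=np.float64)
        weights = r - r.min() + 1e-3
        probs = weights / weights.sum()
        batch = []
        for _ in range(batch_size):
            ep = self.episodes[np.random.choice(len(self.episodes), p=probs)]
            batch.append(random.choice(ep))
        return batch
```

During training, the policy would be optimized on the sum of the extrinsic
(binary) reward and the curiosity bonus, with minibatches drawn from
ReturnBasedReplayBuffer; the bonus scale and the return-weighting scheme are
unspecified hyperparameters in this sketch.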
Related papers
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Embedding Contextual Information through Reward Shaping in Multi-Agent Learning: A Case Study from Google Football [0.0]
We create a novel reward shaping method by embedding contextual information in the reward function.
We demonstrate this in the Google Research Football (GRF) environment.
Experiment results show that our reward shaping method is a useful addition to state-of-the-art MARL algorithms for training agents in environments with sparse reward signals.
arXiv Detail & Related papers (2023-03-25T10:21:13Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Transferable Reward Learning by Dynamics-Agnostic Discriminator Ensemble [8.857776147129464]
Recovering the reward function from expert demonstrations is a fundamental problem in reinforcement learning.
We present a dynamics-agnostic discriminator-ensemble reward learning method capable of learning both state-action and state-only reward functions.
arXiv Detail & Related papers (2022-06-01T05:16:39Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms (a minimal sketch of this kind of bonus appears after this list).
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions [124.11520774395748]
Reinforcement learning practitioners often utilize complex reward functions that encourage physically plausible behaviors.
We propose substituting complex reward functions with "style rewards" learned from a dataset of motion capture demonstrations.
A learned style reward can be combined with an arbitrary task reward to train policies that perform tasks using naturalistic strategies.
arXiv Detail & Related papers (2022-03-28T21:17:36Z)
- Learning Long-Term Reward Redistribution via Randomized Return Decomposition [18.47810850195995]
We consider the problem formulation of episodic reinforcement learning with trajectory feedback.
This setting involves an extreme delay of reward signals, in which the agent obtains only one reward signal at the end of each trajectory.
We propose a novel reward redistribution algorithm, randomized return decomposition (RRD), to learn a proxy reward function for episodic reinforcement learning.
arXiv Detail & Related papers (2021-11-26T13:23:36Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
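As forward-referenced in the "Reward Uncertainty for Exploration in
Preference-based Reinforcement Learning" entry above, the following is a
minimal sketch of an uncertainty-driven exploration bonus: an ensemble of
learned reward models scores each transition, and the disagreement (standard
deviation) across the ensemble is used as an intrinsic reward. The ensemble
size, network shapes, and the use of a plain standard deviation are assumptions
made for illustration, not details taken from that paper.

```python
# Illustrative sketch only: intrinsic reward from disagreement among an
# ensemble of learned reward models. Ensemble size and bonus form are assumptions.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """A small learned reward model over (state, action) pairs."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def uncertainty_bonus(ensemble, state, action):
    """Intrinsic reward = std. dev. of the ensemble's reward predictions,
    which is high where the learned reward is still uncertain."""
    with torch.no_grad():
        preds = torch.stack([m(state, action) for m in ensemble], dim=0)
        return preds.std(dim=0)

# Usage sketch: train each RewardModel on preference-labelled data with a
# different initialization / minibatch order, then add the bonus to the
# learned reward when training the policy, e.g.
#   total_reward = learned_reward + beta * uncertainty_bonus(ensemble, s, a)
```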
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.