Action Guidance: Getting the Best of Sparse Rewards and Shaped Rewards for Real-time Strategy Games
- URL: http://arxiv.org/abs/2010.03956v1
- Date: Mon, 5 Oct 2020 03:43:06 GMT
- Title: Action Guidance: Getting the Best of Sparse Rewards and Shaped Rewards for Real-time Strategy Games
- Authors: Shengyi Huang, Santiago Ontañón
- Abstract summary: Training agents using Reinforcement Learning in games with sparse rewards is a challenging problem.
We present a novel technique that successfully trains agents to eventually optimize the true objective in games with sparse rewards.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training agents using Reinforcement Learning in games with sparse rewards is
a challenging problem, since large amounts of exploration are required to
retrieve even the first reward. To tackle this problem, a common approach is to
use reward shaping to help exploration. However, an important drawback of
reward shaping is that agents sometimes learn to optimize the shaped reward
instead of the true objective. In this paper, we present a novel technique that
we call action guidance that successfully trains agents to eventually optimize
the true objective in games with sparse rewards while maintaining most of the
sample efficiency that comes with reward shaping. We evaluate our approach in a
simplified real-time strategy (RTS) game simulator called $\mu$RTS.
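
The abstract only names the technique, so here is a minimal toy sketch of the underlying idea: learn from a shaped reward to ease exploration while the evaluated policy is only ever updated with the true sparse objective. This is an illustrative assumption, not the authors' implementation — the paper trains PPO agents with auxiliary policies in $\mu$RTS, whereas this sketch uses tabular Q-learning on a made-up 1-D chain and simply anneals how often the shaped-reward learner chooses the action.

```python
# Toy sketch (assumed setup, not the paper's code): two learners share the same
# experience; the "main" learner sees only the sparse true reward, the "aux"
# learner sees a dense shaped reward, and the probability of acting with the
# aux learner is annealed to zero over training.
import numpy as np

N_STATES, GOAL = 20, 19          # 1-D chain; the true reward is sparse: only at the goal
ACTIONS = [-1, +1]               # move left / move right

def step(state, action):
    nxt = int(np.clip(state + action, 0, N_STATES - 1))
    sparse = 1.0 if nxt == GOAL else 0.0            # true objective
    shaped = sparse + 0.01 * (nxt - state)          # dense hint: reward progress to the right
    return nxt, sparse, shaped, nxt == GOAL

rng = np.random.default_rng(0)
q_main = np.zeros((N_STATES, 2))   # updated with the sparse (true) reward only
q_aux = np.zeros((N_STATES, 2))    # updated with the shaped reward
alpha, gamma = 0.5, 0.95

for episode in range(300):
    p_aux = max(0.0, 1.0 - episode / 200)   # how often the shaped-reward policy acts; annealed to 0
    s, done, t = 0, False, 0
    while not done and t < 200:
        q = q_aux if rng.random() < p_aux else q_main
        a = rng.integers(2) if rng.random() < 0.05 else int(np.argmax(q[s]))
        s2, r_sparse, r_shaped, done = step(s, ACTIONS[a])
        # Both learners are updated from the same experience, each with its own reward signal.
        q_main[s, a] += alpha * (r_sparse + gamma * np.max(q_main[s2]) - q_main[s, a])
        q_aux[s, a] += alpha * (r_shaped + gamma * np.max(q_aux[s2]) - q_aux[s, a])
        s, t = s2, t + 1

print("greedy actions of the main (sparse-reward) policy:", np.argmax(q_main, axis=1))
```

The property the sketch tries to mirror is that the policy ultimately being evaluated only ever sees the true objective, so it cannot end up optimizing the shaped reward instead.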
Related papers
- Reward Shaping for Improved Learning in Real-time Strategy Game Play [0.3347089492811693]
We show that appropriately designed reward shaping functions can significantly improve the player's performance.
We have validated our reward shaping functions within a simulated environment for playing a marine capture-the-flag game.
arXiv Detail & Related papers (2023-11-27T21:56:18Z) - DreamSmooth: Improving Model-based Reinforcement Learning via Reward
Smoothing [60.21269454707625]
DreamSmooth learns to predict a temporally-smoothed reward, instead of the exact reward at the given timestep.
We show that DreamSmooth achieves state-of-the-art performance on long-horizon sparse-reward tasks.
arXiv Detail & Related papers (2023-11-02T17:57:38Z) - Redeeming Intrinsic Rewards via Constrained Optimization [17.203887958936168]
State-of-the-art reinforcement learning (RL) algorithms typically use random sampling (e.g., $\epsilon$-greedy) for exploration, but this method fails in hard exploration tasks like Montezuma's Revenge.
Prior works incentivize the agent to visit novel states using an exploration bonus (also called an intrinsic reward or curiosity); a generic sketch of combining such a bonus with the task reward appears after this list.
Such methods can lead to excellent results on hard exploration tasks but can suffer from intrinsic reward bias and underperform when compared to an agent trained using only task rewards.
We propose a principled constrained policy optimization procedure that automatically tunes the importance of the intrinsic reward.
arXiv Detail & Related papers (2022-11-14T18:49:26Z) - Unpacking Reward Shaping: Understanding the Benefits of Reward
Engineering on Sample Complexity [114.88145406445483]
Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications.
In practice the choice of reward function can be crucial for good results.
arXiv Detail & Related papers (2022-10-18T04:21:25Z) - Handling Sparse Rewards in Reinforcement Learning Using Model Predictive
Control [9.118706387430883]
Reinforcement learning (RL) has recently seen great success in various domains.
Yet, the design of the reward function requires detailed domain expertise and tedious fine-tuning to ensure that agents are able to learn the desired behaviour.
We propose to use model predictive control (MPC) as an experience source for training RL agents in sparse reward environments.
arXiv Detail & Related papers (2022-10-04T11:06:38Z) - Basis for Intentions: Efficient Inverse Reinforcement Learning using
Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z) - Automatic Reward Design via Learning Motivation-Consistent Intrinsic
Rewards [46.068337522093096]
We introduce the concept of motivation, which captures the underlying goal of maximizing certain rewards.
Our method performs better than the state-of-the-art methods in handling problems of delayed reward, exploration, and credit assignment.
arXiv Detail & Related papers (2022-07-29T14:52:02Z) - Adversarial Motion Priors Make Good Substitutes for Complex Reward
Functions [124.11520774395748]
Reinforcement learning practitioners often utilize complex reward functions that encourage physically plausible behaviors.
We propose substituting complex reward functions with "style rewards" learned from a dataset of motion capture demonstrations.
A learned style reward can be combined with an arbitrary task reward to train policies that perform tasks using naturalistic strategies.
arXiv Detail & Related papers (2022-03-28T21:17:36Z) - Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL [91.26538493552817]
We present a formulation of hindsight relabeling for meta-RL, which relabels experience during meta-training to enable learning to learn entirely using sparse reward.
We demonstrate the effectiveness of our approach on a suite of challenging sparse reward goal-reaching environments.
arXiv Detail & Related papers (2021-12-02T00:51:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.