Embedding Contextual Information through Reward Shaping in Multi-Agent
Learning: A Case Study from Google Football
- URL: http://arxiv.org/abs/2303.15471v3
- Date: Fri, 21 Jul 2023 17:20:16 GMT
- Title: Embedding Contextual Information through Reward Shaping in Multi-Agent
Learning: A Case Study from Google Football
- Authors: Chaoyi Gu, Varuna De Silva, Corentin Artaud, Rafael Pina
- Abstract summary: We create a novel reward shaping method by embedding contextual information in the reward function.
We demonstrate this in the Google Research Football (GRF) environment.
Experimental results show that our reward shaping method is a useful addition to state-of-the-art MARL algorithms for training agents in environments with sparse reward signals.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial Intelligence has been used to help humans complete
difficult tasks in complicated environments by providing optimized strategies
for decision-making or by replacing manual labour. In environments with
multiple agents, such as football, the most common methods for training agents
are Imitation Learning and Multi-Agent Reinforcement Learning (MARL). However,
agents trained by Imitation Learning cannot outperform the expert
demonstrator, which makes it hard for humans to gain new insights from the
learnt policy. In addition, MARL is prone to the credit assignment problem,
and in environments with a sparse reward signal it can be inefficient. The
objective of our research is to create a novel reward shaping method that
embeds contextual information in the reward function to address these
challenges. We demonstrate this in the Google Research Football (GRF)
environment. We quantify the contextual information extracted from the game
state observation and combine this quantification with the original sparse
reward to create the shaped reward. Experimental results in the GRF
environment show that our reward shaping method is a useful addition to
state-of-the-art MARL algorithms for training agents in environments with a
sparse reward signal.
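To make the idea concrete, here is a minimal, hedged sketch of context-based reward shaping in a GRF-style setting. The choice of contextual signal (ball progression toward the opponent goal), the helper names `ball_progress` and `shaped_reward`, and the weight `beta` are illustrative assumptions, not the paper's exact quantification.

```python
# Illustrative sketch of context-based reward shaping (assumptions noted above).
# Assumes GRF-style raw observations where obs["ball"] = [x, y, z] and the
# pitch x-coordinate lies in [-1, 1], with the opponent goal at x = +1.

def ball_progress(obs):
    """Toy contextual quantity: how far the ball is toward the opponent goal."""
    ball_x = obs["ball"][0]
    return (ball_x + 1.0) / 2.0  # normalise x from [-1, 1] to [0, 1]

def shaped_reward(sparse_reward, prev_obs, obs, beta=0.05):
    """Combine the original sparse scoring reward with a contextual bonus.

    The bonus is the step-to-step change in the contextual quantity, so the
    agent receives feedback for advancing play even before a goal is scored.
    """
    context_delta = ball_progress(obs) - ball_progress(prev_obs)
    return sparse_reward + beta * context_delta
```

In a training loop, such a shaped reward would simply replace the environment's sparse reward before being passed to the MARL learner (e.g., QMIX or MAPPO), leaving the rest of the algorithm unchanged.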
Related papers
- Reinforcement learning with Demonstrations from Mismatched Task under
Sparse Reward [7.51772160511614]
Reinforcement learning often suffers from the sparse reward issue in real-world robotics problems.
Prior works often assume that the learning agent and the expert aim to accomplish the same task, which requires collecting new data for every new task.
In this paper, we consider the case where the target task is mismatched from, but similar to, that of the expert.
Existing LfD methods cannot effectively guide learning in mismatched new tasks with sparse rewards.
arXiv Detail & Related papers (2022-12-03T02:24:59Z) - Logic-based Reward Shaping for Multi-Agent Reinforcement Learning [1.5483078145498084]
Reinforcement learning relies heavily on exploration to learn from its environment and maximize observed rewards.
Previous work has combined automata- and logic-based reward shaping with environment assumptions to provide an automatic mechanism for synthesizing the reward function from the task.
This project explores how logic-based reward shaping for Multi-Agent Reinforcement Learning can be designed for different scenarios and tasks.
arXiv Detail & Related papers (2022-06-17T16:30:27Z) - Reward Uncertainty for Exploration in Preference-based Reinforcement
Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward by measuring novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z) - MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven
Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - Curious Exploration and Return-based Memory Restoration for Deep
Reinforcement Learning [2.3226893628361682]
In this paper, we focus on training a single agent to score goals with a binary success/failure reward function.
The proposed method can be utilized to train agents in environments with fairly complex state and action spaces.
arXiv Detail & Related papers (2021-05-02T16:01:34Z) - PsiPhi-Learning: Reinforcement Learning with Demonstrations using
Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z) - Demonstration-efficient Inverse Reinforcement Learning in Procedurally
Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z) - REMAX: Relational Representation for Multi-Agent Exploration [13.363887960136102]
We propose a learning-based exploration strategy to generate the initial states of a game.
We demonstrate that our method improves the training and performance of the MARL model more than the existing exploration methods.
arXiv Detail & Related papers (2020-08-12T10:23:35Z) - Forgetful Experience Replay in Hierarchical Reinforcement Learning from
Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all other solutions in the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)