Embedding Contextual Information through Reward Shaping in Multi-Agent
Learning: A Case Study from Google Football
- URL: http://arxiv.org/abs/2303.15471v3
- Date: Fri, 21 Jul 2023 17:20:16 GMT
- Title: Embedding Contextual Information through Reward Shaping in Multi-Agent
Learning: A Case Study from Google Football
- Authors: Chaoyi Gu, Varuna De Silva, Corentin Artaud, Rafael Pina
- Abstract summary: We create a novel reward shaping method by embedding contextual information in the reward function.
We demonstrate this in the Google Research Football (GRF) environment.
Experimental results show that our reward shaping method is a useful addition to state-of-the-art MARL algorithms for training agents in environments with sparse reward signals.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial Intelligence has been used to help humans complete
difficult tasks in complicated environments by providing optimized strategies
for decision-making or by replacing manual labour. In environments with
multiple agents, such as football, the most common methods for training agents
are Imitation Learning and Multi-Agent Reinforcement Learning (MARL). However,
agents trained by Imitation Learning cannot outperform the expert
demonstrator, which makes it hard for humans to gain new insights from the
learnt policy. In addition, MARL is prone to the credit assignment problem,
and in environments with a sparse reward signal it can be inefficient. The
objective of our research is to create a novel reward shaping method that
embeds contextual information in the reward function to address these
challenges. We demonstrate this in the Google Research Football (GRF)
environment. We quantify the contextual information extracted from the game
state observation and combine this quantification with the original sparse
reward to create the shaped reward. Experimental results in the GRF
environment show that our reward shaping method is a useful addition to
state-of-the-art MARL algorithms for training agents in environments with a
sparse reward signal.
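To make the idea concrete, here is a minimal, hedged sketch of context-based reward shaping in a GRF-style setting. The choice of contextual signal (ball progression toward the opponent goal), the helper names `ball_progress` and `shaped_reward`, and the weight `beta` are illustrative assumptions, not the paper's exact quantification.

```python
# Illustrative sketch of context-based reward shaping (assumptions noted above).
# Assumes GRF-style raw observations where obs["ball"] = [x, y, z] and the
# pitch x-coordinate lies in [-1, 1], with the opponent goal at x = +1.

def ball_progress(obs):
    """Toy contextual quantity: how far the ball is toward the opponent goal."""
    ball_x = obs["ball"][0]
    return (ball_x + 1.0) / 2.0  # normalise x from [-1, 1] to [0, 1]

def shaped_reward(sparse_reward, prev_obs, obs, beta=0.05):
    """Combine the original sparse scoring reward with a contextual bonus.

    The bonus is the step-to-step change in the contextual quantity, so the
    agent receives feedback for advancing play even before a goal is scored.
    """
    context_delta = ball_progress(obs) - ball_progress(prev_obs)
    return sparse_reward + beta * context_delta
```

In a training loop, such a shaped reward would simply replace the environment's sparse reward before being passed to the MARL learner (e.g., QMIX or MAPPO), leaving the rest of the algorithm unchanged.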
Related papers
- Reinforcement learning with Demonstrations from Mismatched Task under
Sparse Reward [7.51772160511614]
Reinforcement learning often suffers from the sparse reward issue in real-world robotics problems.
Prior works often assume that the learning agent and the expert aim to accomplish the same task, which requires collecting new data for every new task.
In this paper, we consider the case where the target task is mismatched from, but similar to, that of the expert.
Existing LfD methods cannot effectively guide learning in mismatched new tasks with sparse rewards.
arXiv Detail & Related papers (2022-12-03T02:24:59Z) - Logic-based Reward Shaping for Multi-Agent Reinforcement Learning [1.5483078145498084]
Reinforcement learning relies heavily on exploration to learn from its environment and maximize observed rewards.
Previous work has combined automata- and logic-based reward shaping with environment assumptions to provide an automatic mechanism for synthesizing the reward function from the task.
This project explores how logic-based reward shaping for Multi-Agent Reinforcement Learning can be designed for different scenarios and tasks.
arXiv Detail & Related papers (2022-06-17T16:30:27Z) - Reward Uncertainty for Exploration in Preference-based Reinforcement
Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward by measuring novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z) - MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven
Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - Curious Exploration and Return-based Memory Restoration for Deep
Reinforcement Learning [2.3226893628361682]
In this paper, we focus on training a single agent to score goals with a binary success/failure reward function.
The proposed method can be utilized to train agents in environments with fairly complex state and action spaces.
arXiv Detail & Related papers (2021-05-02T16:01:34Z) - PsiPhi-Learning: Reinforcement Learning with Demonstrations using
Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z) - Demonstration-efficient Inverse Reinforcement Learning in Procedurally
Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z) - REMAX: Relational Representation for Multi-Agent Exploration [13.363887960136102]
We propose a learning-based exploration strategy to generate the initial states of a game.
We demonstrate that our method improves the training and performance of the MARL model more than the existing exploration methods.
arXiv Detail & Related papers (2020-08-12T10:23:35Z) - Forgetful Experience Replay in Hierarchical Reinforcement Learning from
Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all other solutions in the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)