Credit-cognisant reinforcement learning for multi-agent cooperation
- URL: http://arxiv.org/abs/2211.10100v1
- Date: Fri, 18 Nov 2022 09:00:25 GMT
- Title: Credit-cognisant reinforcement learning for multi-agent cooperation
- Authors: F. Bredell, H. A. Engelbrecht, J. C. Schoeman
- Abstract summary: We introduce the concept of credit-cognisant rewards, which allows an agent to perceive the effect its actions had on the environment as well as on its co-agents.
We show that by manipulating these experiences and constructing the reward contained within them to include the rewards received by all the agents within the same action sequence, we are able to improve significantly on the performance of independent deep Q-learning.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Traditional multi-agent reinforcement learning (MARL) algorithms, such as independent Q-learning, struggle when presented with partially observable scenarios in which agents are required to develop delicate action sequences. This is often the result of the reward for a good action only becoming available after other agents have taken theirs, so that these actions are not credited accordingly. Recurrent neural networks have proven to be a viable solution strategy for these types of problems, resulting in a significant performance increase when compared to other methods. In this paper, we explore a different approach and focus on the experiences used to update the action-value functions of each agent. We introduce the concept of credit-cognisant rewards (CCRs), which allows an agent to perceive the effect its actions had on the environment as well as on its co-agents. We show that by manipulating these experiences and constructing the reward contained within them to include the rewards received by all the agents within the same action sequence, we are able to improve significantly on the performance of independent deep Q-learning as well as deep recurrent Q-learning. We evaluate and test the performance of CCRs when applied to deep reinforcement learning techniques on a simplified version of the popular card game Hanabi.
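The sketch below illustrates one way the credit-cognisant reward construction described in the abstract could be realised in an experience-replay pipeline: every experience generated within one action sequence has its stored reward rewritten to include the rewards received by all agents in that sequence. The `Transition` structure, the function name, and the plain-sum aggregation are illustrative assumptions, not the paper's exact implementation.

```python
from collections import namedtuple
from typing import List

# One agent's stored experience for independent deep Q-learning.
Transition = namedtuple(
    "Transition", ["agent_id", "obs", "action", "reward", "next_obs", "done"]
)


def credit_cognisant_rewards(sequence: List[Transition]) -> List[Transition]:
    """Rewrite the reward stored in every experience of one action sequence
    (e.g. one round of turns in Hanabi) so that it also contains the rewards
    received by the co-agents in that same sequence.

    The plain sum used here is an illustrative assumption; the paper only
    states that the stored reward is constructed to include all agents'
    rewards from the same action sequence.
    """
    team_reward = sum(tr.reward for tr in sequence)
    return [tr._replace(reward=team_reward) for tr in sequence]


# Usage sketch: after each action sequence, push the rewritten experiences
# into each agent's replay buffer and train with standard (recurrent) DQN.
# for tr in credit_cognisant_rewards(current_sequence):
#     replay_buffers[tr.agent_id].append(tr)
```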
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
The Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z)
- DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning [84.22561239481901]
We propose a new approach that enables agents to learn whether their behaviors should be consistent with those of other agents.
We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement.
arXiv Detail & Related papers (2023-12-10T06:03:57Z)
- Episodic Multi-agent Reinforcement Learning with Curiosity-Driven Exploration [40.87053312548429]
We introduce a novel Episodic Multi-agent reinforcement learning algorithm with Curiosity-driven exploration, called EMC.
We use prediction errors of individual Q-values as intrinsic rewards for coordinated exploration and utilize episodic memory to exploit explored informative experience to boost policy training (a minimal sketch of this style of intrinsic reward follows this list).
arXiv Detail & Related papers (2021-11-22T07:34:47Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- What is Going on Inside Recurrent Meta Reinforcement Learning Agents? [63.58053355357644]
Recurrent meta reinforcement learning (meta-RL) agents employ a recurrent neural network (RNN) for the purpose of "learning a learning algorithm".
We shed light on the internal working mechanisms of these agents by reformulating the meta-RL problem using the Partially Observable Markov Decision Process (POMDP) framework.
arXiv Detail & Related papers (2021-04-29T20:34:39Z)
- Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
arXiv Detail & Related papers (2020-06-10T20:12:38Z)
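As referenced in the EMC entry above, the sketch below shows a generic curiosity-style intrinsic reward built from the prediction error of an agent's individual Q-values. The class name, network architecture, and bonus scaling are illustrative assumptions and are not taken from that paper.

```python
import torch
import torch.nn as nn


class QValuePredictor(nn.Module):
    """Small network that tries to predict an agent's individual Q-values
    from its local observation; the prediction error serves as a curiosity
    bonus. Illustrative reconstruction of the idea, not EMC's implementation."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def intrinsic_reward(self, obs: torch.Tensor, q_values: torch.Tensor) -> torch.Tensor:
        # Larger prediction error -> more novel situation -> larger bonus.
        with torch.no_grad():
            error = (self.net(obs) - q_values).pow(2).mean(dim=-1)
        return error

    def prediction_loss(self, obs: torch.Tensor, q_values: torch.Tensor) -> torch.Tensor:
        # Training the predictor on visited states shrinks the bonus for
        # familiar situations, so exploration focuses on novel ones.
        return (self.net(obs) - q_values.detach()).pow(2).mean()


# Usage sketch: add the scaled bonus to the environment reward before the TD update.
# r_total = r_env + beta * predictor.intrinsic_reward(obs, agent_q_values)
```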