Intrinsic fluctuations of reinforcement learning promote cooperation
- URL: http://arxiv.org/abs/2209.01013v1
- Date: Thu, 1 Sep 2022 09:14:47 GMT
- Title: Intrinsic fluctuations of reinforcement learning promote cooperation
- Authors: Wolfram Barfuss and Janusz Meylahn
- Abstract summary: Cooperating in social dilemma situations is vital for animals, humans, and machines.
We demonstrate which individual elements of the multi-agent learning setting lead to cooperation, and how.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this work, we ask, and answer, what makes classical reinforcement
learning cooperative. Cooperating in social dilemma situations is vital for
animals, humans, and machines. While evolutionary theory has revealed a range of
mechanisms promoting cooperation, the conditions under which agents learn to
cooperate are contested. Here, we demonstrate which individual elements of the
multi-agent learning setting lead to cooperation, and how. Specifically, we
consider the widely used temporal-difference reinforcement learning algorithm
with epsilon-greedy exploration in the classic environment of an iterated
Prisoner's Dilemma with one-period memory. Each of the two learning agents
learns a strategy that conditions its next action on both agents' actions in
the previous round. We find that, in addition to a strong regard for future
rewards, a low exploration rate, and a small learning rate, it is primarily the
intrinsic stochastic fluctuations of the reinforcement learning process that
double the final rate of cooperation to up to 80%. Thus, inherent noise is not
a necessary evil of the iterative learning process; it is a critical asset for
the learning of cooperation. However, we also point out the trade-off between a
high likelihood of cooperative behavior and achieving it in a reasonable amount
of time. Our findings are relevant for purposefully designing cooperative
algorithms and regulating undesired collusive effects.
Related papers
- Multi-agent cooperation through learning-aware policy gradients [53.63948041506278]
Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning.
We present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning.
We derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.
arXiv Detail & Related papers (2024-10-24T10:48:42Z)
- Reciprocal Reward Influence Encourages Cooperation From Self-Interested Agents [2.1301560294088318]
Cooperation between self-interested individuals is a widespread phenomenon in the natural world, but remains elusive in interactions between artificially intelligent agents.
We introduce Reciprocators, reinforcement learning agents which are intrinsically motivated to reciprocate the influence of opponents' actions on their returns.
We show that Reciprocators can be used to promote cooperation in temporally extended social dilemmas during simultaneous learning.
arXiv Detail & Related papers (2024-06-03T06:07:27Z)
- Decentralized and Lifelong-Adaptive Multi-Agent Collaborative Learning [57.652899266553035]
Decentralized and lifelong-adaptive multi-agent collaborative learning aims to enhance collaboration among multiple agents without a central server.
We propose DeLAMA, a decentralized multi-agent lifelong collaborative learning algorithm with dynamic collaboration graphs.
arXiv Detail & Related papers (2024-03-11T09:21:11Z)
- Deconstructing Cooperation and Ostracism via Multi-Agent Reinforcement Learning [3.3751859064985483]
We show that network rewiring facilitates mutual cooperation even when one agent always offers cooperation.
We also find that ostracism alone is not sufficient to make cooperation emerge.
Our findings provide insights into the conditions and mechanisms necessary for the emergence of cooperation.
arXiv Detail & Related papers (2023-10-06T23:18:55Z)
- Coach-assisted Multi-Agent Reinforcement Learning Framework for Unexpected Crashed Agents [120.91291581594773]
We present a formal formulation of a cooperative multi-agent reinforcement learning system with unexpected crashes.
We propose a coach-assisted multi-agent reinforcement learning framework, which introduces a virtual coach agent to adjust the crash rate during training.
To the best of our knowledge, this work is the first to study unexpected crashes in multi-agent systems.
arXiv Detail & Related papers (2022-03-16T08:22:45Z)
- Improved cooperation by balancing exploration and exploitation in intertemporal social dilemma tasks [2.541277269153809]
We propose a new learning strategy for achieving coordination by incorporating a learning rate that can balance exploration and exploitation.
We show that agents using this simple strategy improve the collective return in a decision task called the intertemporal social dilemma.
We also explore the effects of the diversity of learning rates on the population of reinforcement learning agents and show that agents trained in heterogeneous populations develop particularly coordinated policies.
arXiv Detail & Related papers (2021-10-19T08:40:56Z)
- Joint Attention for Multi-Agent Coordination and Social Learning [108.31232213078597]
We show that joint attention can be useful as a mechanism for improving multi-agent coordination and social learning.
Joint attention leads to higher performance than a competitive centralized critic baseline across multiple environments.
Taken together, these findings suggest that joint attention may be a useful inductive bias for multi-agent learning.
arXiv Detail & Related papers (2021-04-15T20:14:19Z)
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
- Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
arXiv Detail & Related papers (2020-07-23T17:59:57Z)
- Multi-Issue Bargaining With Deep Reinforcement Learning [0.0]
This paper evaluates the use of deep reinforcement learning in bargaining games.
Two actor-critic networks were trained for the bidding and acceptance strategy.
Neural agents learn to exploit time-based agents, achieving clear transitions in decision preference values.
They also demonstrate adaptive behavior against different combinations of concession, discount factors, and behavior-based strategies.
arXiv Detail & Related papers (2020-02-18T18:33:46Z)