Inducing Cooperative behaviour in Sequential-Social dilemmas through
Multi-Agent Reinforcement Learning using Status-Quo Loss
- URL: http://arxiv.org/abs/2001.05458v2
- Date: Thu, 13 Feb 2020 09:55:17 GMT
- Title: Inducing Cooperative behaviour in Sequential-Social dilemmas through
Multi-Agent Reinforcement Learning using Status-Quo Loss
- Authors: Pinkesh Badjatiya, Mausoom Sarkar, Abhishek Sinha, Siddharth Singh,
Nikaash Puri, Jayakumar Subramanian, Balaji Krishnamurthy
- Abstract summary: In social dilemma situations, individual rationality leads to sub-optimal group outcomes.
Deep Reinforcement Learning agents trained to optimize individual rewards converge to selfish, mutually harmful behavior.
We show how agents trained with SQLoss evolve cooperative behavior in several social dilemma matrix games.
- Score: 16.016452248865132
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In social dilemma situations, individual rationality leads to sub-optimal
group outcomes. Several human engagements can be modeled as sequential
(multi-step) social dilemmas. However, in contrast to humans, Deep
Reinforcement Learning agents trained to optimize individual rewards in
sequential social dilemmas converge to selfish, mutually harmful behavior. We
introduce a status-quo loss (SQLoss) that encourages an agent to stick to the
status quo, rather than repeatedly changing its policy. We show how agents
trained with SQLoss evolve cooperative behavior in several social dilemma
matrix games. To work with social dilemma games that have visual input, we
propose GameDistill. GameDistill uses self-supervision and clustering to
automatically extract cooperative and selfish policies from a social dilemma
game. We combine GameDistill and SQLoss to show how agents evolve socially
desirable cooperative behavior in the Coin Game.
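The abstract describes SQLoss only at a high level. As a hypothetical illustration (not the paper's actual formulation), one way to realize a "stick to the status quo" pressure is an auxiliary policy-gradient term weighted by the return of an imagined trajectory in which the agent keeps repeating its previous action. The function name, the `sq_weight` parameter, and the exact combination below are all assumptions:

```python
import numpy as np

def sq_augmented_loss(log_probs, returns, sq_returns, sq_weight=0.5):
    """Hypothetical REINFORCE-style loss with an added status-quo term.

    log_probs:  log pi(a_t | s_t) for the actions actually taken
    returns:    discounted returns from the real trajectory
    sq_returns: discounted returns from an *imagined* trajectory in which
                the agent repeats its previous action (the status quo)
    sq_weight:  relative weight of the status-quo term (assumed, not
                taken from the paper)
    """
    pg_loss = -np.mean(log_probs * returns)     # ordinary policy-gradient loss
    sq_loss = -np.mean(log_probs * sq_returns)  # pressure toward the status quo
    return pg_loss + sq_weight * sq_loss
```

Under this reading, when the imagined status-quo return is high, the auxiliary term reinforces repeating the previous action, which in matrix games like the Iterated Prisoner's Dilemma can stabilize mutual cooperation once it appears.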
Related papers
- Learning to Balance Altruism and Self-interest Based on Empathy in Mixed-Motive Games
Multi-agent scenarios often involve mixed motives, demanding altruistic agents capable of self-protection against potential exploitation.
We propose LASE (Learning to balance Altruism and Self-interest based on Empathy).
LASE allocates a portion of its rewards to co-players as gifts, with this allocation adapting dynamically based on the social relationship.
arXiv Detail & Related papers (2024-10-10T12:30:56Z)
- Neural Population Learning beyond Symmetric Zero-sum Games
We introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game.
Our work shows that equilibrium convergent population learning can be implemented at scale and in generality.
arXiv Detail & Related papers (2024-01-10T12:56:24Z)
- Incorporating Rivalry in Reinforcement Learning for a Competitive Game
This work proposes a novel reinforcement learning mechanism based on the social impact of rivalry behavior.
Our proposed model aggregates objective and social perception mechanisms to derive a rivalry score that is used to modulate the learning of artificial agents.
arXiv Detail & Related papers (2022-08-22T14:06:06Z)
- Tackling Asymmetric and Circular Sequential Social Dilemmas with Reinforcement Learning and Graph-based Tit-for-Tat
Social dilemmas describe situations where multiple actors should all cooperate to achieve the best outcome, but greed and fear drive them toward worse, self-interested outcomes.
Recently, the emergence of Deep Reinforcement Learning has renewed interest in social dilemmas with the introduction of the Sequential Social Dilemma (SSD).
This paper extends SSD with the Circular Sequential Social Dilemma (CSSD), a new kind of Markov game that better captures the diversity of cooperation between agents.
arXiv Detail & Related papers (2022-06-26T15:42:48Z)
- Aligning to Social Norms and Values in Interactive Narratives
We focus on creating agents that act in alignment with socially beneficial norms and values in interactive narratives or text-based games.
We introduce the GALAD agent that uses the social commonsense knowledge present in specially trained language models to contextually restrict its action space to only those actions that are aligned with socially beneficial values.
arXiv Detail & Related papers (2022-05-04T09:54:33Z)
- Cooperative Artificial Intelligence
We argue that there is a need for research on the intersection between game theory and artificial intelligence.
We discuss the problem of how an external agent can promote cooperation between artificial learners.
We show that the resulting cooperative outcome is stable in certain games even if the planning agent is turned off.
arXiv Detail & Related papers (2022-02-20T16:50:37Z)
- Hidden Agenda: a Social Deduction Game with Diverse Learned Equilibria
Social deduction games offer an avenue to study how individuals might learn to synthesize potentially unreliable information about others.
In this work, we present Hidden Agenda, a two-team social deduction game that provides a 2D environment for studying learning agents in scenarios of unknown team alignment.
Reinforcement learning agents trained in Hidden Agenda show that agents can learn a variety of behaviors, including partnering and voting, without the need for communication in natural language.
arXiv Detail & Related papers (2022-01-05T20:54:10Z)
- Emergent Prosociality in Multi-Agent Games Through Gifting
Reinforcement learning algorithms often suffer from converging to socially less desirable equilibria when multiple equilibria exist.
We propose using a less restrictive peer-rewarding mechanism, gifting, that guides the agents toward more socially desirable equilibria.
We employ a theoretical framework that captures the benefit of gifting in converging to the prosocial equilibrium.
arXiv Detail & Related papers (2021-05-13T23:28:30Z)
- Policy Fusion for Adaptive and Customizable Reinforcement Learning Agents
We show how to combine distinct behavioral policies to obtain a meaningful "fusion" policy.
We propose four different policy fusion methods for combining pre-trained policies.
We provide several practical examples and use-cases for how these methods are indeed useful for video game production and designers.
arXiv Detail & Related papers (2021-04-21T16:08:44Z)
- Emergent Reciprocity and Team Formation from Randomized Uncertain Social Preferences
We show evidence of emergent direct reciprocity, indirect reciprocity and reputation, and team formation when training agents with randomized uncertain social preferences (RUSP).
RUSP is generic and scalable; it can be applied to any multi-agent environment without changing the original underlying game dynamics or objectives.
In particular, we show that with RUSP these behaviors can emerge and lead to higher social welfare equilibria both in classic abstract social dilemmas like the Iterated Prisoner's Dilemma and in more complex intertemporal environments.
arXiv Detail & Related papers (2020-11-10T20:06:19Z)
- Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior
We study the behaviors of online learning algorithms in the Iterated Prisoner's Dilemma (IPD) game.
We evaluate them based on a tournament of iterated prisoner's dilemma where multiple agents can compete in a sequential fashion.
Results suggest that making decisions based only on the current situation performs worst in this kind of social dilemma game.
arXiv Detail & Related papers (2020-06-09T15:58:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.