Tackling Asymmetric and Circular Sequential Social Dilemmas with
Reinforcement Learning and Graph-based Tit-for-Tat
- URL: http://arxiv.org/abs/2206.12909v1
- Date: Sun, 26 Jun 2022 15:42:48 GMT
- Title: Tackling Asymmetric and Circular Sequential Social Dilemmas with
Reinforcement Learning and Graph-based Tit-for-Tat
- Authors: Tangui Le Gléau, Xavier Marjou, Tayeb Lemlouma, Benoit Radier
- Abstract summary: Social dilemmas offer situations where multiple actors should all cooperate to achieve the best outcome, but greed and fear lead to a worse, self-interested outcome.
Recently, the emergence of Deep Reinforcement Learning has generated revived interest in social dilemmas with the introduction of the Sequential Social Dilemma (SSD).
This paper extends SSD with the Circular Sequential Social Dilemma (CSSD), a new kind of Markov game that better generalizes the diversity of cooperation between agents.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many societal and industrial interactions, participants generally prefer
their pure self-interest at the expense of the global welfare. Known as social
dilemmas, this category of non-cooperative games offers situations where
multiple actors should all cooperate to achieve the best outcome, but greed and
fear lead to a worse, self-interested outcome. Recently, the emergence of Deep
Reinforcement Learning (RL) has generated revived interest in social dilemmas
with the introduction of Sequential Social Dilemma (SSD). Cooperative agents
mixing RL policies and Tit-for-tat (TFT) strategies have successfully addressed
some non-optimal Nash equilibrium issues. However, this kind of paradigm
requires symmetrical and direct cooperation between actors, conditions that are
not met when mutual cooperation becomes asymmetric and is possible only through
at least a third actor in a circular way. To tackle this issue, this paper extends
SSD with the Circular Sequential Social Dilemma (CSSD), a new kind of Markov game
that better generalizes the diversity of cooperation between agents. Secondly,
to address such circular and asymmetric cooperation, we propose a candidate
solution based on RL policies and a graph-based TFT. We conducted
experiments on a simple multi-player grid world that offers adaptable
cooperation structures. Our work confirmed that our graph-based approach is
beneficial for addressing circular situations by encouraging self-interested
agents to reach mutual cooperation.
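The circular-cooperation idea described in the abstract can be sketched as a Tit-for-Tat rule on a directed cooperation cycle. The formulation below is a hypothetical simplification, not the paper's exact algorithm: each agent mirrors, toward its successor in the cycle, the cooperation it last received from its predecessor.

```python
# Minimal sketch of a graph-based Tit-for-Tat rule on a directed
# cooperation cycle (hypothetical formulation, not the paper's exact
# algorithm): agent i cooperates toward its successor only if its
# predecessor cooperated toward it at the previous step.

def graph_tft_step(cycle, cooperated):
    """cycle: list of agent ids ordered along the directed cycle.
    cooperated: dict agent -> bool, whether that agent cooperated
    toward its successor at the previous step.
    Returns the cooperation decisions for the current step."""
    n = len(cycle)
    decisions = {}
    for idx, agent in enumerate(cycle):
        pred = cycle[(idx - 1) % n]       # the agent that helps `agent`
        decisions[agent] = cooperated[pred]  # mirror what was received
    return decisions

# Mutual cooperation is a fixed point of this rule, while a single
# defection propagates around the cycle one step at a time.
state = {"A": True, "B": True, "C": True}
state = graph_tft_step(["A", "B", "C"], state)
```

Under this rule, punishment for defection is indirect: it travels around the cycle rather than being returned face-to-face, which is what the symmetric two-player TFT cannot express.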
Related papers
- Learning to Balance Altruism and Self-interest Based on Empathy in Mixed-Motive Games [47.8980880888222]
Multi-agent scenarios often involve mixed motives, demanding altruistic agents capable of self-protection against potential exploitation.
We propose LASE (Learning to balance Altruism and Self-interest based on Empathy).
LASE allocates a portion of its rewards to co-players as gifts, with this allocation adapting dynamically based on the social relationship.
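The gifting mechanism described above can be sketched as follows. This is a hedged illustration of the idea only; the function name and the softmax weighting are assumptions, not LASE's actual rule.

```python
# Illustrative sketch of reward gifting (names and softmax weighting
# are assumptions, not LASE's exact mechanism): an agent gives away a
# fraction of its reward, split among co-players in proportion to a
# social-relationship score.

import math

def allocate_gifts(reward, gift_fraction, relationship):
    """relationship: dict co-player -> scalar relationship score.
    Returns dict co-player -> gifted reward (softmax-weighted)."""
    budget = reward * gift_fraction
    weights = {k: math.exp(v) for k, v in relationship.items()}
    total = sum(weights.values())
    return {k: budget * w / total for k, w in weights.items()}
```

Adapting the relationship scores online is what lets the allocation respond to exploitation: a co-player that exploits the gifter sees its score, and hence its share, shrink.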
arXiv Detail & Related papers (2024-10-10T12:30:56Z) - Principal-Agent Reinforcement Learning: Orchestrating AI Agents with Contracts [20.8288955218712]
We propose a framework where a principal guides an agent in a Markov Decision Process (MDP) using a series of contracts.
We present and analyze a meta-algorithm that iteratively optimizes the policies of the principal and agent.
We then scale our algorithm with deep Q-learning and analyze its convergence in the presence of approximation error.
arXiv Detail & Related papers (2024-07-25T14:28:58Z) - Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents [101.17919953243107]
GovSim is a generative simulation platform designed to study strategic interactions and cooperative decision-making in large language models (LLMs).
We find that all but the most powerful LLM agents fail to achieve a sustainable equilibrium in GovSim, with the highest survival rate below 54%.
We show that agents that leverage "Universalization"-based reasoning, a theory of moral thinking, are able to achieve significantly better sustainability.
arXiv Detail & Related papers (2024-04-25T15:59:16Z) - Tackling Cooperative Incompatibility for Zero-Shot Human-AI Coordination [36.33334853998621]
We introduce the Cooperative Open-ended LEarning (COLE) framework to solve cooperative incompatibility in learning.
COLE formulates open-ended objectives in cooperative games with two players using perspectives of graph theory to evaluate and pinpoint the cooperative capacity of each strategy.
We show that COLE could effectively overcome the cooperative incompatibility from theoretical and empirical analysis.
arXiv Detail & Related papers (2023-06-05T16:51:38Z) - Adaptive Coordination in Social Embodied Rearrangement [49.35582108902819]
We study zero-shot coordination (ZSC) in this task, where an agent collaborates with a new partner, emulating a scenario where a robot collaborates with a new human partner.
We propose Behavior Diversity Play (BDP), a novel ZSC approach that encourages diversity through a discriminability objective.
Our results demonstrate that BDP learns adaptive agents that can tackle visual coordination, and zero-shot generalize to new partners in unseen environments, achieving 35% higher success and 32% higher efficiency compared to baselines.
arXiv Detail & Related papers (2023-05-31T18:05:51Z) - Learning Roles with Emergent Social Value Orientations [49.16026283952117]
This paper introduces the typical "division of labor or roles" mechanism in human society.
We provide a promising solution for intertemporal social dilemmas (ISD) with social value orientations (SVO).
A novel learning framework, called Learning Roles with Emergent SVOs (RESVO), is proposed to transform the learning of roles into the social value orientation emergence.
arXiv Detail & Related papers (2023-01-31T17:54:09Z) - Provably Efficient Fictitious Play Policy Optimization for Zero-Sum
Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z) - Normative Disagreement as a Challenge for Cooperative AI [56.34005280792013]
We argue that typical cooperation-inducing learning algorithms fail to cooperate in bargaining problems.
We develop a class of norm-adaptive policies and show in experiments that these significantly increase cooperation.
arXiv Detail & Related papers (2021-11-27T11:37:42Z) - Birds of a Feather Flock Together: A Close Look at Cooperation Emergence
via Multi-Agent RL [20.22747008079794]
We study the dynamics of a second-order social dilemma resulting from incentivizing mechanisms.
We find that a typical tendency of humans, called homophily, can solve the problem.
We propose a novel learning framework to encourage incentive homophily.
arXiv Detail & Related papers (2021-04-23T08:00:45Z) - Balancing Rational and Other-Regarding Preferences in
Cooperative-Competitive Environments [4.705291741591329]
Mixed environments are notorious for the conflicts of selfish and social interests.
We propose BAROCCO to balance individual and social incentives.
Our meta-algorithm is compatible with both Q-learning and Actor-Critic frameworks.
arXiv Detail & Related papers (2021-02-24T14:35:32Z) - Emergent Reciprocity and Team Formation from Randomized Uncertain Social
Preferences [8.10414043447031]
We show evidence of emergent direct reciprocity, indirect reciprocity and reputation, and team formation when training agents with randomized uncertain social preferences (RUSP).
RUSP is generic and scalable; it can be applied to any multi-agent environment without changing the original underlying game dynamics or objectives.
In particular, we show that with RUSP these behaviors can emerge and lead to higher social welfare equilibria in classic abstract social dilemmas like the Iterated Prisoner's Dilemma as well as in more complex intertemporal environments.
arXiv Detail & Related papers (2020-11-10T20:06:19Z)
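The RUSP reward transformation described above can be sketched as a randomly weighted mix of all agents' environment rewards. This is a simplified reading of the summary; the exact sampling scheme and parameter names are assumptions.

```python
# Sketch of a RUSP-style reward transformation (simplified reading of
# the description; the sampling scheme is an assumption): each agent
# trains on a randomly weighted mix of its own and others' rewards,
# so it sometimes cares about co-players' outcomes, with uncertainty
# about the exact weights.

import random

def rusp_rewards(env_rewards, self_weight_low=0.5, rng=random):
    """env_rewards: list of per-agent environment rewards.
    Returns transformed per-agent training rewards."""
    n = len(env_rewards)
    mixed = []
    for i in range(n):
        w_self = rng.uniform(self_weight_low, 1.0)
        w_other = (1.0 - w_self) / max(n - 1, 1)
        r = w_self * env_rewards[i] + w_other * sum(
            env_rewards[j] for j in range(n) if j != i)
        mixed.append(r)
    return mixed
```

Because the mixing leaves the underlying environment rewards untouched, the transformation can wrap any multi-agent game without changing its dynamics, which is the scalability point the summary makes.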
This list is automatically generated from the titles and abstracts of the papers in this site.