Normative Disagreement as a Challenge for Cooperative AI
- URL: http://arxiv.org/abs/2111.13872v1
- Date: Sat, 27 Nov 2021 11:37:42 GMT
- Title: Normative Disagreement as a Challenge for Cooperative AI
- Authors: Julian Stastny, Maxime Riché, Alexander Lyzhov, Johannes Treutlein, Allan Dafoe, Jesse Clifton
- Abstract summary: We argue that typical cooperation-inducing learning algorithms fail to cooperate in bargaining problems.
We develop a class of norm-adaptive policies and show in experiments that these significantly increase cooperation.
- Score: 56.34005280792013
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cooperation in settings where agents have both common and conflicting
interests (mixed-motive environments) has recently received considerable
attention in multi-agent learning. However, the mixed-motive environments
typically studied have a single cooperative outcome on which all agents can
agree. Many real-world multi-agent environments are instead bargaining problems
(BPs): they have several Pareto-optimal payoff profiles over which agents have
conflicting preferences. We argue that typical cooperation-inducing learning
algorithms fail to cooperate in BPs when there is room for normative
disagreement resulting in the existence of multiple competing cooperative
equilibria, and illustrate this problem empirically. To remedy the issue, we
introduce the notion of norm-adaptive policies. Norm-adaptive policies are
capable of behaving according to different norms in different circumstances,
creating opportunities for resolving normative disagreement. We develop a class
of norm-adaptive policies and show in experiments that these significantly
increase cooperation. However, norm-adaptiveness cannot address residual
bargaining failure arising from a fundamental tradeoff between exploitability
and cooperative robustness.
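For intuition, here is a minimal, illustrative sketch of the kind of bargaining failure described in the abstract (my own toy example, not code or payoffs from the paper): in a one-shot Nash demand game, two agents that each internalized a different fairness norm over-demand jointly and get nothing, while a hypothetical norm-adaptive agent that recognizes its counterpart's norm and concedes to it restores a Pareto-optimal split.

```python
# Toy Nash demand game (illustrative only, not the paper's environments).
# A pie of size 10 is split if the two demands are compatible; otherwise
# bargaining breaks down and both players get nothing.
PIE = 10

def payoff(demand_a, demand_b):
    if demand_a + demand_b <= PIE:
        return demand_a, demand_b
    return 0, 0

# Two competing "norms", i.e. two Pareto-optimal splits a player could
# reasonably regard as the fair cooperative outcome (hypothetical values).
NORMS = {"egalitarian": 5, "proposer_favoring": 7}

# Agents trained separately under different norms demand accordingly.
print(payoff(NORMS["proposer_favoring"], NORMS["egalitarian"]))  # (0, 0): bargaining failure

# A norm-adaptive agent keeps both norms available and, once it has identified
# the norm its counterpart follows, demands the compatible complement instead
# of insisting on its own preferred split.
def norm_adaptive_demand(observed_counterpart_demand):
    return min(NORMS["proposer_favoring"], PIE - observed_counterpart_demand)

print(payoff(norm_adaptive_demand(NORMS["egalitarian"]), NORMS["egalitarian"]))  # (5, 5): cooperation restored
```

The last line also hints at the tradeoff raised in the abstract's final sentence: an agent willing to adapt to whatever norm it encounters avoids bargaining failure, but that same willingness to concede is what makes it exploitable.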
Related papers
- Role Play: Learning Adaptive Role-Specific Strategies in Multi-Agent Interactions [8.96091816092671]
We propose a novel framework called Role Play (RP).
RP employs role embeddings to transform the challenge of policy diversity into a more manageable diversity of roles.
It trains a common policy with role embedding observations and employs a role predictor to estimate the joint role embeddings of other agents, helping the learning agent adapt to its assigned role.
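As a rough schematic of that idea (my own sketch with made-up names and dimensions, not the RP authors' code), the snippet below conditions one shared policy on the agent's own role embedding plus predicted role embeddings for its teammates:

```python
# Illustrative schematic of a shared policy conditioned on role embeddings.
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ROLE_DIM, N_ACTIONS, N_AGENTS = 8, 4, 3, 2

# One table of role embeddings shared by all agents.
role_embeddings = rng.normal(size=(N_AGENTS, ROLE_DIM))

# A single (linear, for brevity) policy shared by every agent: it consumes the
# observation, the agent's own role embedding, and *predicted* embeddings of
# the other agents' roles.
W = rng.normal(size=(OBS_DIM + ROLE_DIM * N_AGENTS, N_ACTIONS))

def role_predictor(obs):
    """Stand-in for a learned predictor of the other agents' role embeddings."""
    return rng.normal(size=((N_AGENTS - 1) * ROLE_DIM,))

def act(obs, own_role_id):
    features = np.concatenate([obs, role_embeddings[own_role_id], role_predictor(obs)])
    logits = features @ W
    probs = np.exp(logits - logits.max()); probs /= probs.sum()
    return rng.choice(N_ACTIONS, p=probs)

print(act(rng.normal(size=OBS_DIM), own_role_id=0))
```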
arXiv Detail & Related papers (2024-11-02T07:25:48Z)
- Learning and Sustaining Shared Normative Systems via Bayesian Rule Induction in Markov Games [2.307051163951559]
We build learning agents that cooperate flexibly with the human institutions they are embedded in.
By assuming shared norms, a newly introduced agent can infer the norms of an existing population from observations of compliance and violation.
Since agents can bootstrap common knowledge of the norms, this leads the norms to be widely adhered to, enabling new entrants to rapidly learn those norms.
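The norm-inference step can be pictured with a very small Bayesian update (my own toy illustration with invented rules and probabilities, not the paper's rule-induction model): a newcomer maintains a posterior over candidate norm hypotheses and updates it from observed compliance and violation events.

```python
# Toy posterior over whether a rule is part of the population's norm system.
# Hypothetical compliance probabilities under each hypothesis.
compliance_prob = {
    "norm_A (rule active)": 0.95,  # rule is enforced, so compliance is likely
    "norm_B (rule absent)": 0.50,  # behaviour is unconstrained
}

posterior = {h: 0.5 for h in compliance_prob}  # uniform prior

observations = ["comply", "comply", "comply", "violate", "comply", "comply"]  # made-up data

for obs in observations:
    for h, p_comply in compliance_prob.items():
        likelihood = p_comply if obs == "comply" else 1.0 - p_comply
        posterior[h] *= likelihood
    total = sum(posterior.values())
    posterior = {h: p / total for h, p in posterior.items()}

print(posterior)  # mass shifts toward the hypothesis that the rule is in force (~0.71 vs ~0.29)
```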
arXiv Detail & Related papers (2024-02-20T21:58:40Z)
- Policy Diversity for Cooperative Agents [8.689289576285095]
Multi-agent reinforcement learning aims to find the optimal team cooperative policy to complete a task.
Multiple distinct ways of cooperating may exist, and domain experts often need access to these diverse behaviors.
Unfortunately, there is a general lack of effective policy diversity approaches specifically designed for the multi-agent domain.
arXiv Detail & Related papers (2023-08-28T05:23:16Z)
- Adaptive Value Decomposition with Greedy Marginal Contribution Computation for Cooperative Multi-Agent Reinforcement Learning [48.41925886860991]
Real-world cooperation often requires intensive, simultaneous coordination among agents.
Traditional methods that learn the value function as a monotonic mixing of per-agent utilities cannot solve the tasks with non-monotonic returns.
We propose a novel explicit credit assignment method to address the non-monotonic problem.
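To see concretely what "non-monotonic returns" means, here is a small self-contained demonstration (my own, not the paper's method) using the simplest monotonic factorization, an additive VDN-style sum of per-agent utilities: a least-squares additive fit of a classic hard payoff matrix leads decentralized greedy action selection away from the true optimum.

```python
import numpy as np

# Joint payoff for a 2-agent, 3-action one-step game, a classic hard case for
# value factorization: the optimum (8) sits next to heavy penalties.
Q_joint = np.array([[  8, -12, -12],
                    [-12,   0,   0],
                    [-12,   0,   0]], dtype=float)

# Fit Q_joint(a1, a2) ~= q1(a1) + q2(a2) by least squares.
rows, cols = np.indices(Q_joint.shape)
A = np.zeros((Q_joint.size, 6))
A[np.arange(Q_joint.size), rows.ravel()] = 1.0        # q1 one-hot
A[np.arange(Q_joint.size), 3 + cols.ravel()] = 1.0    # q2 one-hot
theta, *_ = np.linalg.lstsq(A, Q_joint.ravel(), rcond=None)
q1, q2 = theta[:3], theta[3:]

# Decentralized greedy action selection under the factored per-agent values.
a1, a2 = int(np.argmax(q1)), int(np.argmax(q2))
print("greedy joint action:", (a1, a2), "-> true payoff:", Q_joint[a1, a2])
print("optimal joint payoff:", Q_joint.max())  # the additive factorization misses the optimum of 8
```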
arXiv Detail & Related papers (2023-02-14T07:23:59Z)
- Stateful active facilitator: Coordination and Environmental Heterogeneity in Cooperative Multi-Agent Reinforcement Learning [71.53769213321202]
We formalize the notions of coordination level and heterogeneity level of an environment.
We present HECOGrid, a suite of multi-agent environments that facilitates empirical evaluation of different MARL approaches.
We propose a Centralized Training Decentralized Execution learning approach that enables agents to work efficiently in high-coordination and high-heterogeneity environments.
arXiv Detail & Related papers (2022-10-04T18:17:01Z)
- Iterated Reasoning with Mutual Information in Cooperative and Byzantine Decentralized Teaming [0.0]
We show that reformulating an agent's policy to be conditional on the policies of its teammates inherently maximizes a Mutual Information (MI) lower bound when optimizing under Policy Gradient (PG).
Our approach, InfoPG, outperforms baselines in learning emergent collaborative behaviors and sets the state-of-the-art in decentralized cooperative MARL tasks.
arXiv Detail & Related papers (2022-01-20T22:54:32Z)
- Balancing Rational and Other-Regarding Preferences in Cooperative-Competitive Environments [4.705291741591329]
Mixed environments are notorious for conflicts between selfish and social interests.
We propose BAROCCO to balance individual and social incentives.
Our meta-algorithm is compatible with both Q-learning and Actor-Critic frameworks.
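As a generic illustration of balancing individual and social incentives (not BAROCCO's actual objective, which the summary above does not spell out), one simple scheme trains each agent on a convex combination of its own reward and the group's mean reward:

```python
# Generic reward mixing between selfish and social incentives (illustrative).
def mixed_rewards(individual_rewards, social_weight=0.5):
    """Return per-agent training rewards after mixing in the group's mean reward."""
    group_mean = sum(individual_rewards) / len(individual_rewards)
    return [(1 - social_weight) * r + social_weight * group_mean
            for r in individual_rewards]

# Example: agent 0 earned a lot at agent 1's expense; the mixed signal softens that.
print(mixed_rewards([10.0, -2.0], social_weight=0.5))  # [7.0, 1.0]
```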
arXiv Detail & Related papers (2021-02-24T14:35:32Z)
- Dealing with Non-Stationarity in Multi-Agent Reinforcement Learning via Trust Region Decomposition [52.06086375833474]
Non-stationarity is one thorny issue in multi-agent reinforcement learning.
We introduce a $\delta$-stationarity measurement to explicitly model the stationarity of a policy sequence.
We propose a trust region decomposition network based on message passing to estimate the joint policy divergence.
arXiv Detail & Related papers (2021-02-21T14:46:50Z)
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
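The "linear decomposition of universal successor features" mentioned above can be sketched as follows (my own minimal illustration with placeholder numbers, not UneVEn's implementation): if rewards decompose as r = φ·w, then shared successor features ψ yield action values ψ·w for any related task's weight vector w.

```python
# Successor-feature sketch: one set of features, many related tasks.
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS, FEAT_DIM = 4, 3

psi = rng.normal(size=(N_ACTIONS, FEAT_DIM))  # successor features for one state (stand-in values)

def q_values(task_w):
    """Action values for the task described by weight vector task_w."""
    return psi @ task_w

# The same features evaluate different related tasks by swapping in a new w.
for w in [np.array([1.0, 0.0, 0.0]), np.array([0.2, 0.5, 0.3])]:
    print(w, "-> greedy action", int(np.argmax(q_values(w))))
```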
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
- Non-local Policy Optimization via Diversity-regularized Collaborative Exploration [45.997521480637836]
We propose a novel non-local policy optimization framework called Diversity-regularized Collaborative Exploration (DiCE).
DiCE utilizes a group of heterogeneous agents to explore the environment simultaneously and share the collected experiences.
We implement the framework in both on-policy and off-policy settings and the experimental results show that DiCE can achieve substantial improvement over the baselines.
arXiv Detail & Related papers (2020-06-14T03:31:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.