Related papers: Melting Pot 2.0

Melting Pot 2.0

URL: http://arxiv.org/abs/2211.13746v6
Date: Tue, 31 Oct 2023 00:06:14 GMT
Title: Melting Pot 2.0
Authors: John P. Agapiou, Alexander Sasha Vezhnevets, Edgar A. Du\'e\~nez-Guzm\'an, Jayd Matyas, Yiran Mao, Peter Sunehag, Raphael K\"oster, Udari Madhushani, Kavya Kopparapu, Ramona Comanescu, DJ Strouse, Michael B. Johanson, Sukhdeep Singh, Julia Haas, Igor Mordatch, Dean Mobbs, Joel Z. Leibo
Abstract summary: Melting Pot is a tool developed to facilitate work on multi-agent artificial intelligence. It provides an evaluation protocol that measures generalization to novel social partners. Melting Pot aims to cover a maximally diverse set of interdependencies and incentives.
Score: 54.60680281014163
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multi-agent artificial intelligence research promises a path to develop intelligent technologies that are more human-like and more human-compatible than those produced by "solipsistic" approaches, which do not consider interactions between agents. Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence, and provides an evaluation protocol that measures generalization to novel social partners in a set of canonical test scenarios. Each scenario pairs a physical environment (a "substrate") with a reference set of co-players (a "background population"), to create a social situation with substantial interdependence between the individuals involved. For instance, some scenarios were inspired by institutional-economics-based accounts of natural resource management and public-good-provision dilemmas. Others were inspired by considerations from evolutionary biology, game theory, and artificial life. Melting Pot aims to cover a maximally diverse set of interdependencies and incentives. It includes the commonly-studied extreme cases of perfectly-competitive (zero-sum) motivations and perfectly-cooperative (shared-reward) motivations, but does not stop with them. As in real-life, a clear majority of scenarios in Melting Pot have mixed incentives. They are neither purely competitive nor purely cooperative and thus demand successful agents be able to navigate the resulting ambiguity. Here we describe Melting Pot 2.0, which revises and expands on Melting Pot. We also introduce support for scenarios with asymmetric roles, and explain how to integrate them into the evaluation protocol. This report also contains: (1) details of all substrates and scenarios; (2) a complete description of all baseline algorithms and results. Our intention is for it to serve as a reference for researchers using Melting Pot 2.0.

Related papers

Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia [100.74015791021044]
Large Language Model (LLM) agents have demonstrated impressive capabilities for social interaction.<n>Existing evaluation methods fail to measure how well these capabilities generalize to novel social situations.<n>We present empirical results from the NeurIPS 2024 Concordia Contest, where agents were evaluated on their ability to achieve mutual gains.
arXiv Detail & Related papers (2025-12-03T00:11:05Z)
Embedded Universal Predictive Intelligence: a coherent framework for multi-agent learning [57.23345786304694]
We introduce a framework for prospective learning and embedded agency centered on self-prediction.<n>We show that in multi-agent settings, self-prediction enables agents to reason about others running similar algorithms.<n>We extend the theory of AIXI, and study universally intelligent embedded agents which start from a Solomonoff prior.
arXiv Detail & Related papers (2025-11-27T08:46:48Z)
The Social Laboratory: A Psychometric Framework for Multi-Agent LLM Evaluation [0.16921396880325779]
We introduce a novel evaluation framework that uses multi-agent debate as a controlled "social laboratory"<n>We show that assigned personas induce stable, measurable psychometric profiles, particularly in cognitive effort.<n>This work provides a blueprint for a new class of dynamic, psychometrically grounded evaluation protocols.
arXiv Detail & Related papers (2025-10-01T07:10:28Z)
Beyond the high score: Prosocial ability profiles of multi-agent populations [7.740015167057365]
The Melting Pot contest is a social AI evaluation suite designed to assess the cooperation capabilities of AI systems.<n>We apply a Bayesian approach known as Measurement Layouts to infer the capability profiles of multi-agent systems in the Melting Pot contest.<n>We show that these capability profiles not only predict future performance within the Melting Pot suite but also reveal the underlying prosocial abilities of agents.
arXiv Detail & Related papers (2025-09-17T23:29:39Z)
Don't lie to your friends: Learning what you know from collaborative self-play [90.35507959579331]
We propose a radically new approach to teaching AI agents what they know. We construct multi-agent collaborations in which the group is rewarded for collectively arriving at correct answers. The desired meta-knowledge emerges from the incentives built into the structure of the interaction.
arXiv Detail & Related papers (2025-03-18T17:53:20Z)
Factorised Active Inference for Strategic Multi-Agent Interactions [1.9389881806157316]
Two complementary approaches can be integrated to this end. The Active Inference framework (AIF) describes how agents employ a generative model to adapt their beliefs about and behaviour within their environment. Game theory formalises strategic interactions between agents with potentially competing objectives. We propose a factorisation of the generative model whereby each agent maintains explicit, individual-level beliefs about the internal states of other agents, and uses them for strategic planning in a joint context.
arXiv Detail & Related papers (2024-11-11T21:04:43Z)
AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios [38.878966229688054]
We introduce AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios. Drawing on Dramaturgical Theory, AgentSense employs a bottom-up approach to create 1,225 diverse social scenarios constructed from extensive scripts. We analyze goals using ERG theory and conduct comprehensive experiments. Our findings highlight that LLMs struggle with goals in complex social scenarios, especially high-level growth needs, and even GPT-4o requires improvement in private information reasoning.
arXiv Detail & Related papers (2024-10-25T07:04:16Z)
Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models [4.9108308035618515]
Multi-agent reinforcement learning (MARL) methods struggle with the non-stationarity of multi-agent systems. Here, we leverage large language models (LLMs) to create an autonomous agent that can handle these challenges. Our agent, Hypothetical Minds, consists of a cognitively-inspired architecture, featuring modular components for perception, memory, and hierarchical planning over two levels of abstraction.
arXiv Detail & Related papers (2024-07-09T17:57:15Z)
AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents [65.16893197330589]
Large Language Models (LLMs) have demonstrated their ability to replicate human behaviors across a wide range of scenarios. However, their capability in handling complex, multi-character social interactions has yet to be fully explored. We introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods.
arXiv Detail & Related papers (2024-01-12T11:18:00Z)
MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [102.41118020705876]
Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing. As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework. This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z)
Neural Amortized Inference for Nested Multi-agent Reasoning [54.39127942041582]
We propose a novel approach to bridge the gap between human-like inference capabilities and computational limitations. We evaluate our method in two challenging multi-agent interaction domains.
arXiv Detail & Related papers (2023-08-21T22:40:36Z)
Theory of Mind as Intrinsic Motivation for Multi-Agent Reinforcement Learning [5.314466196448188]
We present a method of grounding semantically meaningful, human-interpretable beliefs within policies modeled by deep networks. We propose that ability of each agent to predict the beliefs of the other agents can be used as an intrinsic reward signal for multi-agent reinforcement learning.
arXiv Detail & Related papers (2023-07-03T17:07:18Z)
ToM2C: Target-oriented Multi-agent Communication and Cooperation with Theory of Mind [18.85252946546942]
Theory of Mind (ToM) builds socially intelligent agents who are able to communicate and cooperate effectively. We demonstrate the idea in two typical target-oriented multi-agent tasks: cooperative navigation and multi-sensor target coverage.
arXiv Detail & Related papers (2021-10-15T18:29:55Z)
Collective eXplainable AI: Explaining Cooperative Strategies and Agent Contribution in Multiagent Reinforcement Learning with Shapley Values [68.8204255655161]
This study proposes a novel approach to explain cooperative strategies in multiagent RL using Shapley values. Results could have implications for non-discriminatory decision making, ethical and responsible AI-derived decisions or policy making under fairness constraints.
arXiv Detail & Related papers (2021-10-04T10:28:57Z)
Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot [71.28884625011987]
Melting Pot is a MARL evaluation suite that uses reinforcement learning to reduce the human labor required to create novel test scenarios. We have created over 80 unique test scenarios covering a broad range of research topics. We apply these test scenarios to standard MARL training algorithms, and demonstrate how Melting Pot reveals weaknesses not apparent from training performance alone.
arXiv Detail & Related papers (2021-07-14T17:22:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.