Ad-Hoc Human-AI Coordination Challenge
- URL: http://arxiv.org/abs/2506.21490v2
- Date: Sun, 29 Jun 2025 10:25:50 GMT
- Title: Ad-Hoc Human-AI Coordination Challenge
- Authors: Tin Dizdarević, Ravi Hammond, Tobias Gessler, Anisoara Calinescu, Jonathan Cook, Matteo Gallici, Andrei Lupu, Darius Muglich, Johannes Forkel, Jakob Nicolaus Foerster
- Abstract summary: We introduce the Ad-Hoc Human-AI Coordination Challenge (AH2AC2) to overcome the constraints of costly and difficult-to-reproduce human evaluations. We develop human proxy agents on a large-scale human dataset that serve as robust, cheap, and reproducible human-like evaluation partners.
- Score: 6.933020756939985
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Achieving seamless coordination between AI agents and humans is crucial for real-world applications, yet it remains a significant open challenge. Hanabi is a cooperative card game featuring imperfect information, constrained communication, theory of mind requirements, and coordinated action, making it an ideal testbed for human-AI coordination. However, its use for human-AI interaction has been limited by the challenges of human evaluation. In this work, we introduce the Ad-Hoc Human-AI Coordination Challenge (AH2AC2) to overcome the constraints of costly and difficult-to-reproduce human evaluations. We develop human proxy agents on a large-scale human dataset that serve as robust, cheap, and reproducible human-like evaluation partners in AH2AC2. To encourage the development of data-efficient methods, we open-source a dataset of 3,079 games, deliberately limiting the amount of available human gameplay data. We present baseline results for both two- and three-player Hanabi scenarios. To ensure fair evaluation, we host the proxy agents through a controlled evaluation system rather than releasing them publicly. The code is available at https://github.com/FLAIROx/ah2ac2.
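To make the submission setup concrete, here is a minimal sketch of the kind of candidate agent a hosted evaluation system like this might call; the class and method names are hypothetical illustrations, not the actual ah2ac2 API (see the linked repository for the real interface).

```python
# Minimal sketch of a candidate agent for an AH2AC2-style evaluation.
# All names here (reset, act, "legal_moves") are hypothetical stand-ins,
# not the actual ah2ac2 API; consult the linked repository for the real one.
import random
from typing import Any

class RandomLegalAgent:
    """Trivial baseline: pick a uniformly random legal move each turn."""

    def reset(self, player_id: int, num_players: int) -> None:
        # Called once per game with seating information.
        self.player_id = player_id
        self.num_players = num_players

    def act(self, observation: dict[str, Any]) -> int:
        # Hanabi learning environments commonly expose the legal move set
        # in the per-player observation; we assume the same here.
        return random.choice(observation["legal_moves"])

# A hosted evaluation would repeatedly pair such an agent with the human
# proxy agents and report the mean score over games.
agent = RandomLegalAgent()
agent.reset(player_id=0, num_players=2)
print(agent.act({"legal_moves": [3, 7, 12]}))
```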
Related papers
- OpenGuanDan: A Large-Scale Imperfect Information Game Benchmark [31.554414017099102]
OpenGuanDan is a novel benchmark that enables both efficient simulation of GuanDan and comprehensive evaluation of learning-based and rule-based AI agents.
OpenGuanDan poses a suite of nontrivial challenges, including imperfect information, large-scale information sets and action spaces, a mixed learning objective involving cooperation and competition, long-horizon decision-making, variable action spaces, and dynamic team composition.
We conduct two types of evaluations: (1) pairwise competitions among all GuanDan AI agents, and (2) human-AI matchups.
arXiv Detail & Related papers (2026-01-31T11:46:29Z) - Automatic Curriculum Design for Zero-Shot Human-AI Coordination [4.634917646296438]
Zero-shot human-AI coordination is the training of an ego-agent to coordinate with humans without human data.
We propose a utility function and co-player sampling for a zero-shot human-AI coordination setting.
Our method achieves high performance in human-AI coordination tasks in unseen environments.
arXiv Detail & Related papers (2025-03-10T12:55:31Z) - Problem Solving Through Human-AI Preference-Based Cooperation [74.39233146428492]
We propose HAICo2, a novel human-AI co-construction framework.
We take first steps towards a formalization of HAICo2 and discuss the difficult open research problems that it faces.
arXiv Detail & Related papers (2024-08-14T11:06:57Z) - Tackling Cooperative Incompatibility for Zero-Shot Human-AI Coordination [36.33334853998621]
We introduce the Cooperative Open-ended LEarning (COLE) framework to solve cooperative incompatibility in learning.
COLE formulates open-ended objectives in two-player cooperative games from a graph-theoretic perspective to evaluate and pinpoint the cooperative capacity of each strategy.
Through theoretical and empirical analysis, we show that COLE effectively overcomes cooperative incompatibility.
arXiv Detail & Related papers (2023-06-05T16:51:38Z) - Language Instructed Reinforcement Learning for Human-AI Coordination [23.694362407434753]
We propose a novel framework, instructRL, that enables humans to specify what kind of strategies they expect from their AI partners through natural language instructions.
We show that instructRL converges to human-like policies that satisfy the given instructions in a proof-of-concept environment and the challenging Hanabi benchmark.
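As a rough, non-authoritative illustration of the core idea, action selection can be biased toward an instruction-conditioned prior policy; the prior, the weight lam, and the toy values below are invented for this sketch, not the paper's exact formulation.

```python
# Hedged sketch: regularize RL action selection toward a prior policy
# derived from the human's natural-language instruction. The prior, lam,
# and values below are illustrative assumptions.
import numpy as np

def regularized_action(q_values: np.ndarray,
                       instruction_prior: np.ndarray,
                       lam: float = 0.5) -> int:
    """Pick the action maximizing Q(s, a) + lam * log prior(a | instruction)."""
    return int(np.argmax(q_values + lam * np.log(instruction_prior + 1e-8)))

# Example: an instruction like "prefer hinting over discarding" might induce
# a prior that upweights hint actions, flipping an otherwise greedy choice.
q = np.array([1.0, 0.9, 0.2])        # learned action values
prior = np.array([0.1, 0.7, 0.2])    # instruction-conditioned prior
print(regularized_action(q, prior))  # prints 1: the prior flipped the argmax
```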
arXiv Detail & Related papers (2023-04-13T04:47:31Z) - PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination [52.991211077362586]
We propose a policy ensemble method to increase the diversity of partners in the population.
We then develop a context-aware method enabling the ego agent to analyze and identify the partner's potential policy primitives.
In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners.
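A hedged sketch of the identification step, assuming each policy primitive induces a known per-step action distribution; the primitives, numbers, and function names below are invented for illustration.

```python
# Illustrative sketch (not the paper's implementation) of context-aware
# partner identification: score each known policy primitive by the
# likelihood of the partner's observed actions, then deploy the best
# response trained against the most likely primitive.
import numpy as np

def identify_primitive(action_probs_by_primitive: np.ndarray,
                       observed_actions: list[int]) -> int:
    """Rows are per-primitive action distributions; returns the index of
    the primitive with the highest log-likelihood of the observations."""
    log_liks = [
        sum(np.log(probs[a] + 1e-8) for a in observed_actions)
        for probs in action_probs_by_primitive
    ]
    return int(np.argmax(log_liks))

primitives = np.array([[0.8, 0.1, 0.1],   # "cautious" partner
                       [0.1, 0.8, 0.1]])  # "aggressive" partner
partner_history = [1, 1, 0, 1]
best_response_index = identify_primitive(primitives, partner_history)
# The ego agent would then act with the best response for that primitive.
print(best_response_index)  # prints 1
```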
arXiv Detail & Related papers (2023-01-16T12:14:58Z) - Trustworthy Human Computation: A Survey [21.434956224643294]
Human computation requires close engagement with both "human populations as users" and "human populations as driving forces".
This survey lays the groundwork for the realization of trustworthy human computation.
arXiv Detail & Related papers (2022-10-22T01:30:50Z) - Human-AI Coordination via Human-Regularized Search and Learning [33.95649252941375]
We develop a three-step algorithm that achieves strong performance in coordinating with real humans in the Hanabi benchmark.
We first use a regularized search algorithm and behavioral cloning to produce a better human model that captures diverse skill levels.
By having experts play repeatedly with the two agents, we show that our method beats a vanilla best-response-to-behavioral-cloning baseline.
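A minimal sketch of the human-regularization idea, assuming search-estimated action values are mixed with a behavioral-cloning (BC) policy; the anchor-weighted form and the numbers below are illustrative, not the paper's exact formulation.

```python
# Sketch of human-regularized action selection: mix search Q-values with a
# behavioral-cloning (BC) "human" policy so that high-value deviations from
# human conventions are dampened. Weighting form and numbers are assumptions.
import numpy as np

def human_regularized_policy(q_values: np.ndarray,
                             bc_policy: np.ndarray,
                             lam: float = 1.0) -> np.ndarray:
    """pi(a) proportional to bc_policy(a) * exp(Q(a) / lam): small lam trusts
    the search values, large lam stays close to the human-like BC policy."""
    logits = np.log(bc_policy + 1e-8) + q_values / lam
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

q = np.array([2.0, 1.5, 0.1])      # search-estimated action values
bc = np.array([0.05, 0.75, 0.20])  # human-like BC policy
# Action 0 has the highest Q, but the BC prior pulls mass to action 1.
print(human_regularized_policy(q, bc, lam=1.0))
```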
arXiv Detail & Related papers (2022-10-11T03:46:12Z) - Human Decision Makings on Curriculum Reinforcement Learning with Difficulty Adjustment [52.07473934146584]
We guide curriculum reinforcement learning towards a preferred performance level, neither too hard nor too easy, by learning from the human decision process.
Our system is highly parallelizable, making it possible for a human to train large-scale reinforcement learning applications.
We show that reinforcement learning performance can successfully adjust in sync with the human-desired difficulty level.
arXiv Detail & Related papers (2022-08-04T23:53:51Z) - Reinforcement Learning on Human Decision Models for Uniquely Collaborative AI Teammates [0.0]
This study details the development of the challenge-winning agent, which achieved a human-play average score of 16.5.
The winning agent's development consisted of observing and accurately modeling the author's decision making in Hanabi, then training with a behavioral clone of the author.
Notably, the agent discovered a human-complementary play style by first mimicking human decision making, then exploring variations to the human-like strategy that led to higher simulated human-bot scores.
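The behavioral-cloning step can be sketched as supervised learning on logged (observation, action) pairs; the linear policy and synthetic data below are stand-ins for the paper's actual model and dataset.

```python
# Sketch of behavioral cloning: fit a policy to a single human's
# (observation, action) pairs by minimizing cross-entropy. A linear policy
# and synthetic data stand in for the paper's actual model and gameplay logs.
import numpy as np

rng = np.random.default_rng(0)
obs_dim, num_actions, lr = 8, 4, 0.1
W = np.zeros((obs_dim, num_actions))

# Synthetic stand-in for logged human gameplay.
observations = rng.normal(size=(256, obs_dim))
actions = rng.integers(0, num_actions, size=256)

for _ in range(200):
    logits = observations @ W
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    probs[np.arange(len(actions)), actions] -= 1.0   # d(CE)/d(logits)
    W -= lr * observations.T @ probs / len(actions)  # gradient step

# The resulting argmax policy imitates the human; RL can then train a
# partner agent against this clone.
```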
arXiv Detail & Related papers (2021-11-18T17:06:57Z) - Collaborating with Humans without Human Data [6.158826414652401]
We study the problem of how to train agents that collaborate well with human partners without using human data.
We train our agent partner as the best response to a population of self-play agents and their past checkpoints.
We find that Fictitious Co-Play (FCP) agents score significantly higher than self-play (SP), population-play (PP), and behavioral-cloning-play (BCP) agents when paired with novel agent and human partners.
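A schematic, runnable sketch of the FCP recipe; all training functions below are stubs standing in for real self-play and best-response training.

```python
# Schematic sketch of Fictitious Co-Play (FCP): build a partner population
# from several self-play runs *and their past checkpoints*, then train the
# ego agent as a best response to partners sampled from the frozen
# population. The functions here are stubs, not the paper's code.
import random

def train_self_play(seed: int, num_checkpoints: int) -> list[str]:
    # Stub: a real implementation returns frozen policy snapshots taken
    # throughout training (early checkpoints act as low-skill partners).
    return [f"sp{seed}-ckpt{i}" for i in range(num_checkpoints)]

def best_response_update(ego: dict, partner: str) -> None:
    # Stub: one RL update of the ego policy from games with `partner`.
    ego["games_per_partner"][partner] = ego["games_per_partner"].get(partner, 0) + 1

population = [ckpt for seed in range(3) for ckpt in train_self_play(seed, 4)]
ego = {"games_per_partner": {}}
for _ in range(1000):
    best_response_update(ego, random.choice(population))  # uniform sampling

# The ego agent never sees human data; skill diversity in the population is
# what lets it generalize to novel partners.
print(len(ego["games_per_partner"]))  # partners encountered during training
```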
arXiv Detail & Related papers (2021-10-15T16:03:57Z) - A User-Centred Framework for Explainable Artificial Intelligence in Human-Robot Interaction [70.11080854486953]
We propose a user-centred framework for XAI that focuses on its social-interactive aspect.
The framework aims to provide a structure for interactive XAI solutions designed for non-expert users.
arXiv Detail & Related papers (2021-09-27T09:56:23Z) - Joint Mind Modeling for Explanation Generation in Complex Human-Robot Collaborative Tasks [83.37025218216888]
We propose a novel explainable AI (XAI) framework for achieving human-like communication in human-robot collaborations.
The robot builds a hierarchical mind model of the human user and generates explanations of its own mind as a form of communication.
Results show that the explanations generated by our approach significantly improve collaboration performance and user perception of the robot.
arXiv Detail & Related papers (2020-07-24T23:35:03Z) - Is the Most Accurate AI the Best Teammate? Optimizing AI for Teamwork [54.309495231017344]
We argue that AI systems should be trained in a human-centered manner, directly optimized for team performance.
We study this proposal for a specific type of human-AI teaming, where the human overseer chooses to either accept the AI recommendation or solve the task themselves.
Our experiments with linear and non-linear models on real-world, high-stakes datasets show that the most accurate AI may not lead to the highest team performance.
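A toy calculation of this effect with invented numbers: what matters is the AI's accuracy on the cases the human actually accepts, weighted by how often the human accepts.

```python
# Toy illustration of the paper's point: expected team performance depends
# on when the human accepts the AI's recommendation, not just on overall AI
# accuracy. All numbers are invented for illustration.

def team_performance(p_accept: float, ai_acc_when_accepted: float,
                     human_acc: float) -> float:
    # The human accepts the AI with probability p_accept, else solves alone.
    return p_accept * ai_acc_when_accepted + (1 - p_accept) * human_acc

# AI "A" is more accurate overall but errs on exactly the cases the human
# tends to accept; AI "B" is less accurate overall but reliable when trusted.
print(team_performance(p_accept=0.8, ai_acc_when_accepted=0.85, human_acc=0.70))  # A: 0.82
print(team_performance(p_accept=0.8, ai_acc_when_accepted=0.95, human_acc=0.70))  # B: 0.90
```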
arXiv Detail & Related papers (2020-04-27T19:06:28Z)