Related papers: The Overcooked Generalisation Challenge

ClarEval: A Benchmark for Evaluating Clarification Skills of Code Agents under Ambiguous Instructions [19.875754116636436]
We introduce ClarEval, a framework designed to assess an agent's "Collaborative Quotient" by simulating the inherent ambiguity of human communication. To quantify this capability, we propose a metric suite led by Average Turns to Clarify coders (ATC) and Key Question Coverage (KQC). Our experiments on eleven state-of-the-art agents reveal a stark reality: while models like GPT-5-Coder excel at coding, they often lack the strategic communication skills required for efficient partnership.
arXiv Detail & Related papers (2026-02-27T01:10:27Z)
From Correctness to Collaboration: Toward a Human-Centered Framework for Evaluating AI Agent Behavior in Software Engineering [7.402388519535592]
Current benchmarks, focused on code correctness, fail to capture the nuanced, interactive behaviors essential for successful human-AI partnership. We present a foundational taxonomy of desirable agent behaviors for enterprise software engineering. We also introduce the Context-Adaptive Behavior (CAB) Framework.
arXiv Detail & Related papers (2025-12-29T20:18:57Z)
Completion $\neq$ Collaboration: Scaling Collaborative Effort with Agents [48.95020665909723]
We argue for a shift from building and assessing task completion agents to developing collaborative agents. We introduce collaborative effort scaling, a framework that captures how an agent's utility grows with increasing user involvement.
arXiv Detail & Related papers (2025-10-29T17:47:18Z)
Learning "Partner-Aware" Collaborators in Multi-Party Collaboration [12.287537011305497]
Large Language Models (LLMs) are increasingly being deployed in agentic settings where they act as collaborators with humans. This paper builds on the AI alignment and safe interruptibility literature to offer novel theoretical insights on collaborative behavior. We propose Interruptible Collaborative Roleplayer (ICR), a novel partner-aware learning algorithm to train CG-optimal collaborators.
arXiv Detail & Related papers (2025-10-26T00:05:48Z)
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning [81.08755597239262]
Existing approaches suffer from a trade-off: generalist agents excel at planning but perform poorly in execution, while specialized agents demonstrate the opposite weakness. Recent compositional frameworks attempt to bridge this gap by combining a planner and an actor, but they are typically static and non-trainable. We introduce CODA, a novel and trainable compositional framework that integrates a generalist planner with a specialist executor.
arXiv Detail & Related papers (2025-08-27T17:59:50Z)
ROTATE: Regret-driven Open-ended Training for Ad Hoc Teamwork [35.31433715096886]
Developing AI agents capable of collaborating with previously unseen partners is a fundamental generalization challenge in multi-agent learning, known as Ad Hoc Teamwork (AHT). We present a unified framework for AHT by reformulating the problem as an open-ended learning process between an ad hoc agent and an adversarial teammate generator. We introduce ROTATE, a regret-driven, open-ended training algorithm that alternates between improving the AHT agent and generating teammates that probe its deficiencies.
arXiv Detail & Related papers (2025-05-29T17:24:54Z)
Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination [37.90912492084769]
We study how reinforcement learning on a distribution of environments with a single partner enables learning general cooperative skills. We introduce two Jax-based, procedural generators that create billions of solvable coordination challenges. Our findings suggest that learning to collaborate across many unique scenarios encourages agents to develop general norms.
arXiv Detail & Related papers (2025-04-17T07:41:25Z)
An Empirical Game-Theoretic Analysis of Autonomous Cyber-Defence Agents [0.0]
We introduce and evaluate a theoretically-sound, potential-based reward shaping approach to expedite this process. In addition, given the increasing number of open-source ACD-DRL approaches, we extend the DO formulation to allow for multiple response oracles.
arXiv Detail & Related papers (2025-01-31T15:15:02Z)
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration [51.452664740963066]
Collaborative Gym is a framework enabling asynchronous, tripartite interaction among agents, humans, and task environments. We instantiate Co-Gym with three representative tasks in both simulated and real-world conditions. Our findings reveal that collaborative agents consistently outperform their fully autonomous counterparts in task performance.
arXiv Detail & Related papers (2024-12-20T09:21:15Z)
Problem Solving Through Human-AI Preference-Based Cooperation [74.39233146428492]
We propose HAI-Co2, a novel human-AI co-construction framework. We formalize HAI-Co2 and discuss the difficult open research problems that it faces. We present a case study of HAI-Co2 and demonstrate its efficacy compared to monolithic generative AI models.
arXiv Detail & Related papers (2024-08-14T11:06:57Z)
Aligning Individual and Collective Objectives in Multi-Agent Cooperation [18.082268221987956]
Mixed-motive cooperation is one of the most prominent challenges in multi-agent learning. We introduce a novel optimization method named Altruistic Gradient Adjustment (AgA) that employs gradient adjustments to progressively align individual and collective objectives. We evaluate the effectiveness of our algorithm AgA through benchmark environments for testing mixed-motive collaboration with small-scale agents.
arXiv Detail & Related papers (2024-02-19T08:18:53Z)
CCA: Collaborative Competitive Agents for Image Editing [59.54347952062684]
This paper presents a novel generative model, Collaborative Competitive Agents (CCA). It leverages the capabilities of multiple Large Language Model (LLM) based agents to execute complex tasks. The paper's main contributions include the introduction of a multi-agent-based generative model with controllable intermediate steps and iterative optimization.
arXiv Detail & Related papers (2024-01-23T11:46:28Z)
Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO [50.58083807719749]
We present the results of the second Neural MMO challenge, hosted at IJCAI 2022, which received 1600+ submissions. This competition targets robustness and generalization in multi-agent systems. We will open-source our benchmark including the environment wrapper, baselines, a visualization tool, and selected policies for further research.
arXiv Detail & Related papers (2023-08-30T07:16:11Z)
Tackling Cooperative Incompatibility for Zero-Shot Human-AI Coordination [36.33334853998621]
We introduce the Cooperative Open-ended LEarning (COLE) framework to solve cooperative incompatibility in learning. COLE formulates open-ended objectives in cooperative games with two players using perspectives of graph theory to evaluate and pinpoint the cooperative capacity of each strategy. We show that COLE could effectively overcome the cooperative incompatibility from theoretical and empirical analysis.
arXiv Detail & Related papers (2023-06-05T16:51:38Z)
PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination [52.991211077362586]
We propose a policy ensemble method to increase the diversity of partners in the population. We then develop a context-aware method enabling the ego agent to analyze and identify the partner's potential policy primitives. In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners.
arXiv Detail & Related papers (2023-01-16T12:14:58Z)
RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in Multi-Agent Deep Reinforcement Learning [55.55009081609396]
We propose a novel method, called Relation-Aware Credit Assignment (RACA), which achieves zero-shot generalization in ad-hoc cooperation scenarios. RACA takes advantage of a graph-based relation encoder to encode the topological structure between agents. Our method outperforms baseline methods on the StarCraftII micromanagement benchmark and ad-hoc cooperation scenarios.
arXiv Detail & Related papers (2022-06-02T03:39:27Z)
Any-Play: An Intrinsic Augmentation for Zero-Shot Coordination [0.4153433779716327]
We formalize an alternative criterion for evaluating cooperative AI, referred to as inter-algorithm cross-play. We show that existing state-of-the-art cooperative AI algorithms, such as Other-Play and Off-Belief Learning, under-perform in this paradigm. We propose the Any-Play learning augmentation for generalizing self-play-based algorithms to the inter-algorithm cross-play setting.
arXiv Detail & Related papers (2022-01-28T21:43:58Z)
Centralizing State-Values in Dueling Networks for Multi-Robot Reinforcement Learning Mapless Navigation [87.85646257351212]
We study the problem of multi-robot mapless navigation in the popular Centralized Training and Decentralized Execution (CTDE) paradigm. This problem is challenging when each robot considers its path without explicitly sharing observations with other robots. We propose a novel architecture for CTDE that uses a centralized state-value network to compute a joint state-value.
arXiv Detail & Related papers (2021-12-16T16:47:00Z)
Partner-Aware Algorithms in Decentralized Cooperative Bandit Teams [14.215359943041369]
We propose and analyze a decentralized Multi-Armed Bandit (MAB) problem with coupled rewards as an abstraction of more general multi-agent collaboration. We propose a Partner-Aware strategy for joint sequential decision-making that extends the well-known single-agent Upper Confidence Bound algorithm. Our results show that the proposed partner-aware strategy outperforms other known methods, and our human subject studies suggest humans prefer to collaborate with AI agents implementing our partner-aware strategy.
arXiv Detail & Related papers (2021-10-02T08:17:30Z)
On the Critical Role of Conventions in Adaptive Human-AI Collaboration [73.21967490610142]
We propose a learning framework that teases apart rule-dependent representation from convention-dependent representation. We experimentally validate our approach on three collaborative tasks varying in complexity.
arXiv Detail & Related papers (2021-04-07T02:46:19Z)
Group Collaborative Learning for Co-Salient Object Detection [152.67721740487937]
We present a novel group collaborative learning framework (GCoNet) capable of detecting co-salient objects in real time (16 ms). Extensive experiments on three challenging benchmarks, i.e., CoCA, CoSOD3k, and Cosal2015, demonstrate that our simple GCoNet outperforms 10 cutting-edge models and achieves the new state-of-the-art.
arXiv Detail & Related papers (2021-03-15T13:16:03Z)