Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World
- URL: http://arxiv.org/abs/2404.00246v1
- Date: Sat, 30 Mar 2024 04:48:38 GMT
- Title: Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World
- Authors: Guande Wu, Chen Zhao, Claudio Silva, He He
- Abstract summary: We design a blocks-world environment where two agents, each having unique goals and skills, build a target structure together.
To complete the goals, they can act in the world and communicate in natural language.
We adopt chain-of-thought prompts that include intermediate reasoning steps to model the partner's state and identify and correct execution errors.
- Score: 13.005764902339523
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language agents that interact with the world on their own have great potential for automating digital tasks. While large language model (LLM) agents have made progress in understanding and executing tasks such as textual games and webpage control, many real-world tasks also require collaboration with humans or other LLMs in equal roles, which involves intent understanding, task coordination, and communication. To test LLMs' ability to collaborate, we design a blocks-world environment where two agents, each having unique goals and skills, build a target structure together. To complete the goals, they can act in the world and communicate in natural language. In this environment, we design increasingly challenging settings to evaluate different collaboration perspectives, from independent to more complex, dependent tasks. We further adopt chain-of-thought prompts that include intermediate reasoning steps to model the partner's state and to identify and correct execution errors. Both human-machine and machine-machine experiments show that LLM agents have strong grounding capacities, and our approach significantly improves performance on the evaluation metrics.
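As a rough illustration of this prompting style, the sketch below lays out one possible chain-of-thought prompt that asks the model to reason about its partner's state and its own execution errors before acting. The `llm` callable, the prompt wording, and the `place(color, x, y)` action format are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical chain-of-thought prompt for a collaborative blocks-world agent.
COT_TEMPLATE = """You are agent A, building a structure with agent B in a blocks world.
World state: {world_state}
Your goal: {goal}
Last message from B: {partner_message}
Result of your last action: {last_result}

Think step by step:
1. What is B likely trying to build, given their messages and actions?
2. Did your last action fail? If so, why, and how do you correct it?
3. Decide your next move: an action `place(color, x, y)` or a message to B.

Reasoning:"""


def next_turn(llm, world_state, goal, partner_message, last_result):
    """Query the model with intermediate reasoning steps and return its reply."""
    prompt = COT_TEMPLATE.format(
        world_state=world_state,
        goal=goal,
        partner_message=partner_message,
        last_result=last_result,
    )
    return llm(prompt)  # `llm` is any text-completion callable
```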
Related papers
- CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models [19.73329768987112]
CurricuLLM is a curriculum-learning tool for complex robot control tasks.
It generates subtasks, described in natural language, that aid learning of the target task.
It then translates these natural-language subtask descriptions into executable code.
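A minimal sketch of those two steps, generating natural-language subtasks and translating each into executable code, might look as follows; the prompts, the reward-function convention, and the `llm` callable are assumptions, not CurricuLLM's actual interface.

```python
# Hypothetical two-step curriculum pipeline: subtasks in natural language,
# then executable code for each subtask.
def design_curriculum(llm, target_task: str, n_subtasks: int = 3) -> list[str]:
    prompt = (
        f"Target robot task: {target_task}\n"
        f"List {n_subtasks} simpler subtasks, one per line, that build toward it."
    )
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]


def subtask_to_code(llm, subtask: str) -> str:
    prompt = (
        f"Subtask: {subtask}\n"
        "Write a Python reward function `def reward(obs, action):` for it."
    )
    return llm(prompt)  # generated source should be validated before execution
```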
arXiv Detail & Related papers (2024-09-27T01:48:16Z)
- WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks [85.95607119635102]
Large language models (LLMs) can mimic human-like intelligence.
WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents.
arXiv Detail & Related papers (2024-07-07T07:15:49Z)
- Automatic Robotic Development through Collaborative Framework by Large Language Models [13.957351735394683]
We propose an innovative automated collaboration framework inspired by real-world robot developers.
This framework employs multiple LLMs in distinct roles: analysts, programmers, and testers.
Analysts delve deep into user requirements, enabling programmers to produce precise code, while testers fine-tune the parameters.
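One way to picture this division of labor is the sketch below, where a single model plays each role under a different instruction and artifacts flow from analyst to programmer to tester; the role prompts and the `llm` callable are invented here, not the paper's code.

```python
# Illustrative role-based pipeline: each "role" is the same LLM under a
# different instruction, passing its output to the next role.
ROLES = {
    "analyst": "Turn the user requirement into a precise specification.",
    "programmer": "Write robot-control code that satisfies the specification.",
    "tester": "Review the code and suggest parameter adjustments.",
}


def collaborate(llm, requirement: str) -> dict:
    spec = llm(f"{ROLES['analyst']}\nRequirement: {requirement}")
    code = llm(f"{ROLES['programmer']}\nSpecification: {spec}")
    review = llm(f"{ROLES['tester']}\nCode:\n{code}")
    return {"spec": spec, "code": code, "review": review}
```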
arXiv Detail & Related papers (2024-02-06T04:40:27Z)
- MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [102.41118020705876]
Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing.
As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework.
This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z)
- Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation [52.930183136111864]
We propose using scorable negotiation to evaluate Large Language Models (LLMs).
To reach an agreement, agents must have strong arithmetic, inference, exploration, and planning capabilities.
We provide procedures to create new games and to increase their difficulty, yielding an evolving benchmark.
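The "scorable" part can be pictured as below: each agent holds a hidden value function over the negotiated issues, so any proposed deal yields a numeric score per agent and agreement quality is directly measurable. The issue names and weights here are invented for illustration.

```python
# Toy scorable negotiation: hidden per-agent weights turn a deal into a score.
AGENT_WEIGHTS = {
    "city": {"funding": 0.6, "jobs": 0.4},
    "investor": {"funding": -0.3, "jobs": 0.7},
}


def score(agent: str, deal: dict) -> float:
    """Weighted sum over issue outcomes; an agent accepts only above its threshold."""
    return sum(AGENT_WEIGHTS[agent][issue] * value for issue, value in deal.items())


deal = {"funding": 0.8, "jobs": 0.5}
print({agent: round(score(agent, deal), 2) for agent in AGENT_WEIGHTS})
# {'city': 0.68, 'investor': 0.11}
```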
arXiv Detail & Related papers (2023-09-29T13:33:06Z)
- Building Cooperative Embodied Agents Modularly with Large Language Models [104.57849816689559]
We address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments.
We harness the commonsense knowledge, reasoning ability, language comprehension, and text generation prowess of LLMs and seamlessly incorporate them into a cognitive-inspired modular framework.
Our experiments on C-WAH and TDW-MAT demonstrate that CoELA, the resulting agent driven by GPT-4, can surpass strong planning-based methods and exhibit emergent effective communication.
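A schematic sketch of such a cognitive-inspired modular layout is shown below, with perception, communication, memory, and planning as separate modules; the module boundaries and method names are illustrative, not CoELA's actual interfaces.

```python
# Illustrative modular agent: perception and communication write to memory,
# and the planner queries the LLM over recent memory.
class CooperativeAgent:
    def __init__(self, llm):
        self.llm = llm
        self.memory: list[str] = []  # running record of observations and messages

    def perceive(self, raw_obs: str) -> None:
        self.memory.append(f"observed: {raw_obs}")

    def communicate(self, partner_msg: str) -> None:
        self.memory.append(f"partner said: {partner_msg}")

    def plan(self, goal: str) -> str:
        context = "\n".join(self.memory[-10:])  # bounded context keeps prompts short
        return self.llm(f"Goal: {goal}\nRecent context:\n{context}\nNext action or message:")
```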
arXiv Detail & Related papers (2023-07-05T17:59:27Z)
- Self-collaboration Code Generation via ChatGPT [35.88318116340547]
Large Language Models (LLMs) have demonstrated remarkable code-generation ability, but struggle with complex tasks.
We present a self-collaboration framework for code generation in which multiple LLM instances, exemplified by ChatGPT, act as distinct role-playing experts that form a virtual team.
To organize and manage this virtual team effectively, we incorporate software-development methodology into the framework.
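Read as pseudocode, the framework might resemble the waterfall-style loop below, where test feedback triggers revision; the role prompts, the PASS convention, and the `llm` callable are assumptions rather than the paper's actual setup.

```python
# Hypothetical self-collaboration loop: design, implement, then iterate on
# tester feedback until the tester reports PASS or rounds run out.
def self_collaborate(llm, task: str, max_rounds: int = 3) -> str:
    design = llm(f"As a software architect, outline a design for: {task}")
    code = llm(f"As a developer, implement this design:\n{design}")
    for _ in range(max_rounds):
        report = llm(f"As a tester, reply PASS or list the bugs in:\n{code}")
        if report.strip().startswith("PASS"):
            break
        code = llm(f"As a developer, fix these issues:\n{report}\nCode:\n{code}")
    return code
```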
arXiv Detail & Related papers (2023-04-15T16:33:32Z)
- Inner Monologue: Embodied Reasoning through Planning with Language Models [81.07216635735571]
Large Language Models (LLMs) can be applied to domains beyond natural language processing.
LLMs planning in embodied environments need to consider not just which skills to perform, but also how and when to perform them.
We propose that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios.
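A minimal sketch of that loop is below: after every step, textual environment feedback is appended to the prompt, so the model replans with its own action history in view. The `llm` and `env` callables and the "done" convention are stand-ins, not the paper's interfaces.

```python
# Illustrative inner-monologue loop: act, read feedback, fold it back into
# the prompt, and let the model decide the next skill.
def inner_monologue(llm, env, goal: str, max_steps: int = 10) -> list[str]:
    transcript = [f"Goal: {goal}"]
    for _ in range(max_steps):
        skill = llm("\n".join(transcript) + "\nRobot action:").strip()
        if skill == "done":
            break
        feedback = env(skill)  # e.g. "Success." or "Failed: object not reachable."
        transcript += [f"Robot action: {skill}", f"Feedback: {feedback}"]
    return transcript
```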
arXiv Detail & Related papers (2022-07-12T15:20:48Z)
- LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities [119.88381048477854]
We introduce the LEMMA dataset to provide, in a single benchmark with meticulously designed settings, the multi-agent and multi-task dimensions missing from prior activity datasets.
We densely annotate atomic actions with human-object interactions, providing ground truth for the compositionality, scheduling, and assignment of daily activities.
We hope this effort will drive the machine vision community to examine goal-directed human activities and to further study task scheduling and assignment in the real world.
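Purely as an illustration of what such dense annotation could contain, the record below pairs an atomic action with its objects, timing, agent, and parent task; the field names are invented and do not reflect LEMMA's actual schema.

```python
# Invented annotation record illustrating compositionality (parent_task),
# scheduling (frame span), and assignment (agent_id).
from dataclasses import dataclass


@dataclass
class AtomicActionAnnotation:
    agent_id: int        # who performs the action (assignment)
    verb: str            # atomic action, e.g. "pour"
    objects: list[str]   # human-object interactions, e.g. ["kettle", "cup"]
    start_frame: int     # when the action starts (scheduling)
    end_frame: int       # when it ends
    parent_task: str     # the higher-level task it composes (compositionality)


example = AtomicActionAnnotation(1, "pour", ["kettle", "cup"], 120, 180, "make tea")
```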
arXiv Detail & Related papers (2020-07-31T00:13:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.