MCU: A Task-centric Framework for Open-ended Agent Evaluation in
Minecraft
- URL: http://arxiv.org/abs/2310.08367v1
- Date: Thu, 12 Oct 2023 14:38:25 GMT
- Title: MCU: A Task-centric Framework for Open-ended Agent Evaluation in
Minecraft
- Authors: Haowei Lin, Zihao Wang, Jianzhu Ma, Yitao Liang
- Abstract summary: This paper introduces a task-centric framework named MCU for Minecraft agent evaluation.
Within the MCU framework, each task is measured with six distinct difficulty scores.
We show that MCU has high expressivity, covering all tasks used in the recent literature on Minecraft agents.
- Score: 28.585449904964033
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To pursue the goal of creating an open-ended agent in Minecraft, an
open-ended game environment with unlimited possibilities, this paper introduces
a task-centric framework named MCU for Minecraft agent evaluation. The MCU
framework leverages the concept of atom tasks as fundamental building blocks,
enabling the generation of diverse or even arbitrary tasks. Within the MCU
framework, each task is measured with six distinct difficulty scores (time
consumption, operational effort, planning complexity, intricacy, creativity,
novelty). These scores offer a multi-dimensional assessment of a task from
different angles, and thus can reveal an agent's capability on specific facets.
The difficulty scores also serve as the feature of each task, which creates a
meaningful task space and unveils the relationship between tasks. For efficient
evaluation of Minecraft agents employing the MCU framework, we maintain a
unified benchmark, namely SkillForge, which comprises representative tasks with
diverse categories and difficulty distribution. We also provide convenient
filters for users to select tasks to assess specific capabilities of agents. We
show that MCU has high expressivity, covering all tasks used in the recent
literature on Minecraft agents, and that it underscores the need for advancements
in areas such as creativity, precise control, and out-of-distribution
generalization toward the goal of open-ended Minecraft agent development.
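The abstract describes each MCU task as carrying six difficulty scores that double as the task's feature vector, and SkillForge as a benchmark with filters for selecting tasks that probe specific capabilities. As an illustration only, a minimal sketch of such a task representation and filter might look like the following; the class name, field names, and 0-1 scale are assumptions, not MCU's actual schema:

```python
from dataclasses import dataclass, astuple

# Hypothetical task record holding the six difficulty dimensions named in the
# abstract. Field names and the 0-1 scale are assumptions, not MCU's schema.
@dataclass
class MCUTask:
    name: str
    time_consumption: float
    operational_effort: float
    planning_complexity: float
    intricacy: float
    creativity: float
    novelty: float

    def feature_vector(self) -> tuple:
        """Difficulty scores reused as the task's feature vector."""
        return astuple(self)[1:]  # drop the name, keep the six scores

def filter_tasks(tasks, dimension: str, min_score: float):
    """Select tasks that stress a specific capability, e.g. creativity."""
    return [t for t in tasks if getattr(t, dimension) >= min_score]

if __name__ == "__main__":
    benchmark = [
        MCUTask("craft_wooden_pickaxe", 0.2, 0.3, 0.4, 0.2, 0.1, 0.1),
        MCUTask("build_a_castle",       0.8, 0.7, 0.8, 0.9, 0.9, 0.6),
    ]
    creative = filter_tasks(benchmark, "creativity", 0.5)
    print([t.name for t in creative])  # -> ['build_a_castle']
```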
Related papers
- Odyssey: Empowering Minecraft Agents with Open-World Skills [26.537984734738764]
We introduce Odyssey, a new framework that empowers Large Language Model (LLM)-based agents with open-world skills to explore the vast Minecraft world.
Odyssey comprises three key parts: (1) An interactive agent with an open-world skill library that consists of 40 primitive skills and 183 compositional skills; (2) A fine-tuned LLaMA-3 model trained on a large question-answering dataset with 390k+ instruction entries derived from the Minecraft Wiki; and (3) A new agent capability benchmark.
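The summary mentions a skill library split into primitive and compositional skills. A minimal sketch of how such a library could be organized, where the registry layout and skill names are assumptions rather than Odyssey's actual code:

```python
from typing import Callable, Dict, List

# Hypothetical skill registry: primitive skills are callables, compositional
# skills are ordered lists of primitive skill names.
PRIMITIVE_SKILLS: Dict[str, Callable[[], str]] = {
    "chop_tree": lambda: "collected logs",
    "craft_planks": lambda: "crafted planks",
    "craft_table": lambda: "placed crafting table",
}

COMPOSITIONAL_SKILLS: Dict[str, List[str]] = {
    "prepare_crafting": ["chop_tree", "craft_planks", "craft_table"],
}

def run_compositional(name: str) -> List[str]:
    """Execute a compositional skill by running its primitives in order."""
    return [PRIMITIVE_SKILLS[step]() for step in COMPOSITIONAL_SKILLS[name]]

print(run_compositional("prepare_crafting"))
```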
arXiv Detail & Related papers (2024-07-22T02:06:59Z)
- Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification [34.97881486372797]
Building open agents has always been the ultimate goal in AI research, and creative agents are among the more enticing.
We introduce autonomous embodied verification techniques for agents to fill the gap, laying the groundwork for creative tasks.
Specifically, we propose the Luban agent, which targets creative building tasks in Minecraft and is equipped with two-level autonomous embodied verification.
arXiv Detail & Related papers (2024-05-24T10:25:59Z)
- Creative Agents: Empowering Agents with Imagination for Creative Tasks [31.920963353890393]
We propose a class of solutions for creative agents, where the controller is enhanced with an imaginator that generates detailed imaginations of task outcomes conditioned on language instructions.
We benchmark creative tasks with the challenging open-world game Minecraft, where the agents are asked to create diverse buildings given free-form language instructions.
We perform a detailed experimental analysis of creative agents, showing that creative agents are the first AI agents accomplishing diverse building creation in the survival mode of Minecraft.
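The summary describes a controller paired with an imaginator that turns a free-form instruction into an imagined outcome before acting. A schematic sketch of that two-stage pipeline, where both functions and the string-based "imagination" are placeholders rather than the paper's implementation:

```python
# Two-stage pipeline: instruction -> imagined outcome -> low-level actions.
# Both stages are stubs; in the paper the imaginator would be a generative
# model and the controller an embodied policy.

def imaginator(instruction: str) -> dict:
    """Produce a structured 'imagination' of the desired build (placeholder)."""
    return {"instruction": instruction, "blueprint": ["wall", "roof", "door"]}

def controller(imagination: dict) -> list:
    """Translate the imagined blueprint into a sequence of building actions."""
    return [f"place_{part}" for part in imagination["blueprint"]]

actions = controller(imaginator("build a small cottage with a red roof"))
print(actions)  # ['place_wall', 'place_roof', 'place_door']
```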
arXiv Detail & Related papers (2023-12-05T06:00:52Z)
- MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [102.41118020705876]
Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing.
As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework.
This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z)
- Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory [97.87093169454431]
Ghost in the Minecraft (GITM) is a novel framework that integrates Large Language Models (LLMs) with text-based knowledge and memory.
We develop a set of structured actions and leverage LLMs to generate action plans for the agents to execute.
The resulting LLM-based agent markedly surpasses previous methods, achieving a remarkable improvement of +47.5% in success rate.
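The GITM summary mentions a set of structured actions and LLM-generated action plans. A toy sketch of a structured action space and a stubbed plan generator; the action fields, names, and the canned plan are illustrative assumptions, not GITM's actual interface:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical structured action: a verb plus a target, rather than raw keypresses.
@dataclass
class Action:
    verb: str      # e.g. "mine", "craft", "equip"
    target: str    # e.g. "oak_log", "wooden_pickaxe"

def generate_plan(goal: str) -> List[Action]:
    """Stand-in for an LLM call that decomposes a goal into structured actions."""
    canned = {
        "obtain wooden pickaxe": [
            Action("mine", "oak_log"),
            Action("craft", "planks"),
            Action("craft", "wooden_pickaxe"),
        ]
    }
    return canned.get(goal, [])

for step in generate_plan("obtain wooden pickaxe"):
    print(step.verb, step.target)
```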
arXiv Detail & Related papers (2023-05-25T17:59:49Z)
- MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge [70.47759528596711]
We introduce MineDojo, a new framework built on the popular Minecraft game.
We propose a novel agent learning algorithm that leverages large pre-trained video-language models as a learned reward function.
Our agent is able to solve a variety of open-ended tasks specified in free-form language without any manually designed dense shaping reward.
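The MineDojo summary describes using a large pre-trained video-language model as a learned reward function. One common way to realize such a reward is cosine similarity between a video-clip embedding and a language-goal embedding; the sketch below uses hypothetical encode_video/encode_text stubs and is not MineDojo's actual interface:

```python
import numpy as np

# Hypothetical encoders standing in for a frozen pre-trained video-language
# model. Real models would return learned embeddings; these are toy stand-ins.
def encode_video(frames: np.ndarray) -> np.ndarray:
    return frames.mean(axis=(0, 1, 2))          # toy pooling over (T, H, W, C)

def encode_text(goal: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(goal)) % (2**32))
    return rng.standard_normal(3)

def language_conditioned_reward(frames: np.ndarray, goal: str) -> float:
    """Reward = cosine similarity between video and goal embeddings."""
    v, t = encode_video(frames), encode_text(goal)
    return float(v @ t / (np.linalg.norm(v) * np.linalg.norm(t) + 1e-8))

clip = np.random.rand(16, 64, 64, 3)             # 16 RGB frames
print(language_conditioned_reward(clip, "shear a sheep"))
```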
arXiv Detail & Related papers (2022-06-17T15:53:05Z)
- Learning to Execute Actions or Ask Clarification Questions [9.784428580459776]
We propose a new builder agent model capable of determining when to ask or execute instructions.
Experimental results show that our model achieves state-of-the-art performance on the collaborative building task.
arXiv Detail & Related papers (2022-04-18T15:36:02Z)
- CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning [138.40338621974954]
CausalWorld is a benchmark for causal structure and transfer learning in a robotic manipulation environment.
Tasks consist of constructing 3D shapes from a given set of blocks - inspired by how children learn to build complex structures.
arXiv Detail & Related papers (2020-10-08T23:01:13Z)
- LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities [119.88381048477854]
We introduce the LEMMA dataset to provide a single home for addressing the missing dimensions of multi-agent, multi-task activity understanding, with meticulously designed settings.
We densely annotate the atomic actions with human-object interactions to provide ground truths for the compositionality, scheduling, and assignment of daily activities.
We hope this effort would drive the machine vision community to examine goal-directed human activities and further study the task scheduling and assignment in the real world.
arXiv Detail & Related papers (2020-07-31T00:13:54Z)
- Adaptive Procedural Task Generation for Hard-Exploration Problems [78.20918366839399]
We introduce Adaptive Procedural Task Generation (APT-Gen) to facilitate reinforcement learning in hard-exploration problems.
At the heart of our approach is a task generator that learns to create tasks from a parameterized task space via a black-box procedural generation module.
To enable curriculum learning in the absence of a direct indicator of learning progress, we propose to train the task generator by balancing the agent's performance in the generated tasks and the similarity to the target tasks.
arXiv Detail & Related papers (2020-07-01T09:38:51Z)
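APT-Gen's summary says the task generator is trained by balancing the agent's performance on generated tasks against their similarity to the target tasks. A toy score trading off those two terms might look like the sketch below; the weighting, the learnability heuristic, and both sub-scores are illustrative assumptions, not the paper's objective:

```python
# Toy curriculum score for a candidate generated task: favor tasks the agent
# can partly solve (learnable) and that resemble the target task distribution.
# Both inputs are assumed to be in [0, 1]; the 0.5 weight is arbitrary.

def curriculum_score(agent_success_rate: float,
                     similarity_to_target: float,
                     weight: float = 0.5) -> float:
    # Peak learnability near 50% success: too easy or too hard is less useful.
    learnability = 1.0 - abs(agent_success_rate - 0.5) * 2.0
    return weight * learnability + (1.0 - weight) * similarity_to_target

candidates = {"easy_maze": (0.9, 0.3), "mid_maze": (0.5, 0.6), "hard_maze": (0.1, 0.9)}
best = max(candidates, key=lambda k: curriculum_score(*candidates[k]))
print(best)  # -> 'mid_maze'
```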