MCU: A Task-centric Framework for Open-ended Agent Evaluation in
Minecraft
- URL: http://arxiv.org/abs/2310.08367v1
- Date: Thu, 12 Oct 2023 14:38:25 GMT
- Title: MCU: A Task-centric Framework for Open-ended Agent Evaluation in
Minecraft
- Authors: Haowei Lin, Zihao Wang, Jianzhu Ma, Yitao Liang
- Abstract summary: This paper introduces a task-centric framework named MCU for Minecraft agent evaluation.
Within the MCU framework, each task is measured with six distinct difficulty scores.
We show that MCU is expressive enough to cover all tasks used in recent literature on Minecraft agents.
- Score: 28.585449904964033
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To pursue the goal of creating an open-ended agent in Minecraft, an
open-ended game environment with unlimited possibilities, this paper introduces
a task-centric framework named MCU for Minecraft agent evaluation. The MCU
framework leverages the concept of atom tasks as fundamental building blocks,
enabling the generation of diverse or even arbitrary tasks. Within the MCU
framework, each task is measured with six distinct difficulty scores (time
consumption, operational effort, planning complexity, intricacy, creativity,
novelty). These scores offer a multi-dimensional assessment of a task from
different angles, and thus can reveal an agent's capability on specific facets.
The difficulty scores also serve as the feature of each task, which creates a
meaningful task space and unveils the relationship between tasks. For efficient
evaluation of Minecraft agents employing the MCU framework, we maintain a
unified benchmark, namely SkillForge, which comprises representative tasks with
diverse categories and difficulty distribution. We also provide convenient
filters for users to select tasks to assess specific capabilities of agents. We
show that MCU is expressive enough to cover all tasks used in recent
literature on Minecraft agents, and that it underscores the need for advancements in
areas such as creativity, precise control, and out-of-distribution
generalization in pursuit of open-ended Minecraft agent development.
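The abstract's idea of using the six difficulty scores as task features, with filters for picking tasks, could be sketched roughly as follows. This is only an illustration under stated assumptions: the class and function names, the 0-to-1 score scale, the example tasks and their scores, and the Euclidean distance are all hypothetical and are not taken from the MCU paper or the SkillForge benchmark.

```python
from dataclasses import dataclass, astuple
from math import dist


@dataclass(frozen=True)
class Difficulty:
    """The six difficulty dimensions named in the abstract (scale assumed to be 0..1)."""
    time_consumption: float
    operational_effort: float
    planning_complexity: float
    intricacy: float
    creativity: float
    novelty: float


@dataclass(frozen=True)
class Task:
    name: str               # hypothetical task identifier
    difficulty: Difficulty  # feature vector of the task


def select_tasks(tasks, **min_scores):
    """Keep tasks whose named difficulty scores are at least the given minimums."""
    return [
        t for t in tasks
        if all(getattr(t.difficulty, dim) >= low for dim, low in min_scores.items())
    ]


def task_distance(a, b):
    """Euclidean distance between two tasks in the six-dimensional score space (an assumed metric)."""
    return dist(astuple(a.difficulty), astuple(b.difficulty))


# Made-up example tasks and scores, for illustration only.
tasks = [
    Task("mine_log", Difficulty(0.1, 0.2, 0.1, 0.1, 0.0, 0.0)),
    Task("craft_pickaxe", Difficulty(0.3, 0.3, 0.5, 0.2, 0.0, 0.1)),
    Task("build_house", Difficulty(0.8, 0.7, 0.6, 0.5, 0.9, 0.4)),
]

creative = select_tasks(tasks, creativity=0.5)
print([t.name for t in creative])
print(task_distance(tasks[0], tasks[1]) < task_distance(tasks[0], tasks[2]))
```

Filtering on a single dimension, as in `select_tasks(tasks, creativity=0.5)`, mirrors the abstract's point that the scores can isolate a specific facet of an agent's capability. The distance function shows one way the scores could place tasks in a shared space.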
Related papers
- Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration [51.452664740963066]
Collaborative Gym is a framework enabling asynchronous, tripartite interaction among agents, humans, and task environments.
We instantiate Co-Gym with three representative tasks in both simulated and real-world conditions.
Our findings reveal that collaborative agents consistently outperform their fully autonomous counterparts in task performance.
arXiv Detail & Related papers (2024-12-20T09:21:15Z)
- Complexity Experts are Task-Discriminative Learners for Any Image Restoration [80.46313715427928]
We introduce "complexity experts" -- flexible expert blocks with varying computational complexity and receptive fields.
This preference effectively drives task-specific allocation, assigning tasks to experts with the appropriate complexity.
The proposed MoCE-IR model outperforms state-of-the-art methods, affirming its efficiency and practical applicability.
arXiv Detail & Related papers (2024-11-27T15:58:07Z)
- Odyssey: Empowering Minecraft Agents with Open-World Skills [26.537984734738764]
We introduce Odyssey, a new framework that empowers Large Language Model (LLM)-based agents with open-world skills to explore the vast Minecraft world.
Odyssey comprises three key parts: (1) An interactive agent with an open-world skill library that consists of 40 primitive skills and 183 compositional skills; (2) A fine-tuned LLaMA-3 model trained on a large question-answering dataset with 390k+ instruction entries derived from the Minecraft Wiki; and (3) A new agent capability benchmark.
arXiv Detail & Related papers (2024-07-22T02:06:59Z)
- MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains [54.117238759317004]
Massive Multitask Agent Understanding (MMAU) benchmark features comprehensive offline tasks that eliminate the need for complex environment setups.
It evaluates models across five domains, including Tool-use, Directed Acyclic Graph (DAG) QA, Data Science and Machine Learning coding, Contest-level programming and Mathematics.
With a total of 20 meticulously designed tasks encompassing over 3K distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents.
arXiv Detail & Related papers (2024-07-18T00:58:41Z)
- DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents [49.74065769505137]
We introduce DISCOVERYWORLD, the first virtual environment for developing and benchmarking an agent's ability to perform complete cycles of novel scientific discovery.
It includes 120 different challenge tasks spanning eight topics each with three levels of difficulty and several parametric variations.
We find that strong baseline agents that perform well in previously published environments struggle on most DISCOVERYWORLD tasks.
arXiv Detail & Related papers (2024-06-10T20:08:44Z)
- Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification [34.97881486372797]
Building open agents has always been the ultimate goal in AI research, and creative agents are even more enticing.
We introduce autonomous embodied verification techniques for agents to fill the gap, laying the groundwork for creative tasks.
Specifically, we propose the Luban agent, which targets creative building tasks in Minecraft and is equipped with two-level autonomous embodied verification.
arXiv Detail & Related papers (2024-05-24T10:25:59Z)
- Creative Agents: Empowering Agents with Imagination for Creative Tasks [31.920963353890393]
We propose a class of solutions for creative agents, where the controller is enhanced with an imaginator that generates detailed imaginations of task outcomes conditioned on language instructions.
We benchmark creative tasks with the challenging open-world game Minecraft, where the agents are asked to create diverse buildings given free-form language instructions.
We perform a detailed experimental analysis of creative agents, showing that creative agents are the first AI agents accomplishing diverse building creation in the survival mode of Minecraft.
arXiv Detail & Related papers (2023-12-05T06:00:52Z)
- MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [102.41118020705876]
Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing.
As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework.
This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z)
- Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory [97.87093169454431]
Ghost in the Minecraft (GITM) is a novel framework that integrates Large Language Models (LLMs) with text-based knowledge and memory.
We develop a set of structured actions and leverage LLMs to generate action plans for the agents to execute.
The resulting LLM-based agent markedly surpasses previous methods, achieving a remarkable improvement of +47.5% in success rate.
arXiv Detail & Related papers (2023-05-25T17:59:49Z)
- OpenAGI: When LLM Meets Domain Experts [51.86179657467822]
Human Intelligence (HI) excels at combining basic skills to solve complex tasks.
This capability is vital for Artificial Intelligence (AI) and should be embedded in comprehensive AI Agents.
We introduce OpenAGI, an open-source platform designed for solving multi-step, real-world tasks.
arXiv Detail & Related papers (2023-04-10T03:55:35Z)
- MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge [70.47759528596711]
We introduce MineDojo, a new framework built on the popular Minecraft game.
We propose a novel agent learning algorithm that leverages large pre-trained video-language models as a learned reward function.
Our agent is able to solve a variety of open-ended tasks specified in free-form language without any manually designed dense shaping reward.
arXiv Detail & Related papers (2022-06-17T15:53:05Z)
- Learning to Execute Actions or Ask Clarification Questions [9.784428580459776]
We propose a new builder agent model capable of determining when to ask or execute instructions.
Experimental results show that our model achieves state-of-the-art performance on the collaborative building task.
arXiv Detail & Related papers (2022-04-18T15:36:02Z)
- Benchmarking the Spectrum of Agent Capabilities [7.088856621650764]
We introduce Crafter, an open world survival game with visual inputs that evaluates a wide range of general abilities within a single environment.
Agents learn from the provided reward signal or through intrinsic objectives and are evaluated by semantically meaningful achievements.
We experimentally verify that Crafter is of appropriate difficulty to drive future research and provide baseline scores of reward agents and unsupervised agents.
arXiv Detail & Related papers (2021-09-14T15:49:31Z)
- Open-Ended Learning Leads to Generally Capable Agents [12.079718607356178]
We define a universe of tasks within an environment domain and demonstrate the ability to train agents that are capable across this vast space and beyond.
The resulting space is exceptionally diverse in terms of the challenges posed to agents, and as such, even measuring the learning progress of an agent is an open research problem.
We show that through constructing an open-ended learning process, which dynamically changes the training task distributions and training objectives such that the agent never stops learning, we achieve consistent learning of new behaviours.
arXiv Detail & Related papers (2021-07-27T13:30:07Z)
- CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning [138.40338621974954]
CausalWorld is a benchmark for causal structure and transfer learning in a robotic manipulation environment.
Tasks consist of constructing 3D shapes from a given set of blocks - inspired by how children learn to build complex structures.
arXiv Detail & Related papers (2020-10-08T23:01:13Z)
- LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities [119.88381048477854]
We introduce the LEMMA dataset to provide a single home to address missing dimensions with meticulously designed settings.
We densely annotate the atomic-actions with human-object interactions to provide ground-truths of the compositionality, scheduling, and assignment of daily activities.
We hope this effort will drive the machine vision community to examine goal-directed human activities and further study task scheduling and assignment in the real world.
arXiv Detail & Related papers (2020-07-31T00:13:54Z)
- Adaptive Procedural Task Generation for Hard-Exploration Problems [78.20918366839399]
We introduce Adaptive Procedural Task Generation (APT-Gen) to facilitate reinforcement learning in hard-exploration problems.
At the heart of our approach is a task generator that learns to create tasks from a parameterized task space via a black-box procedural generation module.
To enable curriculum learning in the absence of a direct indicator of learning progress, we propose to train the task generator by balancing the agent's performance in the generated tasks and the similarity to the target tasks.
arXiv Detail & Related papers (2020-07-01T09:38:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.