Odyssey: Empowering Minecraft Agents with Open-World Skills
- URL: http://arxiv.org/abs/2407.15325v2
- Date: Mon, 7 Oct 2024 09:40:07 GMT
- Title: Odyssey: Empowering Minecraft Agents with Open-World Skills
- Authors: Shunyu Liu, Yaoru Li, Kongcheng Zhang, Zhenyu Cui, Wenkai Fang, Yuxuan Zheng, Tongya Zheng, Mingli Song
- Abstract summary: We introduce Odyssey, a new framework that empowers Large Language Model (LLM)-based agents with open-world skills to explore the vast Minecraft world.
Odyssey comprises three key parts: (1) An interactive agent with an open-world skill library that consists of 40 primitive skills and 183 compositional skills; (2) A fine-tuned LLaMA-3 model trained on a large question-answering dataset with 390k+ instruction entries derived from the Minecraft Wiki; and (3) A new agent capability benchmark.
- Score: 26.537984734738764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have delved into constructing generalist agents for open-world environments like Minecraft. Despite the encouraging results, existing efforts mainly focus on solving basic programmatic tasks, e.g., material collection and tool-crafting following the Minecraft tech-tree, treating the ObtainDiamond task as the ultimate goal. This limitation stems from the narrowly defined set of actions available to agents, requiring them to learn effective long-horizon strategies from scratch. Consequently, discovering diverse gameplay opportunities in the open world becomes challenging. In this work, we introduce Odyssey, a new framework that empowers Large Language Model (LLM)-based agents with open-world skills to explore the vast Minecraft world. Odyssey comprises three key parts: (1) an interactive agent with an open-world skill library that consists of 40 primitive skills and 183 compositional skills; (2) a fine-tuned LLaMA-3 model trained on a large question-answering dataset with 390k+ instruction entries derived from the Minecraft Wiki; and (3) a new agent capability benchmark that includes the long-term planning task, the dynamic-immediate planning task, and the autonomous exploration task. Extensive experiments demonstrate that the proposed Odyssey framework can effectively evaluate different capabilities of LLM-based agents. All datasets, model weights, and code are publicly available to motivate future research on more advanced autonomous agent solutions.
Related papers
- AgentGym: Evolving Large Language Model-based Agents across Diverse Environments [116.97648507802926]
Large language models (LLMs) are considered a promising foundation to build such agents.
We take the first step towards building generally-capable LLM-based agents with self-evolution ability.
We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration.
arXiv Detail & Related papers (2024-06-06T15:15:41Z)
- Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification [34.97881486372797]
Building open agents has always been the ultimate goal in AI research, and creative agents are all the more enticing.
We introduce autonomous embodied verification techniques for agents to fill the gap, laying the groundwork for creative tasks.
Specifically, we propose the Luban agent, which targets creative building tasks in Minecraft and is equipped with two-level autonomous embodied verification.
arXiv Detail & Related papers (2024-05-24T10:25:59Z)
- Agent Planning with World Knowledge Model [88.4897773735576]
We introduce a parametric World Knowledge Model (WKM) to facilitate agent planning.
We develop WKM to provide prior task knowledge that guides global planning and dynamic state knowledge that assists local planning.
Our method can achieve superior performance compared to various strong baselines.
arXiv Detail & Related papers (2024-05-23T06:03:19Z)
- See and Think: Embodied Agent in Virtual Environment [12.801720916220823]
Large language models (LLMs) have achieved impressive progress on several open-world tasks.
This paper proposes STEVE, a comprehensive and visionary embodied agent in the Minecraft virtual environment.
arXiv Detail & Related papers (2023-11-26T06:38:16Z)
- Agent Lumos: Unified and Modular Training for Open-Source Language Agents [89.78556964988852]
We introduce LUMOS, one of the first frameworks for training open-source LLM-based agents.
LUMOS features a learnable, unified, and modular architecture with a planning module that learns high-level subgoal generation.
We collect large-scale, unified, and high-quality training annotations derived from diverse ground-truth reasoning rationales.
arXiv Detail & Related papers (2023-11-09T00:30:13Z)
- MCU: A Task-centric Framework for Open-ended Agent Evaluation in Minecraft [28.585449904964033]
This paper introduces a task-centric framework named MCU for Minecraft agent evaluation.
Within the MCU framework, each task is measured with six distinct difficulty scores.
We show that MCU is expressive enough to cover all tasks used in recent literature on Minecraft agents.
arXiv Detail & Related papers (2023-10-12T14:38:25Z)
- Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory [97.87093169454431]
Ghost in the Minecraft (GITM) is a novel framework that integrates Large Language Models (LLMs) with text-based knowledge and memory.
We develop a set of structured actions and leverage LLMs to generate action plans for the agents to execute.
The resulting LLM-based agent markedly surpasses previous methods, achieving a remarkable improvement of +47.5% in success rate.
arXiv Detail & Related papers (2023-05-25T17:59:49Z)
- OpenAGI: When LLM Meets Domain Experts [51.86179657467822]
Human Intelligence (HI) excels at combining basic skills to solve complex tasks.
This capability is vital for Artificial Intelligence (AI) and should be embedded in comprehensive AI Agents.
We introduce OpenAGI, an open-source platform designed for solving multi-step, real-world tasks.
arXiv Detail & Related papers (2023-04-10T03:55:35Z)
- MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge [70.47759528596711]
We introduce MineDojo, a new framework built on the popular Minecraft game.
We propose a novel agent learning algorithm that leverages large pre-trained video-language models as a learned reward function.
Our agent is able to solve a variety of open-ended tasks specified in free-form language without any manually designed dense shaping reward.
arXiv Detail & Related papers (2022-06-17T15:53:05Z)