MineDojo: Building Open-Ended Embodied Agents with Internet-Scale
Knowledge
- URL: http://arxiv.org/abs/2206.08853v1
- Date: Fri, 17 Jun 2022 15:53:05 GMT
- Title: MineDojo: Building Open-Ended Embodied Agents with Internet-Scale
Knowledge
- Authors: Linxi Fan, Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang,
Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu, Anima Anandkumar
- Abstract summary: We introduce MineDojo, a new framework built on the popular Minecraft game.
We propose a novel agent learning algorithm that leverages large pre-trained video-language models as a learned reward function.
Our agent is able to solve a variety of open-ended tasks specified in free-form language without any manually designed dense shaping reward.
- Score: 70.47759528596711
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous agents have made great strides in specialist domains like Atari
games and Go. However, they typically learn tabula rasa in isolated
environments with limited and manually conceived objectives, thus failing to
generalize across a wide spectrum of tasks and capabilities. Inspired by how
humans continually learn and adapt in the open world, we advocate a trinity of
ingredients for building generalist agents: 1) an environment that supports a
multitude of tasks and goals, 2) a large-scale database of multimodal
knowledge, and 3) a flexible and scalable agent architecture. We introduce
MineDojo, a new framework built on the popular Minecraft game that features a
simulation suite with thousands of diverse open-ended tasks and an
internet-scale knowledge base with Minecraft videos, tutorials, wiki pages, and
forum discussions. Using MineDojo's data, we propose a novel agent learning
algorithm that leverages large pre-trained video-language models as a learned
reward function. Our agent is able to solve a variety of open-ended tasks
specified in free-form language without any manually designed dense shaping
reward. We open-source the simulation suite and knowledge bases
(https://minedojo.org) to promote research towards the goal of generally
capable embodied agents.
Related papers
- OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization [66.22117723598872]
We introduce an open-source framework designed to facilitate the development of multimodal web agent.
We first train the base model with imitation learning to gain the basic abilities.
We then let the agent explore the open web and collect feedback on its trajectories.
arXiv Detail & Related papers (2024-10-25T15:01:27Z) - Odyssey: Empowering Minecraft Agents with Open-World Skills [26.537984734738764]
We introduce Odyssey, a new framework that empowers Large Language Model (LLM)-based agents with open-world skills to explore the vast Minecraft world.
Odyssey comprises three key parts: (1) An interactive agent with an open-world skill library that consists of 40 primitive skills and 183 compositional skills; (2) A fine-tuned LLaMA-3 model trained on a large question-answering dataset with 390k+ instruction entries derived from the Minecraft Wiki; and (3) A new agent capability benchmark.
arXiv Detail & Related papers (2024-07-22T02:06:59Z) - AgentGym: Evolving Large Language Model-based Agents across Diverse Environments [116.97648507802926]
Large language models (LLMs) are considered a promising foundation to build such agents.
We take the first step towards building generally-capable LLM-based agents with self-evolution ability.
We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration.
arXiv Detail & Related papers (2024-06-06T15:15:41Z) - Scaling Instructable Agents Across Many Simulated Worlds [70.97268311053328]
Our goal is to develop an agent that can accomplish anything a human can do in any simulated 3D environment.
Our approach focuses on language-driven generality while imposing minimal assumptions.
Our agents interact with environments in real-time using a generic, human-like interface.
arXiv Detail & Related papers (2024-03-13T17:50:32Z) - Creative Agents: Empowering Agents with Imagination for Creative Tasks [31.920963353890393]
We propose a class of solutions for creative agents, where the controller is enhanced with an imaginator that generates detailed imaginations of task outcomes conditioned on language instructions.
We benchmark creative tasks with the challenging open-world game Minecraft, where the agents are asked to create diverse buildings given free-form language instructions.
We perform a detailed experimental analysis of creative agents, showing that creative agents are the first AI agents accomplishing diverse building creation in the survival mode of Minecraft.
arXiv Detail & Related papers (2023-12-05T06:00:52Z) - Agent Lumos: Unified and Modular Training for Open-Source Language Agents [89.78556964988852]
We introduce LUMOS, one of the first frameworks for training open-source LLM-based agents.
LUMOS features a learnable, unified, and modular architecture with a planning module that learns high-level subgoal generation.
We collect large-scale, unified, and high-quality training annotations derived from diverse ground-truth reasoning rationales.
arXiv Detail & Related papers (2023-11-09T00:30:13Z) - Ghost in the Minecraft: Generally Capable Agents for Open-World
Environments via Large Language Models with Text-based Knowledge and Memory [97.87093169454431]
Ghost in the Minecraft (GITM) is a novel framework that integrates Large Language Models (LLMs) with text-based knowledge and memory.
We develop a set of structured actions and leverage LLMs to generate action plans for the agents to execute.
The resulting LLM-based agent markedly surpasses previous methods, achieving a remarkable improvement of +47.5% in success rate.
arXiv Detail & Related papers (2023-05-25T17:59:49Z) - Mastering Diverse Domains through World Models [43.382115013586535]
We present DreamerV3, a general algorithm that outperforms specialized methods across over 150 diverse tasks, with a single configuration.
Dreamer is the first algorithm to collect diamonds in Minecraft from scratch without human data or curricula.
arXiv Detail & Related papers (2023-01-10T18:12:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.