Related papers: ADAM: An Embodied Causal Agent in Open-World Environments

ADAM: An Embodied Causal Agent in Open-World Environments

URL: http://arxiv.org/abs/2410.22194v1
Date: Tue, 29 Oct 2024 16:32:01 GMT
Title: ADAM: An Embodied Causal Agent in Open-World Environments
Authors: Shu Yu, Chaochao Lu,
Abstract summary: We introduce ADAM, an emboDied causal agent in Minecraft. ADAM can autonomously navigate the open world, perceive multimodal contexts, learn causal world knowledge, and tackle complex tasks through lifelong learning.
Score: 3.2474668680608314
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In open-world environments like Minecraft, existing agents face challenges in continuously learning structured knowledge, particularly causality. These challenges stem from the opacity inherent in black-box models and an excessive reliance on prior knowledge during training, which impair their interpretability and generalization capability. To this end, we introduce ADAM, An emboDied causal Agent in Minecraft, that can autonomously navigate the open world, perceive multimodal contexts, learn causal world knowledge, and tackle complex tasks through lifelong learning. ADAM is empowered by four key components: 1) an interaction module, enabling the agent to execute actions while documenting the interaction processes; 2) a causal model module, tasked with constructing an ever-growing causal graph from scratch, which enhances interpretability and diminishes reliance on prior knowledge; 3) a controller module, comprising a planner, an actor, and a memory pool, which uses the learned causal graph to accomplish tasks; 4) a perception module, powered by multimodal large language models, which enables ADAM to perceive like a human player. Extensive experiments show that ADAM constructs an almost perfect causal graph from scratch, enabling efficient task decomposition and execution with strong interpretability. Notably, in our modified Minecraft games where no prior knowledge is available, ADAM maintains its performance and shows remarkable robustness and generalization capability. ADAM pioneers a novel paradigm that integrates causal methods and embodied agents in a synergistic manner. Our project page is at https://opencausalab.github.io/ADAM.

Related papers

Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts [54.21319853862452]
We present Optimus-3, a general-purpose agent for Minecraft.<n>We propose a knowledge-enhanced data generation pipeline to provide scalable and high-quality training data for agent development.<n>We develop a Multimodal Reasoning-Augmented Reinforcement Learning approach to enhance the agent's reasoning ability for visual diversity.
arXiv Detail & Related papers (2025-06-12T05:29:40Z)
APT: Architectural Planning and Text-to-Blueprint Construction Using Large Language Models for Open-World Agents [8.479128275067742]
We present an advanced Large Language Model (LLM)-driven framework that enables autonomous agents to construct complex structures in Minecraft. By employing chain-of-thought decomposition along with multimodal inputs, the framework generates detailed architectural layouts and blueprints. Our agent incorporates both memory and reflection modules to facilitate lifelong learning, adaptive refinement, and error correction throughout the building process.
arXiv Detail & Related papers (2024-11-26T09:31:28Z)
Online Decision MetaMorphFormer: A Casual Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence [2.890656584329591]
Online Decision MetaMorphFormer (ODM) aims to achieve self-awareness, environment recognition, and action planning. ODM can be applied to any arbitrary agent with a multi-joint body, located in different environments, and trained with different types of tasks using large-scale pre-trained datasets.
arXiv Detail & Related papers (2024-09-11T15:22:43Z)
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
Key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL) This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z)
Scene-Driven Multimodal Knowledge Graph Construction for Embodied AI [2.380943129168748]
Embodied AI is one of the most popular studies in artificial intelligence and robotics. Scene knowledge is important for an agent to understand the surroundings and make correct decisions. Scene-MMKG construction method combines conventional knowledge engineering and large language models.
arXiv Detail & Related papers (2023-11-07T08:06:27Z)
Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models [83.63242931107638]
We propose four characteristics of generally intelligent agents. We argue that active engagement with objects in the real world delivers more robust signals for forming conceptual representations. We conclude by outlining promising future research directions in the field of artificial general intelligence.
arXiv Detail & Related papers (2023-07-07T13:58:16Z)
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory [97.87093169454431]
Ghost in the Minecraft (GITM) is a novel framework that integrates Large Language Models (LLMs) with text-based knowledge and memory. We develop a set of structured actions and leverage LLMs to generate action plans for the agents to execute. The resulting LLM-based agent markedly surpasses previous methods, achieving a remarkable improvement of +47.5% in success rate.
arXiv Detail & Related papers (2023-05-25T17:59:49Z)
ArK: Augmented Reality with Knowledge Interactive Emergent Ability [115.72679420999535]
We develop an infinite agent that learns to transfer knowledge memory from general foundation models to novel domains. The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK) We show that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes.
arXiv Detail & Related papers (2023-05-01T17:57:01Z)
OpenAGI: When LLM Meets Domain Experts [51.86179657467822]
Human Intelligence (HI) excels at combining basic skills to solve complex tasks. This capability is vital for Artificial Intelligence (AI) and should be embedded in comprehensive AI Agents. We introduce OpenAGI, an open-source platform designed for solving multi-step, real-world tasks.
arXiv Detail & Related papers (2023-04-10T03:55:35Z)
WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
We develop a novel foundation model pre-trained with huge multimodal (visual and textual) data. We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
arXiv Detail & Related papers (2021-10-27T12:25:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.