APT: Architectural Planning and Text-to-Blueprint Construction Using Large Language Models for Open-World Agents
- URL: http://arxiv.org/abs/2411.17255v1
- Date: Tue, 26 Nov 2024 09:31:28 GMT
- Title: APT: Architectural Planning and Text-to-Blueprint Construction Using Large Language Models for Open-World Agents
- Authors: Jun Yu Chen, Tao Gao
- Abstract summary: We present an advanced Large Language Model (LLM)-driven framework that enables autonomous agents to construct complex structures in Minecraft.
By employing chain-of-thought decomposition along with multimodal inputs, the framework generates detailed architectural layouts and blueprints.
Our agent incorporates both memory and reflection modules to facilitate lifelong learning, adaptive refinement, and error correction throughout the building process.
- Score: 8.479128275067742
- License:
- Abstract: We present APT, an advanced Large Language Model (LLM)-driven framework that enables autonomous agents to construct complex and creative structures within the Minecraft environment. Unlike previous approaches that primarily concentrate on skill-based open-world tasks or rely on image-based diffusion models for generating voxel-based structures, our method leverages the intrinsic spatial reasoning capabilities of LLMs. By employing chain-of-thought decomposition along with multimodal inputs, the framework generates detailed architectural layouts and blueprints that the agent can execute under zero-shot or few-shot learning scenarios. Our agent incorporates both memory and reflection modules to facilitate lifelong learning, adaptive refinement, and error correction throughout the building process. To rigorously evaluate the agent's performance in this emerging research area, we introduce a comprehensive benchmark consisting of diverse construction tasks designed to test creativity, spatial reasoning, adherence to in-game rules, and the effective integration of multimodal instructions. Experimental results using various GPT-based LLM backends and agent configurations demonstrate the agent's capacity to accurately interpret extensive instructions involving numerous items, their positions, and orientations. The agent successfully produces complex structures complete with internal functionalities such as Redstone-powered systems. A/B testing indicates that the inclusion of a memory module leads to a significant increase in performance, emphasizing its role in enabling continuous learning and the reuse of accumulated experience. Additionally, the unexpected emergence of scaffolding behavior in the agent highlights the potential of future LLM-driven agents to utilize subroutine planning and leverage the emergent abilities of LLMs to autonomously develop human-like problem-solving techniques.
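To make the described pipeline concrete, below is a minimal sketch of such a blueprint-then-build loop, assuming a generic GPT-style backend and a generic in-game placement API. Every name in it (`llm`, `place_block`, `parse_blocks`, the prompt wording, the blueprint line format) is a hypothetical stand-in, since the abstract does not publish APT's actual interfaces; the sketch only illustrates the chain-of-thought decomposition into a blueprint, block-by-block execution, and the memory/reflection loop credited with error correction.

```python
# Minimal sketch of an APT-style "blueprint, build, reflect" loop.
# All names are hypothetical stand-ins, not APT's actual interfaces.
from dataclasses import dataclass, field

@dataclass
class Block:
    item: str            # e.g. "minecraft:stone_bricks"
    x: int
    y: int
    z: int
    facing: str = "north"

@dataclass
class Memory:
    episodes: list = field(default_factory=list)  # (task, blueprint, outcome)

    def recall(self, task: str, k: int = 3) -> list:
        # Naive retrieval: return up to k past episodes with overlapping tasks.
        return [e for e in self.episodes if e[0] in task or task in e[0]][-k:]

def llm(prompt: str) -> str:
    """Stand-in for a call to any GPT-based backend."""
    raise NotImplementedError

def place_block(block: Block) -> bool:
    """Stand-in for the in-game placement API (e.g. a bot-framework bridge)."""
    raise NotImplementedError

def parse_blocks(text: str) -> list[Block]:
    # Expects one placement per line: "item x y z facing".
    blocks = []
    for line in text.strip().splitlines():
        item, x, y, z, facing = line.split()
        blocks.append(Block(item, int(x), int(y), int(z), facing))
    return blocks

def build(task: str, memory: Memory, max_rounds: int = 3) -> None:
    # Chain-of-thought decomposition: layout first, then a block-level
    # blueprint, optionally conditioned on episodes recalled from memory.
    layout = llm(f"Decompose step by step into an architectural layout: {task}\n"
                 f"Relevant past builds: {memory.recall(task)}")
    blueprint = parse_blocks(llm(f"Emit one placement per line as "
                                 f"'item x y z facing' for:\n{layout}"))
    errors: list[Block] = []
    for _ in range(max_rounds):
        errors = [b for b in blueprint if not place_block(b)]  # execute in-game
        if not errors:
            break
        # Reflection: ask the LLM to repair only the failed placements.
        blueprint = parse_blocks(llm(f"These placements failed: {errors}. "
                                     f"Re-emit corrected lines only."))
    memory.episodes.append((task, blueprint, "done" if not errors else "partial"))
```

In a real agent the `recall` step would presumably use embedding similarity rather than substring matching, and `place_block` would bridge to the running game, but the control flow above is the part the abstract describes: plan, execute, reflect on failures, and store the episode for reuse.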
Related papers
- AgentSquare: Automatic LLM Agent Search in Modular Design Space [16.659969168343082]
Large Language Models (LLMs) have led to a rapid growth of agentic systems capable of handling a wide range of complex tasks.
We introduce a new research problem: Modularized LLM Agent Search (MoLAS).
arXiv Detail & Related papers (2024-10-08T15:52:42Z)
- LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework for Seamless Integration of Multi Active/Passive Core-Agents [0.0]
We propose a novel LLM-based Agent Unified Modeling Framework (LLM-Agent-UMF).
Our framework distinguishes between the different components of an LLM-based agent, setting LLMs and tools apart from a new element, the core-agent.
We evaluate our framework by applying it to thirteen state-of-the-art agents, thereby demonstrating its alignment with their functionalities.
arXiv Detail & Related papers (2024-09-17T17:54:17Z)
- Configurable Foundation Models: Building LLMs from a Modular Perspective [115.63847606634268]
A growing tendency to decompose LLMs into numerous functional modules allows inference with only a subset of modules and the dynamic assembly of modules to tackle complex tasks.
We coin the term brick to represent each functional module, designating the modularized structure as configurable foundation models.
We present four brick-oriented operations: retrieval and routing, merging, updating, and growing; a toy sketch of the routing operation appears after this list.
We find that feed-forward network (FFN) layers follow modular patterns, with functional specialization of neurons and functional neuron partitions.
arXiv Detail & Related papers (2024-09-04T17:01:02Z)
- Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making [51.737762570776006]
LLM-ACTR is a novel neuro-symbolic architecture that provides human-aligned and versatile decision-making.
Our framework extracts and embeds knowledge of ACT-R's internal decision-making process as latent neural representations.
Our experiments on novel Design for Manufacturing tasks show both improved task performance and improved grounded decision-making capability.
arXiv Detail & Related papers (2024-08-17T11:49:53Z)
- Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning [56.82041895921434]
Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities.
When used as agents for dealing with complex problems in the real world, their performance is far inferior to that of large commercial models such as ChatGPT and GPT-4.
arXiv Detail & Related papers (2024-03-29T03:48:12Z)
- Solution-oriented Agent-based Models Generation with Verifier-assisted Iterative In-context Learning [10.67134969207797]
Agent-based models (ABMs) stand as an essential paradigm for proposing and validating hypothetical solutions or policies.
Large language models (LLMs) encapsulating cross-domain knowledge and programming proficiency could potentially alleviate the difficulty of this process.
We present SAGE, a general solution-oriented ABM generation framework designed for automatic modeling and generating solutions for targeted problems.
arXiv Detail & Related papers (2024-02-04T07:59:06Z)
- Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [73.54562551341454]
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs.
We propose a novel approach that decomposes the aforementioned capabilities into a planner, caller, and summarizer.
This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability; a minimal sketch of this split appears after this list.
arXiv Detail & Related papers (2024-01-14T16:17:07Z)
- MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [102.41118020705876]
Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing.
As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework.
This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z)
- TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage [28.554981886052953]
Large Language Models (LLMs) have emerged as powerful tools for various real-world applications.
Despite their prowess, the intrinsic generative abilities of LLMs may prove insufficient for handling complex tasks.
This paper proposes a structured framework tailored for LLM-based AI Agents.
arXiv Detail & Related papers (2023-08-07T09:22:03Z)
- A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning [104.3643447579578]
We present an end-to-end, model-based deep reinforcement learning agent which dynamically attends to relevant parts of its state.
The design allows agents to learn to plan effectively, by attending to the relevant objects, leading to better out-of-distribution generalization.
arXiv Detail & Related papers (2021-06-03T19:35:19Z)
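As referenced in the Configurable Foundation Models entry above, the brick abstraction can be illustrated with a toy module registry. Everything here (the `BrickRouter` class, the keyword-overlap scoring, the registered bricks) is invented for illustration rather than taken from the paper; it sketches only the retrieval-and-routing operation, i.e. selecting a small subset of functional modules to serve a query instead of running a monolithic model.

```python
# Toy illustration of brick-style retrieval and routing (invented names;
# not the paper's API). Each "brick" is just a callable module here.
from typing import Callable

Brick = Callable[[str], str]

class BrickRouter:
    def __init__(self) -> None:
        self.bricks: dict[str, Brick] = {}       # name -> module
        self.keywords: dict[str, set[str]] = {}  # name -> routing keys

    def register(self, name: str, brick: Brick, keywords: set[str]) -> None:
        self.bricks[name] = brick
        self.keywords[name] = keywords

    def route(self, query: str, k: int = 2) -> list[str]:
        # Retrieval and routing: score bricks by keyword overlap with the
        # query and run only the top-k matches.
        words = set(query.lower().split())
        scores = {n: len(kw & words) for n, kw in self.keywords.items()}
        ranked = sorted(scores, key=scores.get, reverse=True)
        return [self.bricks[n](query) for n in ranked[:k] if scores[n] > 0]

if __name__ == "__main__":
    router = BrickRouter()
    router.register("math", lambda q: f"math-brick({q})", {"sum", "integral"})
    print(router.route("compute the sum of squares"))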
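Likewise, the planner/caller/summarizer decomposition referenced in the Small LLMs Are Weak Tool Learners entry can be sketched in a few lines. The model names, prompt formats, and the `ask` helper below are all placeholders, not the paper's interfaces; the point is that each role is an independent call that could be served by a different, smaller LLM.

```python
# Minimal sketch of a planner/caller/summarizer split, where each role could
# be backed by a different (smaller) LLM. All names are placeholders.
def ask(model: str, prompt: str) -> str:
    """Stand-in for a completion call against the named backend."""
    raise NotImplementedError

def solve(task: str, tools: dict) -> str:
    # Planner: decompose the task into tool-call steps.
    plan = ask("planner-7b", f"Break this task into tool calls, one per line: {task}")
    observations = []
    for step in plan.splitlines():
        # Caller: translate each step into a concrete tool invocation.
        call = ask("caller-7b", f"Tools: {list(tools)}. Emit 'tool|args' for: {step}")
        tool, _, args = call.partition("|")
        if tool in tools:
            observations.append(tools[tool](args))  # execute the chosen tool
    # Summarizer: condense the observations into the final answer.
    return ask("summarizer-7b", f"Task: {task}\nObservations: {observations}\nAnswer:")
```

In this sketch a malformed caller output simply skips the step; validation and retries would live inside the caller module and could be improved independently, which is the appeal of the decomposition.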