BAR: A Backward Reasoning based Agent for Complex Minecraft Tasks
- URL: http://arxiv.org/abs/2505.14079v3
- Date: Fri, 30 May 2025 03:04:33 GMT
- Title: BAR: A Backward Reasoning based Agent for Complex Minecraft Tasks
- Authors: Weihong Du, Wenrui Liao, Binyu Yan, Hongru Liang, Anthony G. Cohn, Wenqiang Lei,
- Abstract summary: To complete a task, a large language model (LLM) based agent needs to decompose it into easily executed steps by planning.<n>Existing studies mainly conduct the planning by inferring what steps should be executed next starting from the agent's initial state.<n>We propose to study this issue in Minecraft, a virtual environment that simulates complex tasks based on real-world scenarios.
- Score: 15.48158268901061
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language model (LLM) based agents have shown great potential in following human instructions and automatically completing various tasks. To complete a task, the agent needs to decompose it into easily executed steps by planning. Existing studies mainly conduct the planning by inferring what steps should be executed next starting from the agent's initial state. However, this forward reasoning paradigm doesn't work well for complex tasks. We propose to study this issue in Minecraft, a virtual environment that simulates complex tasks based on real-world scenarios. We believe that the failure of forward reasoning is caused by the big perception gap between the agent's initial state and task goal. To this end, we leverage backward reasoning and make the planning starting from the terminal state, which can directly achieve the task goal in one step. Specifically, we design a BAckward Reasoning based agent (BAR). It is equipped with a recursive goal decomposition module, a state consistency maintaining module and a stage memory module to make robust, consistent, and efficient planning starting from the terminal state. Experimental results demonstrate the superiority of BAR over existing methods and the effectiveness of proposed modules.
Related papers
- MapAgent: Trajectory-Constructed Memory-Augmented Planning for Mobile Task Automation [5.433829353194621]
MapAgent is a framework that leverages memory constructed from historical trajectories to augment current task planning.<n>We introduce a coarse-to-fine task planning approach that retrieves relevant pages from the memory database based on similarity.<n>Results in real-world scenarios demonstrate that MapAgent achieves superior performance to existing methods.
arXiv Detail & Related papers (2025-07-29T16:05:32Z) - VerifyLLM: LLM-Based Pre-Execution Task Plan Verification for Robots [44.99833362998488]
We propose an architecture for automatically verifying high-level task plans before their execution in simulator or real-world environments.<n>The module uses the reasoning capabilities of the Large Language Models to evaluate logical coherence and identify potential gaps in the plan.<n>We contribute to improving the reliability and efficiency of task planning and addresses the critical need for robust pre-execution verification in autonomous systems.
arXiv Detail & Related papers (2025-07-07T15:31:36Z) - PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC [98.82146219495792]
In this paper, we propose a hierarchical agent framework named PC-Agent.<n>From the perception perspective, we devise an Active Perception Module (APM) to overcome the inadequate abilities of current MLLMs in perceiving screenshot content.<n>From the decision-making perspective, to handle complex user instructions and interdependent subtasks more effectively, we propose a hierarchical multi-agent collaboration architecture.
arXiv Detail & Related papers (2025-02-20T05:41:55Z) - Hindsight Planner: A Closed-Loop Few-Shot Planner for Embodied Instruction Following [62.10809033451526]
This work focuses on building a task planner for Embodied Instruction Following (EIF) using Large Language Models (LLMs)<n>We frame the task as a Partially Observable Markov Decision Process (POMDP) and aim to develop a robust planner under a few-shot assumption.<n>Our experiments on the ALFRED dataset indicate that our planner achieves competitive performance under a few-shot assumption.
arXiv Detail & Related papers (2024-12-27T10:05:45Z) - Planning with Multi-Constraints via Collaborative Language Agents [13.550774629515843]
This paper introduces Planning with Multi-Constraints (PMC), a zero-shot methodology for collaborative multi-agent systems.<n>PMC simplifies complex task planning with constraints by decomposing it into a hierarchy of subordinate tasks.<n>PMC achieved an average 42.68% success rate on TravelPlanner, significantly higher than GPT-4 (2.92%), and outperforming GPT-4 with ReAct on API-Bank by 13.64%.
arXiv Detail & Related papers (2024-05-26T10:33:17Z) - AgentKit: Structured LLM Reasoning with Dynamic Graphs [91.09525140733987]
We propose an intuitive LLM prompting framework (AgentKit) for multifunctional agents.
AgentKit offers a unified framework for explicitly constructing a complex "thought process" from simple natural language prompts.
arXiv Detail & Related papers (2024-04-17T15:40:45Z) - Faithful Question Answering with Monte-Carlo Planning [78.02429369951363]
We propose FAME (FAithful question answering with MontE-carlo planning) to answer questions based on faithful reasoning steps.
We formulate the task as a discrete decision-making problem and solve it through the interaction of a reasoning environment and a controller.
FAME achieves state-of-the-art performance on the standard benchmark.
arXiv Detail & Related papers (2023-05-04T05:21:36Z) - Plan, Eliminate, and Track -- Language Models are Good Teachers for
Embodied Agents [99.17668730578586]
Pre-trained large language models (LLMs) capture procedural knowledge about the world.
Plan, Eliminate, and Track (PET) framework translates a task description into a list of high-level sub-tasks.
PET framework leads to a significant 15% improvement over SOTA for generalization to human goal specifications.
arXiv Detail & Related papers (2023-05-03T20:11:22Z) - Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents [26.78244595330595]
"$underlineD$escribe" is an interactive planning approach based on Large Language Models (LLMs)
DEPS facilitates better error correction on initial LLM-generated $textitplan$ by integrating $textitdescription$ of the plan execution process.
Experiments mark the milestone of the first zero-shot multi-task agent that can robustly accomplish 70+ Minecraft tasks.
arXiv Detail & Related papers (2023-02-03T06:06:27Z) - Modelling Multi-Agent Epistemic Planning in ASP [66.76082318001976]
This paper presents an implementation of a multi-shot Answer Set Programming-based planner that can reason in multi-agent epistemic settings.
The paper shows how the planner, exploiting an ad-hoc epistemic state representation and the efficiency of ASP solvers, has competitive performance results on benchmarks collected from the literature.
arXiv Detail & Related papers (2020-08-07T06:35:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.