OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following
- URL: http://arxiv.org/abs/2403.03017v1
- Date: Tue, 5 Mar 2024 14:53:53 GMT
- Title: OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following
- Authors: Haochen Shi, Zhiyuan Sun, Xingdi Yuan, Marc-Alexandre Côté, Bang Liu
- Abstract summary: Embodied Instruction Following (EIF) is a crucial task in embodied learning, requiring agents to interact with their environment through egocentric observations to fulfill natural language instructions.
Recent advancements have seen a surge in employing large language models (LLMs) within a framework-centric approach to enhance performance in EIF.
We introduce OPEx, a comprehensive framework that delineates the core components essential for solving EIF tasks: Observer, Planner, and Executor.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embodied Instruction Following (EIF) is a crucial task in embodied learning,
requiring agents to interact with their environment through egocentric
observations to fulfill natural language instructions. Recent advancements have
seen a surge in employing large language models (LLMs) within a
framework-centric approach to enhance performance in embodied learning tasks,
including EIF. Despite these efforts, a unified understanding of how the various
components, from visual perception to action execution, affect task performance
is still lacking. To address this gap, we
introduce OPEx, a comprehensive framework that delineates the core components
essential for solving embodied learning tasks: Observer, Planner, and Executor.
Through extensive evaluations, we provide a deep analysis of how each component
influences EIF task performance. Furthermore, we innovate within this space by
deploying a multi-agent dialogue strategy on a TextWorld counterpart, further
enhancing task performance. Our findings reveal that LLM-centric design
markedly improves EIF outcomes, identify visual perception and low-level action
execution as critical bottlenecks, and demonstrate that augmenting LLMs with a
multi-agent framework further elevates performance.
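The Observer-Planner-Executor decomposition described in the abstract can be pictured as a simple perception-planning-acting loop. The following is an illustrative Python sketch only, not the paper's implementation; all class names, method names, and the action vocabulary here are hypothetical stand-ins (in OPEx, the Planner role is filled by an LLM and the Observer by a visual-perception module).

```python
# Hypothetical sketch of an Observer-Planner-Executor agent loop.
# None of these names come from the OPEx codebase; they only
# illustrate the component decomposition described in the abstract.

class Observer:
    """Turns a raw egocentric observation into a text description."""
    def observe(self, raw_obs):
        return "You see: " + ", ".join(raw_obs["objects"])

class Planner:
    """Maps the instruction plus the latest observation to a subgoal.
    In OPEx this role is played by an LLM; here it is a stub."""
    def plan(self, instruction, observation):
        return f"next subgoal for '{instruction}' given '{observation}'"

class Executor:
    """Translates a subgoal into low-level environment actions."""
    def execute(self, subgoal):
        # Placeholder action sequence; a real executor would ground
        # the subgoal into navigation and manipulation primitives.
        return ["move_forward", "pick_up"]

def run_episode(instruction, env_steps):
    """One pass of the perceive -> plan -> act loop per env step."""
    observer, planner, executor = Observer(), Planner(), Executor()
    actions_taken = []
    for raw_obs in env_steps:
        obs_text = observer.observe(raw_obs)
        subgoal = planner.plan(instruction, obs_text)
        actions_taken.extend(executor.execute(subgoal))
    return actions_taken

actions = run_episode("put the apple in the fridge",
                      [{"objects": ["apple", "table"]}])
print(actions)
```

The point of the decomposition is that each component can be swapped or ablated independently, which is what enables the paper's component-wise analysis.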
Related papers
- VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use [74.39058448757645]
We present VipAct, an agent framework that enhances vision-language models (VLMs).
VipAct consists of an orchestrator agent, which manages task requirement analysis, planning, and coordination, along with specialized agents that handle specific tasks.
We evaluate VipAct on benchmarks featuring a diverse set of visual perception tasks, with experimental results demonstrating significant performance improvements.
arXiv Detail & Related papers (2024-10-21T18:10:26Z)
- REVEAL-IT: REinforcement learning with Visibility of Evolving Agent poLicy for InTerpretability [23.81322529587759]
REVEAL-IT is a novel framework for explaining the learning process of an agent in complex environments.
We visualize the policy structure and the agent's learning process for various training tasks.
A GNN-based explainer learns to highlight the most important sections of the policy, providing a clearer and more robust explanation of the agent's learning process.
arXiv Detail & Related papers (2024-06-20T11:29:26Z)
- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
The Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z)
- Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning [79.38140606606126]
We propose an algorithmic framework that fine-tunes vision-language models (VLMs) with reinforcement learning (RL).
Our framework provides a task description and then prompts the VLM to generate chain-of-thought (CoT) reasoning.
We demonstrate that our proposed framework enhances the decision-making capabilities of VLM agents across various tasks.
arXiv Detail & Related papers (2024-05-16T17:50:19Z)
- Enhancing Large Language Model with Decomposed Reasoning for Emotion-Cause Pair Extraction [13.245873138716044]
Emotion-Cause Pair Extraction (ECPE) involves extracting clause pairs representing emotions and their causes in a document.
Inspired by recent work, we explore leveraging large language models (LLMs) to address the ECPE task without additional training.
We introduce chain-of-thought reasoning to mimic the human cognitive process and propose the Decomposed Emotion-Cause Chain (DECC) framework.
arXiv Detail & Related papers (2024-01-31T10:20:01Z)
- AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents [76.95062553043607]
Evaluating large language models (LLMs) is essential for understanding their capabilities and facilitating their integration into practical applications.
We introduce AgentBoard, a pioneering comprehensive benchmark and accompanying open-source evaluation framework tailored to the analytical evaluation of LLM agents.
arXiv Detail & Related papers (2024-01-24T01:51:00Z)
- Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration [83.4031923134958]
Corex is a suite of novel general-purpose strategies that transform Large Language Models into autonomous agents.
Inspired by human behaviors, Corex comprises diverse collaboration paradigms, including Debate, Review, and Retrieve modes.
We demonstrate that orchestrating multiple LLMs to work in concert yields substantially better performance compared to existing methods.
arXiv Detail & Related papers (2023-09-30T07:11:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.