Open-Ended Instructable Embodied Agents with Memory-Augmented Large
Language Models
- URL: http://arxiv.org/abs/2310.15127v2
- Date: Mon, 20 Nov 2023 18:51:29 GMT
- Title: Open-Ended Instructable Embodied Agents with Memory-Augmented Large
Language Models
- Authors: Gabriel Sarch, Yue Wu, Michael J. Tarr, Katerina Fragkiadaki
- Abstract summary: We introduce HELPER, an embodied agent equipped with an external memory of language-program pairs.
Relevant memories are retrieved based on the current dialogue, instruction, correction, or VLM description.
HELPER sets a new state-of-the-art in the TEACh benchmark in both Execution from Dialog History (EDH) and Trajectory from Dialogue (TfD).
- Score: 19.594361652336996
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained and frozen large language models (LLMs) can effectively map
simple scene rearrangement instructions to programs over a robot's visuomotor
functions through appropriate few-shot example prompting. To parse open-domain
natural language and adapt to a user's idiosyncratic procedures, not known
during prompt engineering time, fixed prompts fall short. In this paper, we
introduce HELPER, an embodied agent equipped with an external memory of
language-program pairs that parses free-form human-robot dialogue into action
programs through retrieval-augmented LLM prompting: relevant memories are
retrieved based on the current dialogue, instruction, correction, or VLM
description, and used as in-context prompt examples for LLM querying. The
memory is expanded during deployment to include pairs of user's language and
action plans, to assist future inferences and personalize them to the user's
language and routines. HELPER sets a new state-of-the-art in the TEACh
benchmark in both Execution from Dialog History (EDH) and Trajectory from
Dialogue (TfD), with a 1.7x improvement over the previous state-of-the-art for
TfD. Our models, code, and video results can be found on our project website:
https://helper-agent-llm.github.io.
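The retrieval-augmented prompting loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under assumed names, not the authors' released code: the embed() stand-in, the LanguageProgramMemory class, and the example visuomotor programs are all hypothetical.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in text embedding; a real sentence encoder would be used in practice."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

class LanguageProgramMemory:
    """External memory of (language, program) pairs, queried by similarity."""
    def __init__(self):
        self.keys = []    # embedded language snippets
        self.values = []  # (language, program) pairs

    def add(self, language: str, program: str) -> None:
        self.keys.append(embed(language))
        self.values.append((language, program))

    def retrieve(self, query: str, k: int = 3):
        q = embed(query)
        sims = [q @ key / (np.linalg.norm(q) * np.linalg.norm(key))
                for key in self.keys]
        top = np.argsort(sims)[::-1][:k]
        return [self.values[i] for i in top]

def build_prompt(memory: LanguageProgramMemory, request: str) -> str:
    """Use retrieved pairs as in-context examples for the LLM query."""
    shots = "\n\n".join(f"User: {lang}\nProgram: {prog}"
                        for lang, prog in memory.retrieve(request))
    return f"{shots}\n\nUser: {request}\nProgram:"

memory = LanguageProgramMemory()
memory.add("put the mug in the sink",
           "goto('mug'); pickup('mug'); goto('sink'); place('mug', 'sink')")
memory.add("make me a coffee",
           "goto('coffee_machine'); toggle_on('coffee_machine')")
# During deployment, each confirmed (language, program) pair is added back to
# memory, personalizing future retrievals to the user's language and routines.
print(build_prompt(memory, "place the cup in the sink"))
```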
Related papers
- HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models [13.963676467274109]
We extend the capabilities of HELPER by expanding its memory with a wider array of examples and prompts.
This simple expansion of HELPER into a shared memory enables the agent to work across domains: executing plans from dialogue, natural language instruction following, active question asking, and commonsense room reorganization.
We evaluate the agent on four diverse interactive visual-language embodied agent benchmarks: ALFRED, TEACh, DialFRED, and the Tidy Task.
arXiv Detail & Related papers (2024-04-29T19:12:42Z) - Apollonion: Profile-centric Dialog Agent [9.657755354649048]
We propose a framework for dialog agents to incorporate user profiling (initialization and update): the user's queries and responses are analyzed and organized into a structured user profile.
We propose a series of evaluation protocols for personalization: to what extent the responses are personalized to different users.
arXiv Detail & Related papers (2024-04-10T03:32:41Z) - Interpreting User Requests in the Context of Natural Language Standing
Instructions [89.12540932734476]
We develop NLSI, a language-to-program dataset consisting of over 2.4K dialogues spanning 17 domains.
A key challenge in NLSI is to identify which subset of the standing instructions is applicable to a given dialogue.
arXiv Detail & Related papers (2023-11-16T11:19:26Z) - Instruction-Following Speech Recognition [21.591086644665197]
We introduce instruction-following speech recognition, training a Listen-Attend-Spell model to understand and execute a diverse set of free-form text instructions.
Remarkably, our model, trained from scratch on LibriSpeech, interprets and executes simple instructions without requiring large language models or pre-trained speech modules.
arXiv Detail & Related papers (2023-09-18T14:59:10Z) - Recursively Summarizing Enables Long-Term Dialogue Memory in Large
Language Models [75.98775135321355]
Given a long conversation, large language models (LLMs) fail to recall past information and tend to generate inconsistent responses.
We propose to generate summaries/memory using large language models (LLMs) to enhance their long-term memory ability.
arXiv Detail & Related papers (2023-08-29T04:59:53Z) - AmadeusGPT: a natural language interface for interactive animal
behavioral analysis [65.55906175884748]
We introduce AmadeusGPT: a natural language interface that turns natural language descriptions of behaviors into machine-executable code.
We show that we can produce state-of-the-art performance on the MABe 2022 behavior challenge tasks.
AmadeusGPT presents a novel way to merge deep biological knowledge, large-language models, and core computer vision modules into a more naturally intelligent system.
arXiv Detail & Related papers (2023-07-10T19:15:17Z) - GODEL: Large-Scale Pre-Training for Goal-Directed Dialog [119.1397031992088]
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
arXiv Detail & Related papers (2022-06-22T18:19:32Z) - In-Context Learning for Few-Shot Dialogue State Tracking [55.91832381893181]
We propose an in-context (IC) learning framework for few-shot dialogue state tracking (DST).
A large pre-trained language model (LM) takes a test instance and a few annotated examples as input, and directly decodes the dialogue states without any parameter updates.
This makes the LM more flexible and scalable compared to prior few-shot DST work when adapting to new domains and scenarios.
arXiv Detail & Related papers (2022-03-16T11:58:24Z) - Language Models as Zero-Shot Planners: Extracting Actionable Knowledge
for Embodied Agents [111.33545170562337]
We investigate the possibility of grounding high-level tasks, expressed in natural language, to a chosen set of actionable steps.
We find that if pre-trained LMs are large enough and prompted appropriately, they can effectively decompose high-level tasks into low-level plans.
We propose a procedure that conditions on existing demonstrations and semantically translates the plans to admissible actions.
arXiv Detail & Related papers (2022-01-18T18:59:45Z)
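The semantic translation step named in the last entry, mapping each free-form step an LLM proposes to the closest admissible action, can be sketched as follows. This is a hedged illustration: the action list and the embed() stand-in are assumptions, not the paper's implementation (which uses a pre-trained sentence encoder).

```python
import numpy as np

# Hypothetical set of actions the environment actually accepts.
ADMISSIBLE_ACTIONS = ["walk to kitchen", "open fridge", "grab milk", "close fridge"]

def embed(text: str) -> np.ndarray:
    """Stand-in text embedding; a real sentence encoder would be used in practice."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

def translate(step: str) -> str:
    """Map a free-form plan step to the most similar admissible action."""
    q = embed(step)
    def cosine(action: str) -> float:
        v = embed(action)
        return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
    return max(ADMISSIBLE_ACTIONS, key=cosine)

# Free-form steps proposed by an LLM are grounded one at a time:
for step in ["go to the kitchen", "take the milk from the fridge"]:
    print(f"{step!r} -> {translate(step)!r}")
```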
This list is automatically generated from the titles and abstracts of the papers on this site.