Statler: State-Maintaining Language Models for Embodied Reasoning
- URL: http://arxiv.org/abs/2306.17840v4
- Date: Mon, 20 May 2024 05:57:39 GMT
- Title: Statler: State-Maintaining Language Models for Embodied Reasoning
- Authors: Takuma Yoneda, Jiading Fang, Peng Li, Huanyu Zhang, Tianchong Jiang, Shengjie Lin, Ben Picker, David Yunis, Hongyuan Mei, Matthew R. Walter
- Abstract summary: We propose Statler, a framework in which large language models are prompted to maintain an estimate of the world state.
Our framework then conditions each action on the estimate of the current world state.
It significantly outperforms strong competing methods on several robot planning tasks.
- Score: 19.884696137429813
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There has been significant research interest in employing large language models to empower intelligent robots with complex reasoning. Existing work focuses on harnessing their abilities to reason about the histories of their actions and observations. In this paper, we explore a new dimension in which large language models may benefit robotics planning. In particular, we propose Statler, a framework in which large language models are prompted to maintain an estimate of the world state, which is often unobservable, and to track its transitions as new actions are taken. Our framework then conditions each action on the estimate of the current world state. Despite being conceptually simple, our Statler framework significantly outperforms strong competing methods (e.g., Code-as-Policies) on several robot planning tasks. Additionally, it has the potential advantage of scaling up to more challenging long-horizon planning tasks.
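To make the state-maintenance loop concrete, here is a minimal Python sketch. It is an illustration, not the authors' implementation: `query_llm`, the prompt templates, and the reader/writer split are assumptions standing in for whatever LLM API and prompts are actually used. One model call proposes the next command conditioned on the current world-state estimate, and a second call revises that estimate after the command is executed.

```python
# Minimal sketch (not the authors' code) of the Statler idea:
# keep an explicit, LLM-maintained estimate of the world state and
# condition every action on it. `query_llm` is a hypothetical stand-in
# for any text-completion / chat API.
from typing import Callable

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with your provider's API."""
    raise NotImplementedError

READER_PROMPT = """You are a robot policy. Given the current world state and an
instruction, output the next robot command.

World state:
{state}

Instruction:
{instruction}

Command:"""

WRITER_PROMPT = """You track the world state for a robot. Given the previous
world state and the command that was just executed, output the updated state.

Previous world state:
{state}

Executed command:
{command}

Updated world state:"""

def run_episode(instructions: list[str],
                initial_state: str,
                execute: Callable[[str], None]) -> str:
    """Condition each action on the maintained state, then update it."""
    state = initial_state
    for instruction in instructions:
        # World-state "reader": propose the next command given the estimate.
        command = query_llm(READER_PROMPT.format(state=state,
                                                 instruction=instruction))
        execute(command)  # hand the command to the robot or simulator
        # World-state "writer": revise the state estimate after the action.
        state = query_llm(WRITER_PROMPT.format(state=state, command=command))
    return state
```

In practice, the initial state would describe the scene (e.g., object positions), and `execute` would dispatch the generated command to a low-level controller or simulator.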
Related papers
- LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed at understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z)
- PARADISE: Evaluating Implicit Planning Skills of Language Models with Procedural Warnings and Tips Dataset [0.0]
We present PARADISE, an abductive reasoning task in a Q&A format over practical procedural text sourced from wikiHow.
It involves warning and tip inference tasks directly associated with goals, excluding intermediary steps, with the aim of testing the ability of the models to infer implicit knowledge of the plan solely from the given goal.
Our experiments, utilizing fine-tuned language models and zero-shot prompting, reveal the effectiveness of task-specific small models over large language models in most scenarios.
arXiv Detail & Related papers (2024-03-05T18:01:59Z)
- Safe Task Planning for Language-Instructed Multi-Robot Systems using Conformal Prediction [11.614036749291216]
We introduce a new decentralized multi-robot planner called S-ATLAS for Safe plAnning for Teams of Language-instructed AgentS.
We show that the proposed planner can achieve user-specified task success rates while minimizing the overall number of help requests.
arXiv Detail & Related papers (2024-02-23T15:02:44Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- Language Models, Agent Models, and World Models: The LAW for Machine Reasoning and Planning [33.573628038590634]
We present a new perspective of machine reasoning, LAW, that connects the concepts of Language models, Agent models, and World models.
World and agent models provide a better abstraction for reasoning, introducing the crucial elements of deliberate, human-like reasoning.
arXiv Detail & Related papers (2023-12-08T18:25:22Z)
- CoPAL: Corrective Planning of Robot Actions with Large Language Models [8.209152055117283]
We propose a system architecture that orchestrates a seamless interplay between cognitive levels, encompassing reasoning, planning, and motion generation.
At its core lies a novel replanning strategy that handles physically grounded, logical, and semantic errors in the generated plans.
arXiv Detail & Related papers (2023-10-11T07:39:42Z)
- PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning [72.0564921186518]
PlaSma is a novel two-pronged approach to endow small language models with procedural knowledge and (counterfactual) planning capabilities.
More concretely, we develop symbolic procedural knowledge distillation to enhance the implicit knowledge in small language models.
In addition, we introduce a novel task, Counterfactual Planning, that requires a revision of a plan to cope with a counterfactual situation.
arXiv Detail & Related papers (2023-05-31T00:55:40Z)
- PaLM-E: An Embodied Multimodal Language Model [101.29116156731762]
We propose embodied language models to incorporate real-world continuous sensor modalities into language models.
These sensor encodings are trained end-to-end, in conjunction with a pre-trained large language model, for multiple embodied tasks.
Our largest model, PaLM-E-562B with 562B parameters, is a visual-language generalist with state-of-the-art performance on OK-VQA.
arXiv Detail & Related papers (2023-03-06T18:58:06Z)
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models [648.3665819567409]
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale.
BIG-bench consists of 204 tasks, contributed by 450 authors across 132 institutions.
We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench.
arXiv Detail & Related papers (2022-06-09T17:05:34Z)
- Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.