Statler: State-Maintaining Language Models for Embodied Reasoning
- URL: http://arxiv.org/abs/2306.17840v4
- Date: Mon, 20 May 2024 05:57:39 GMT
- Title: Statler: State-Maintaining Language Models for Embodied Reasoning
- Authors: Takuma Yoneda, Jiading Fang, Peng Li, Huanyu Zhang, Tianchong Jiang, Shengjie Lin, Ben Picker, David Yunis, Hongyuan Mei, Matthew R. Walter
- Abstract summary: We propose Statler, a framework in which large language models are prompted to maintain an estimate of the world state.
Our framework then conditions each action on the estimate of the current world state.
It significantly outperforms strong competing methods on several robot planning tasks.
- Score: 19.884696137429813
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There has been significant research interest in employing large language models to empower intelligent robots with complex reasoning. Existing work focuses on harnessing their abilities to reason about the histories of their actions and observations. In this paper, we explore a new dimension in which large language models may benefit robotics planning. In particular, we propose Statler, a framework in which large language models are prompted to maintain an estimate of the world state, which is often unobservable, and to track its transitions as new actions are taken. Our framework then conditions each action on the estimate of the current world state. Despite being conceptually simple, our Statler framework significantly outperforms strong competing methods (e.g., Code-as-Policies) on several robot planning tasks. Additionally, it has the potential advantage of scaling up to more challenging long-horizon planning tasks.
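To make the idea concrete, here is a minimal sketch of such a state-maintaining loop. It assumes a generic text-completion function `llm(prompt)`; the prompt wording, state format, and helper names are illustrative, not the paper's exact implementation.

```python
# Minimal sketch of a Statler-style state-maintaining loop, assuming a
# generic text-completion function `llm(prompt) -> str`. The prompts,
# state format, and helper names are illustrative, not the paper's
# exact implementation.

def llm(prompt: str) -> str:
    """Placeholder for a call to a large language model."""
    raise NotImplementedError

def next_action(state: str, instruction: str) -> str:
    # Condition action generation on the current world-state estimate.
    return llm(
        f"Current world state:\n{state}\n"
        f"Instruction: {instruction}\n"
        "Next robot action:"
    )

def update_state(state: str, action: str) -> str:
    # Prompt the model to track how the (possibly unobservable) world
    # state transitions under the action that was just executed.
    return llm(
        f"Current world state:\n{state}\n"
        f"Executed action: {action}\n"
        "Updated world state:"
    )

def run_episode(initial_state: str, instructions: list) -> str:
    state = initial_state
    for instruction in instructions:
        action = next_action(state, instruction)
        # execute_on_robot(action)  # hypothetical robot interface
        state = update_state(state, action)
    return state
```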
Related papers
- $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model on its ability to perform tasks zero-shot after pre-training, to follow language instructions from people, and to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
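For context, here is a toy sketch of a flow-matching training objective for action vectors, in the spirit of the entry above. The linear interpolation path and the placeholder `velocity_model` callable are illustrative assumptions, not the paper's exact parameterization.

```python
# Toy sketch of a flow-matching objective for action vectors. The
# linear interpolation path and the placeholder `velocity_model`
# callable are illustrative assumptions.
import numpy as np

def flow_matching_loss(velocity_model, actions: np.ndarray) -> float:
    """actions: (batch, action_dim) array of expert action targets."""
    noise = np.random.randn(*actions.shape)      # x_0 ~ N(0, I)
    t = np.random.rand(actions.shape[0], 1)      # t ~ U(0, 1), per sample
    x_t = (1.0 - t) * noise + t * actions        # point on the linear path
    target_velocity = actions - noise            # d(x_t)/dt along the path
    pred = velocity_model(x_t, t)                # v_theta(x_t, t)
    return float(np.mean((pred - target_velocity) ** 2))
```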
- Grounding Language Models in Autonomous Loco-manipulation Tasks [3.8363685417355557]
We propose a novel framework that learns, selects, and plans behaviors based on tasks in different scenarios.
We leverage the planning and reasoning features of the large language model (LLM), constructing a hierarchical task graph.
Experiments in simulation and the real world with the CENTAURO robot show that the language-model-based planner can efficiently adapt to new loco-manipulation tasks.
arXiv Detail & Related papers (2024-09-02T15:27:48Z)
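A toy sketch of the kind of hierarchical task graph the entry above describes; node names and structure are invented for illustration, not taken from the paper.

```python
# Toy hierarchical task graph: internal nodes are subtasks, leaves are
# executable primitive skills. Names are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class TaskNode:
    name: str
    children: list = field(default_factory=list)

    def leaves(self):
        # Depth-first traversal yielding executable primitive skills.
        if not self.children:
            yield self.name
        for child in self.children:
            yield from child.leaves()

door_task = TaskNode("open the door", [
    TaskNode("walk to the door", [TaskNode("plan footsteps"),
                                  TaskNode("execute gait")]),
    TaskNode("grasp the handle"),
    TaskNode("pull the door open"),
])
print(list(door_task.leaves()))
# ['plan footsteps', 'execute gait', 'grasp the handle', 'pull the door open']
```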
- LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed at understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches that are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z)
- Safe Task Planning for Language-Instructed Multi-Robot Systems using Conformal Prediction [11.614036749291216]
We introduce a new distributed multi-robot planner, S-ATLAS for Safe plAnning for Teams of Language-instructed AgentS, that is capable of achieving user-defined mission success rates.
We show, both theoretically and empirically, that the proposed planner can achieve user-specified task success rates while minimizing the overall number of help requests.
arXiv Detail & Related papers (2024-02-23T15:02:44Z)
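A schematic sketch of the conformal-prediction recipe behind such guarantees: calibrate a threshold on held-out data so the planner acts autonomously only when a user-specified success rate can be met, and requests help otherwise. The scoring interface is an assumption for illustration, not the paper's actual API.

```python
# Schematic conformal-prediction sketch; the scoring interface is an
# assumption for illustration, not the paper's actual API.
import numpy as np

def calibrate_threshold(cal_scores: np.ndarray, epsilon: float) -> float:
    """cal_scores: nonconformity scores on calibration data (higher is
    worse). Returns the (1 - epsilon) empirical quantile with the
    standard conformal finite-sample correction."""
    n = len(cal_scores)
    level = min(np.ceil((n + 1) * (1.0 - epsilon)) / n, 1.0)
    return float(np.quantile(cal_scores, level, method="higher"))

def act_or_ask(score: float, threshold: float) -> str:
    # Low nonconformity means the plan is trusted; otherwise ask a human.
    return "execute" if score <= threshold else "request_help"
```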
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents that perform open-vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- Language Models, Agent Models, and World Models: The LAW for Machine Reasoning and Planning [33.573628038590634]
We present a new perspective of machine reasoning, LAW, that connects the concepts of Language models, Agent models, and World models.
World and agent models offer a better abstraction for reasoning, introducing the crucial elements of deliberate, human-like reasoning.
arXiv Detail & Related papers (2023-12-08T18:25:22Z)
- CoPAL: Corrective Planning of Robot Actions with Large Language Models [8.209152055117283]
We propose a system architecture that orchestrates a seamless interplay between cognitive levels, encompassing reasoning, planning, and motion generation.
At its core lies a novel replanning strategy that handles physically grounded, logical, and semantic errors in the generated plans.
arXiv Detail & Related papers (2023-10-11T07:39:42Z)
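A minimal sketch of a corrective replanning loop in the spirit of the entry above: generate a plan, feed execution errors back to the LLM, and replan. `llm_plan` and `execute` are hypothetical callables, not the paper's API.

```python
# Minimal corrective replanning loop; `llm_plan` and `execute` are
# hypothetical callables, not the paper's API.
def replanning_loop(task: str, llm_plan, execute, max_attempts: int = 5):
    feedback = ""
    for _ in range(max_attempts):
        plan = llm_plan(task, feedback)  # plan conditioned on errors so far
        success, error = execute(plan)   # hypothetical (bool, str) result
        if success:
            return plan
        # Fold the physically grounded, logical, or semantic error back
        # into the next planning prompt.
        feedback += f"\nPrevious plan failed: {error}"
    raise RuntimeError("no successful plan within the attempt budget")
```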
- PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning [77.03847056008598]
PlaSma is a novel two-pronged approach to endow small language models with procedural knowledge and (constrained) language planning capabilities.
We develop symbolic procedural knowledge distillation to enhance the commonsense knowledge in small language models and an inference-time algorithm to facilitate more structured and accurate reasoning.
arXiv Detail & Related papers (2023-05-31T00:55:40Z)
- PaLM-E: An Embodied Multimodal Language Model [101.29116156731762]
We propose embodied language models to incorporate real-world continuous sensor modalities into language models.
We train these encodings end-to-end, in conjunction with a pre-trained large language model, for multiple embodied tasks.
Our largest model, PaLM-E-562B with 562B parameters, is a visual-language generalist with state-of-the-art performance on OK-VQA.
arXiv Detail & Related papers (2023-03-06T18:58:06Z)
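A toy sketch of the PaLM-E idea of mapping continuous sensor encodings into the language model's token-embedding space and interleaving them with text tokens. The dimensions and the single linear projection are illustrative assumptions, not the paper's architecture details.

```python
# Toy sketch: project continuous sensor encodings into the token
# embedding space. The single linear projection is an illustrative
# assumption, not the paper's architecture.
import numpy as np

def project_sensor_features(sensor_features: np.ndarray,
                            projection: np.ndarray) -> np.ndarray:
    """sensor_features: (num_patches, feature_dim) continuous encodings.
    projection: (feature_dim, embed_dim) learned linear map.
    Returns pseudo-token embeddings for the language model."""
    return sensor_features @ projection

def build_input_embeddings(text_embeds: np.ndarray,
                           sensor_embeds: np.ndarray) -> np.ndarray:
    # Simplest interleaving: prepend sensor tokens to the text tokens.
    return np.concatenate([sensor_embeds, text_embeds], axis=0)
```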
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models [648.3665819567409]
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale.
BIG-bench consists of 204 tasks, contributed by 450 authors across 132 institutions.
We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench.
arXiv Detail & Related papers (2022-06-09T17:05:34Z)