Navigation with Large Language Models: Semantic Guesswork as a Heuristic
for Planning
- URL: http://arxiv.org/abs/2310.10103v1
- Date: Mon, 16 Oct 2023 06:21:06 GMT
- Title: Navigation with Large Language Models: Semantic Guesswork as a Heuristic
for Planning
- Authors: Dhruv Shah, Michael Equi, Blazej Osinski, Fei Xia, Brian Ichter,
Sergey Levine
- Abstract summary: Navigation in unfamiliar environments presents a major challenge for robots.
We use language models to bias exploration of novel real-world environments.
We evaluate LFG in challenging real-world environments and simulated benchmarks.
- Score: 73.0990339667978
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Navigation in unfamiliar environments presents a major challenge for robots:
while mapping and planning techniques can be used to build up a representation
of the world, quickly discovering a path to a desired goal in unfamiliar
settings with such methods often requires lengthy mapping and exploration.
Humans can rapidly navigate new environments, particularly indoor environments
that are laid out logically, by leveraging semantics -- e.g., a kitchen often
adjoins a living room, an exit sign indicates the way out, and so forth.
Language models can provide robots with such knowledge, but directly using
language models to instruct a robot how to reach some destination can also be
impractical: while language models might produce a narrative about how to reach
some goal, because they are not grounded in real-world observations, this
narrative might be arbitrarily wrong. Therefore, in this paper we study how the
"semantic guesswork" produced by language models can be utilized as a guiding
heuristic for planning algorithms. Our method, Language Frontier Guide (LFG),
uses the language model to bias exploration of novel real-world environments by
incorporating the semantic knowledge stored in language models as a search
heuristic for planning with either topological or metric maps. We evaluate LFG
in challenging real-world environments and simulated benchmarks, outperforming
uninformed exploration and other ways of using language models.
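The core idea, polling a language model for semantic likelihoods and folding them into a frontier-selection cost, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the lookup table standing in for an actual LLM query, and the linear cost formula are all assumptions.

```python
def llm_semantic_score(frontier_label: str, goal: str) -> float:
    """Stand-in for asking a language model: 'How likely is it that
    exploring toward `frontier_label` leads to `goal`?' (in [0, 1]).
    A fixed lookup table is used here purely for illustration."""
    table = {
        ("kitchen", "find the fridge"): 0.9,
        ("garage", "find the fridge"): 0.2,
        ("hallway", "find the fridge"): 0.4,
    }
    return table.get((frontier_label, goal), 0.1)


def choose_frontier(frontiers, goal, alpha=1.0):
    """Pick the frontier minimizing travel cost minus a weighted
    semantic score: the LLM acts as a heuristic biasing search,
    not as the planner itself."""
    best, best_cost = None, float("inf")
    for label, travel_cost in frontiers:
        cost = travel_cost - alpha * llm_semantic_score(label, goal)
        if cost < best_cost:
            best, best_cost = label, cost
    return best


frontiers = [("kitchen", 3.0), ("garage", 2.5), ("hallway", 1.0)]
print(choose_frontier(frontiers, "find the fridge", alpha=5.0))  # -> kitchen
```

With `alpha=0` the robot falls back to nearest-frontier (uninformed) exploration; a larger `alpha` trusts the language model's guess more, which matches the paper's framing of the LLM output as a heuristic rather than a plan.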
Related papers
- How language models extrapolate outside the training data: A case study in Textualized Gridworld [32.5268320198854]
We show that conventional approaches, including next-token prediction and Chain of Thought fine-tuning, fail to generalize in larger, unseen environments.
Inspired by human cognition and dual-process theory, we propose language models should construct cognitive maps before interaction.
arXiv Detail & Related papers (2024-06-21T16:10:05Z)
- From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought [124.40905824051079]
We propose rational meaning construction, a computational framework for language-informed thinking.
We frame linguistic meaning as a context-sensitive mapping from natural language into a probabilistic language of thought.
We show that LLMs can generate context-sensitive translations that capture pragmatically-appropriate linguistic meanings.
We extend our framework to integrate cognitively-motivated symbolic modules.
arXiv Detail & Related papers (2023-06-22T05:14:00Z)
- Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents [111.15288256221764]
Grounded Decoding aims to solve complex, long-horizon tasks in a robotic setting by combining the knowledge of a language model with that of grounded models of the environment.
We frame this as a problem similar to probabilistic filtering: decode a sequence that has high probability both under the language model and under a set of grounded model objectives.
We demonstrate how such grounded models can be obtained across three simulation and real-world domains, and show that the proposed decoding strategy solves complex, long-horizon tasks by leveraging the knowledge of both models.
arXiv Detail & Related papers (2023-03-01T22:58:50Z)
- Inner Monologue: Embodied Reasoning through Planning with Language Models [81.07216635735571]
Large Language Models (LLMs) can be applied to domains beyond natural language processing.
LLMs planning in embodied environments need to consider not just which skills to perform, but also how and when to perform them.
We propose that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios.
arXiv Detail & Related papers (2022-07-12T15:20:48Z)
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [119.29555551279155]
Large language models can encode a wealth of semantic knowledge about the world.
Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language.
We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions.
arXiv Detail & Related papers (2022-04-04T17:57:11Z)
- Language Understanding for Field and Service Robots in a Priori Unknown Environments [29.16936249846063]
This paper provides a novel learning framework that allows field and service robots to interpret and execute natural language instructions.
We use language as a "sensor", inferring a distribution over the spatial, topological, and semantic information implicit in natural language utterances.
We incorporate this distribution in a probabilistic language grounding model and infer a distribution over a symbolic representation of the robot's action space.
arXiv Detail & Related papers (2021-05-21T15:13:05Z)
- ViNG: Learning Open-World Navigation with Visual Goals [82.84193221280216]
We propose a learning-based navigation system for reaching visually indicated goals.
We show that our system, which we call ViNG, outperforms previously-proposed methods for goal-conditioned reinforcement learning.
We demonstrate ViNG on a number of real-world applications, such as last-mile delivery and warehouse inspection.
arXiv Detail & Related papers (2020-12-17T18:22:32Z)
- Deep compositional robotic planners that follow natural language commands [21.481360281719006]
We show how a sampling-based robotic planner can be augmented to learn to understand a sequence of natural language commands.
Our approach combines the sampling-based planner with a deep network structured according to the parse of a complex command that includes objects, verbs, spatial relations, and attributes.
arXiv Detail & Related papers (2020-02-12T19:56:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.