Navigation with Large Language Models: Semantic Guesswork as a Heuristic
for Planning
- URL: http://arxiv.org/abs/2310.10103v1
- Date: Mon, 16 Oct 2023 06:21:06 GMT
- Title: Navigation with Large Language Models: Semantic Guesswork as a Heuristic
for Planning
- Authors: Dhruv Shah, Michael Equi, Blazej Osinski, Fei Xia, Brian Ichter,
Sergey Levine
- Abstract summary: Navigation in unfamiliar environments presents a major challenge for robots.
We use language models to bias exploration of novel real-world environments.
We evaluate LFG in challenging real-world environments and simulated benchmarks.
- Score: 73.0990339667978
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Navigation in unfamiliar environments presents a major challenge for robots:
while mapping and planning techniques can be used to build up a representation
of the world, quickly discovering a path to a desired goal in unfamiliar
settings with such methods often requires lengthy mapping and exploration.
Humans can rapidly navigate new environments, particularly indoor environments
that are laid out logically, by leveraging semantics -- e.g., a kitchen often
adjoins a living room, an exit sign indicates the way out, and so forth.
Language models can provide robots with such knowledge, but directly using
language models to instruct a robot how to reach some destination can also be
impractical: while language models might produce a narrative about how to reach
some goal, because they are not grounded in real-world observations, this
narrative might be arbitrarily wrong. Therefore, in this paper we study how the
"semantic guesswork" produced by language models can be utilized as a guiding
heuristic for planning algorithms. Our method, Language Frontier Guide (LFG),
uses the language model to bias exploration of novel real-world environments by
incorporating the semantic knowledge stored in language models as a search
heuristic for planning with either topological or metric maps. We evaluate LFG
in challenging real-world environments and simulated benchmarks, outperforming
uninformed exploration and other ways of using language models.
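The core idea, polling a language model for semantic likelihoods and folding them into a frontier-selection cost, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the lookup table standing in for an actual LLM query, and the linear cost formula are all assumptions.

```python
def llm_semantic_score(frontier_label: str, goal: str) -> float:
    """Stand-in for asking a language model: 'How likely is it that
    exploring toward `frontier_label` leads to `goal`?' (in [0, 1]).
    A fixed lookup table is used here purely for illustration."""
    table = {
        ("kitchen", "find the fridge"): 0.9,
        ("garage", "find the fridge"): 0.2,
        ("hallway", "find the fridge"): 0.4,
    }
    return table.get((frontier_label, goal), 0.1)


def choose_frontier(frontiers, goal, alpha=1.0):
    """Pick the frontier minimizing travel cost minus a weighted
    semantic score: the LLM acts as a heuristic biasing search,
    not as the planner itself."""
    best, best_cost = None, float("inf")
    for label, travel_cost in frontiers:
        cost = travel_cost - alpha * llm_semantic_score(label, goal)
        if cost < best_cost:
            best, best_cost = label, cost
    return best


frontiers = [("kitchen", 3.0), ("garage", 2.5), ("hallway", 1.0)]
print(choose_frontier(frontiers, "find the fridge", alpha=5.0))  # -> kitchen
```

With `alpha=0` the robot falls back to nearest-frontier (uninformed) exploration; a larger `alpha` trusts the language model's guess more, which matches the paper's framing of the LLM output as a heuristic rather than a plan.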
Related papers
- How language models extrapolate outside the training data: A case study in Textualized Gridworld [32.5268320198854]
We show that conventional approaches, including next-token prediction and Chain of Thought fine-tuning, fail to generalize in larger, unseen environments.
Inspired by human cognition and dual-process theory, we propose language models should construct cognitive maps before interaction.
arXiv Detail & Related papers (2024-06-21T16:10:05Z)
- From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought [124.40905824051079]
We propose rational meaning construction, a computational framework for language-informed thinking.
We frame linguistic meaning as a context-sensitive mapping from natural language into a probabilistic language of thought.
We show that LLMs can generate context-sensitive translations that capture pragmatically-appropriate linguistic meanings.
We extend our framework to integrate cognitively-motivated symbolic modules.
arXiv Detail & Related papers (2023-06-22T05:14:00Z)
- Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents [111.15288256221764]
Grounded Decoding aims to solve complex, long-horizon tasks in a robotic setting by combining the knowledge of a language model with that of grounded models of the environment.
We frame this as a problem similar to probabilistic filtering: decode a sequence that has high probability both under the language model and under a set of grounded model objectives.
We demonstrate how such grounded models can be obtained across three simulation and real-world domains, and show that the proposed decoding strategy solves complex, long-horizon tasks by leveraging the knowledge of both models.
arXiv Detail & Related papers (2023-03-01T22:58:50Z)
- Inner Monologue: Embodied Reasoning through Planning with Language Models [81.07216635735571]
Large Language Models (LLMs) can be applied to domains beyond natural language processing.
LLMs planning in embodied environments need to consider not just which skills to perform, but also how and when to perform them.
We propose that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios.
arXiv Detail & Related papers (2022-07-12T15:20:48Z)
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [119.29555551279155]
Large language models can encode a wealth of semantic knowledge about the world.
Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language.
We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions.
arXiv Detail & Related papers (2022-04-04T17:57:11Z)
- Language Understanding for Field and Service Robots in a Priori Unknown Environments [29.16936249846063]
This paper provides a novel learning framework that allows field and service robots to interpret and execute natural language instructions.
We use language as a "sensor", inferring a distribution over the spatial, topological, and semantic information implicit in natural language utterances.
We incorporate this distribution in a probabilistic language grounding model and infer a distribution over a symbolic representation of the robot's action space.
arXiv Detail & Related papers (2021-05-21T15:13:05Z)
- ViNG: Learning Open-World Navigation with Visual Goals [82.84193221280216]
We propose a learning-based navigation system for reaching visually indicated goals.
We show that our system, which we call ViNG, outperforms previously-proposed methods for goal-conditioned reinforcement learning.
We demonstrate ViNG on a number of real-world applications, such as last-mile delivery and warehouse inspection.
arXiv Detail & Related papers (2020-12-17T18:22:32Z)
- Deep compositional robotic planners that follow natural language commands [21.481360281719006]
We show how a sampling-based robotic planner can be augmented to learn to understand a sequence of natural language commands.
Our approach combines the sampling-based planner with a deep network structured according to the parse of a complex command that includes objects, verbs, spatial relations, and attributes.
arXiv Detail & Related papers (2020-02-12T19:56:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.