Language Models as Zero-Shot Planners: Extracting Actionable Knowledge
for Embodied Agents
- URL: http://arxiv.org/abs/2201.07207v1
- Date: Tue, 18 Jan 2022 18:59:45 GMT
- Title: Language Models as Zero-Shot Planners: Extracting Actionable Knowledge
for Embodied Agents
- Authors: Wenlong Huang, Pieter Abbeel, Deepak Pathak, Igor Mordatch
- Abstract summary: We investigate the possibility of grounding high-level tasks, expressed in natural language, to a chosen set of actionable steps.
We find that if pre-trained LMs are large enough and prompted appropriately, they can effectively decompose high-level tasks into low-level plans.
We propose a procedure that conditions on existing demonstrations and semantically translates the plans to admissible actions.
- Score: 111.33545170562337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Can world knowledge learned by large language models (LLMs) be used to act in
interactive environments? In this paper, we investigate the possibility of
grounding high-level tasks, expressed in natural language (e.g. "make
breakfast"), to a chosen set of actionable steps (e.g. "open fridge"). While
prior work focused on learning from explicit step-by-step examples of how to
act, we surprisingly find that if pre-trained LMs are large enough and prompted
appropriately, they can effectively decompose high-level tasks into low-level
plans without any further training. However, the plans produced naively by LLMs
often cannot map precisely to admissible actions. We propose a procedure that
conditions on existing demonstrations and semantically translates the plans to
admissible actions. Our evaluation in the recent VirtualHome environment shows
that the resulting method substantially improves executability over the LLM
baseline. The conducted human evaluation reveals a trade-off between
executability and correctness but shows a promising sign towards extracting
actionable knowledge from language models. Website at
https://huangwl18.github.io/language-planner
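The translation procedure described in the abstract maps each free-form step generated by the LM to the most similar action from a fixed admissible set. A minimal sketch of that idea is below; the paper uses learned sentence embeddings for the matching, so the bag-of-words vectors here are only a self-contained stand-in, and the action list is hypothetical (VirtualHome-style), not taken from the paper.

```python
import math
from collections import Counter

def bow_vector(text):
    """Toy bag-of-words 'embedding' (a stand-in for the learned
    sentence embeddings the paper uses)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def translate_step(step, admissible_actions):
    """Map a free-form LM-generated step to the closest admissible action."""
    v = bow_vector(step)
    return max(admissible_actions, key=lambda a: cosine(v, bow_vector(a)))

# Hypothetical admissible action set for illustration.
actions = ["walk to kitchen", "open fridge", "grab milk", "close fridge"]
print(translate_step("Go to the refrigerator and open it", actions))  # open fridge
```

Replacing the toy vectors with real sentence embeddings changes only `bow_vector`; the nearest-neighbor translation step stays the same.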
Related papers
- From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control [58.72492647570062]
We introduce our method -- Learnable Latent Codes as Bridges (LCB) -- as an alternate architecture to overcome limitations.
We find that our method outperforms baselines that use pure language as the interface layer on tasks that require reasoning and multi-step behaviors.
arXiv Detail & Related papers (2024-05-08T04:14:06Z)
- Grounding Language Plans in Demonstrations Through Counterfactual Perturbations [25.19071357445557]
Grounding the common-sense reasoning of Large Language Models (LLMs) in physical domains remains a pivotal yet unsolved problem for embodied AI.
We show our approach improves the interpretability and reactivity of imitation learning through 2D navigation and simulated and real robot manipulation tasks.
arXiv Detail & Related papers (2024-03-25T19:04:59Z)
- Natural Language as Policies: Reasoning for Coordinate-Level Embodied Control with LLMs [7.746160514029531]
We demonstrate experimental results with LLMs that address robotics task planning problems.
Our approach acquires text descriptions of the task and scene objects, then formulates task planning through natural language reasoning.
Our approach is evaluated on a multi-modal prompt simulation benchmark.
arXiv Detail & Related papers (2024-03-20T17:58:12Z)
- Few-Shot Cross-Lingual Transfer for Prompting Large Language Models in Low-Resource Languages [0.0]
In "prompting", a user provides a task description and some completed examples of the task to a PLM as context before asking it to perform the task on a new example.
We consider three methods: few-shot prompting (prompt), language-adaptive fine-tuning (LAFT), and neural machine translation (translate).
We find that the translate and prompt settings are compute- and cost-efficient approaches to few-shot prompting for the selected low-resource languages.
arXiv Detail & Related papers (2024-03-09T21:36:13Z)
- Selective Perception: Optimizing State Descriptions with Reinforcement Learning for Language Model Actors [40.18762220245365]
Large language models (LLMs) are being applied as actors for sequential decision making tasks in domains such as robotics and games.
Previous work does little to explore what environment state information is provided to LLM actors via language.
We propose Brief Language INputs for DEcision-making Responses (BLINDER), a method for automatically selecting concise state descriptions.
arXiv Detail & Related papers (2023-07-21T22:02:50Z)
- Guiding Pretraining in Reinforcement Learning with Large Language Models [133.32146904055233]
We describe a method that uses background knowledge from text corpora to shape exploration.
This method, called ELLM, rewards an agent for achieving goals suggested by a language model.
By leveraging large-scale language model pretraining, ELLM guides agents toward human-meaningful and plausibly useful behaviors without requiring a human in the loop.
arXiv Detail & Related papers (2023-02-13T21:16:03Z)
- Translating Natural Language to Planning Goals with Large-Language Models [19.738395237639136]
Recent large language models (LLMs) have demonstrated remarkable performance on a variety of natural language processing (NLP) tasks.
Our central question is whether LLMs are able to translate goals specified in natural language to a structured planning language.
Our empirical results on GPT-3.5 variants show that LLMs are much better suited to translation than to planning.
arXiv Detail & Related papers (2023-02-10T09:17:52Z)
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models [68.57918965060787]
Large language models (LLMs) can be used to score potential next actions during task planning.
We present a programmatic LLM prompt structure that enables plan generation functional across situated environments.
arXiv Detail & Related papers (2022-09-22T20:29:49Z)
- Inner Monologue: Embodied Reasoning through Planning with Language Models [81.07216635735571]
Large Language Models (LLMs) can be applied to domains beyond natural language processing.
LLMs planning in embodied environments need to consider not only which skills to use, but also how and when to use them.
We propose that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios.
arXiv Detail & Related papers (2022-07-12T15:20:48Z)
- Pre-Trained Language Models for Interactive Decision-Making [72.77825666035203]
We describe a framework for imitation learning in which goals and observations are represented as a sequence of embeddings.
We demonstrate that this framework enables effective generalization across different environments.
For test tasks involving novel goals or novel scenes, initializing policies with language models improves task completion rates by 43.6%.
arXiv Detail & Related papers (2022-02-03T18:55:52Z)
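The ProgPrompt entry above frames task planning as code completion: available robot skills are exposed to the LLM as program-like elements, and the plan is generated as a function body. A rough sketch of that prompt structure follows; the skill names and task are illustrative placeholders, not drawn from the paper.

```python
def build_prompt(skills, task):
    """Assemble a ProgPrompt-style programmatic prompt: available
    skills appear as import lines, and the task becomes a function
    header the LLM is asked to complete."""
    lines = [f"from actions import {name}" for name in skills]
    lines.append("")
    lines.append(f"def {task}():")
    lines.append("    # complete the plan using only the imported actions")
    return "\n".join(lines)

# Hypothetical skill set and task name for illustration.
skills = ["grab", "put_on", "open_door"]
prompt = build_prompt(skills, "set_the_table")
print(prompt.splitlines()[0])  # from actions import grab
```

Constraining generation to the imported skill names is what keeps the resulting plan executable in the target environment.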
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.