Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making
using Language Guided World Modelling
- URL: http://arxiv.org/abs/2301.12050v2
- Date: Thu, 27 Apr 2023 15:14:01 GMT
- Title: Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making
using Language Guided World Modelling
- Authors: Kolby Nottingham, Prithviraj Ammanabrolu, Alane Suhr, Yejin Choi,
Hannaneh Hajishirzi, Sameer Singh, Roy Fox
- Abstract summary: Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of the world.
We propose using few-shot large language models (LLMs) to hypothesize an Abstract World Model (AWM).
Our method of hypothesizing an AWM with LLMs and then verifying the AWM based on agent experience not only increases sample efficiency over contemporary methods by an order of magnitude, but also remains robust to and corrects errors in the LLM.
- Score: 101.59430768507997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) agents typically learn tabula rasa, without prior
knowledge of the world. However, if initialized with knowledge of high-level
subgoals and transitions between subgoals, RL agents could utilize this
Abstract World Model (AWM) for planning and exploration. We propose using
few-shot large language models (LLMs) to hypothesize an AWM, that will be
verified through world experience, to improve sample efficiency of RL agents.
Our DECKARD agent applies LLM-guided exploration to item crafting in Minecraft
in two phases: (1) the Dream phase where the agent uses an LLM to decompose a
task into a sequence of subgoals, the hypothesized AWM; and (2) the Wake phase
where the agent learns a modular policy for each subgoal and verifies or
corrects the hypothesized AWM. Our method of hypothesizing an AWM with LLMs and
then verifying the AWM based on agent experience not only increases sample
efficiency over contemporary methods by an order of magnitude but is also
robust to and corrects errors in the LLM, successfully blending noisy
internet-scale information from LLMs with knowledge grounded in environment
dynamics.
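As a rough illustration of the Dream/Wake loop described above, the sketch below models the AWM as a dictionary mapping each crafting subgoal to its hypothesized prerequisites; the helper callables (`llm_propose_awm`, `try_subgoal`, `observe_prereqs`) are hypothetical stand-ins for the LLM query, the per-subgoal modular policy, and environment feedback, not the authors' actual interface.

```python
# Minimal sketch (not the authors' code) of the Dream/Wake loop described above.
# The AWM is a dict mapping each subgoal (item) to its hypothesized prerequisites.

def dream(llm_propose_awm, task):
    """Dream phase: ask the LLM for a hypothesized subgoal graph (AWM) for `task`."""
    return llm_propose_awm(task)   # e.g. {"planks": ["log"], "stick": ["planks"], ...}

def wake(awm, try_subgoal, observe_prereqs, max_episodes=100):
    """Wake phase: verify or correct the hypothesized AWM with agent experience."""
    verified = set()
    for _ in range(max_episodes):
        # Choose an unverified subgoal whose hypothesized prerequisites are all verified.
        frontier = [g for g, prereqs in awm.items()
                    if g not in verified and all(p in verified for p in prereqs)]
        if not frontier:
            break
        subgoal = frontier[0]
        if try_subgoal(subgoal):            # run/update the modular RL policy for this subgoal
            verified.add(subgoal)           # LLM hypothesis confirmed by experience
        else:
            # Correct the AWM: replace the hypothesized prerequisites with observed ones.
            awm[subgoal] = observe_prereqs(subgoal)
    return awm, verified
```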
Related papers
- WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents [55.64361927346957]
We propose a neurosymbolic approach that learns rules gradient-free through large language models (LLMs).
Our embodied LLM agent "WALL-E" is built upon model-predictive control (MPC).
On open-world challenges in Minecraft and ALFWorld, WALL-E achieves higher success rates than existing methods.
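A minimal sketch of the kind of rule-checked MPC loop this summary suggests (our reading, not the paper's implementation); `llm_propose_plans`, `llm_predict_outcome`, and the `rules` predicates are illustrative placeholders.

```python
# Hedged sketch of an MPC loop where LLM-learned symbolic rules filter candidate plans.

def mpc_step(state, llm_propose_plans, llm_predict_outcome, rules, horizon=5):
    """Pick the candidate plan whose simulated rollout violates no learned rule."""
    best_plan, best_score = None, float("-inf")
    for plan in llm_propose_plans(state, horizon):          # candidate action sequences
        sim_state, score, ok = state, 0.0, True
        for action in plan:
            sim_state, reward = llm_predict_outcome(sim_state, action)  # LLM world model
            if not all(rule(sim_state, action) for rule in rules):      # symbolic rule check
                ok = False
                break
            score += reward
        if ok and score > best_score:
            best_plan, best_score = plan, score
    return best_plan[0] if best_plan else None              # execute only the first action (MPC)
```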
arXiv Detail & Related papers (2024-10-09T23:37:36Z)
- Controlling Large Language Model Agents with Entropic Activation Steering [20.56909601159833]
We introduce Entropic Activation Steering (EAST), an activation steering method for in-context learning agents.
We show that EAST can effectively manipulate an LLM agent's exploration by directly affecting the high-level actions parsed from the outputs of the LLM.
We also reveal how applying this control modulates the uncertainty exhibited in the LLM's thoughts, guiding the agent towards more exploratory actions.
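A hedged sketch of activation steering in general (not the EAST code): a fixed steering vector is added to one hidden layer's output at inference time, with a coefficient controlling how strongly the agent is pushed toward more exploratory actions.

```python
# Illustrative activation-steering sketch; the steering vector and layer choice are assumptions.
import torch

def add_steering_hook(layer: torch.nn.Module, steering_vec: torch.Tensor, alpha: float):
    """Register a forward hook that shifts the layer's output along steering_vec."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * steering_vec          # steer the residual stream
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)

# Usage sketch (assumptions): steering_vec might be the difference between mean activations
# collected under high- vs. low-entropy action distributions.
# handle = add_steering_hook(model.transformer.h[20], steering_vec, alpha=8.0)
# ... generate actions ...
# handle.remove()
```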
arXiv Detail & Related papers (2024-06-01T00:25:00Z)
- From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM) empowered agents are able to solve decision-making problems in the physical world.
Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting.
We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
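One hedged way to paraphrase the BAIL claim as a formula (our notation, not the paper's): in-context subgoal generation behaves like a Bayesian mixture over a latent expert/task variable inferred from the prompt and interaction history.

```latex
% Hedged paraphrase of the BAIL idea, not the paper's exact statement or notation.
\[
  P_{\mathrm{LLM}}\bigl(g_t \mid \mathrm{prompt},\, h_t\bigr)
  \;\approx\;
  \int P\bigl(g_t \mid h_t, z\bigr)\, P\bigl(z \mid \mathrm{prompt},\, h_t\bigr)\, dz ,
\]
% where $h_t$ is the interaction history in the POMDP, $g_t$ the next language subgoal,
% and $z$ a latent task/expert variable aggregated over in context.
```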
arXiv Detail & Related papers (2024-05-30T09:42:54Z)
- EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents [65.38474102119181]
We propose EnvGen, a framework to adaptively create training environments.
We train a small RL agent in a mixture of the original and LLM-generated environments.
We find that a small RL agent trained with EnvGen can outperform SOTA methods, including a GPT-4 agent, and learns long-horizon tasks significantly faster.
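A minimal sketch of the adaptive environment-generation loop the summary describes; the helpers (`llm_generate_configs`, `make_env`, the `agent` interface) are hypothetical stand-ins for the actual components.

```python
# Sketch: an LLM proposes environment configs, a small RL agent trains on a mixture of
# original and generated environments, and agent feedback adapts the next batch of envs.
import random

def envgen_training(agent, make_env, llm_generate_configs,
                    cycles=4, episodes_per_cycle=500, p_generated=0.5):
    feedback = None
    for _ in range(cycles):
        configs = llm_generate_configs(feedback)        # LLM targets the agent's weaknesses
        generated_envs = [make_env(cfg) for cfg in configs]
        original_env = make_env(None)                   # unmodified task distribution
        for _ in range(episodes_per_cycle):
            env = random.choice(generated_envs) if random.random() < p_generated else original_env
            agent.train_episode(env)
        feedback = agent.evaluate(original_env)         # e.g. per-task success rates
    return agent
```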
arXiv Detail & Related papers (2024-03-18T17:51:16Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
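A hedged sketch of the general idea of LLM guidance as a regularizer in value-based RL (not LINVIT's specific algorithm): extract a soft policy that trades off learned Q-values against a KL penalty toward an LLM action prior.

```python
# Sketch: policy maximizing E_pi[Q] - lam * KL(pi || llm_prior); closed form is
# pi(a) proportional to llm_prior(a) * exp(Q(a) / lam). Large lam trusts the LLM prior.
import numpy as np

def regularized_policy(q_values: np.ndarray, llm_prior: np.ndarray, lam: float) -> np.ndarray:
    logits = np.log(llm_prior + 1e-12) + q_values / lam
    logits -= logits.max()                    # numerical stability
    pi = np.exp(logits)
    return pi / pi.sum()
```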
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning [37.10401435242991]
Large language models (LLMs) often fail in solving simple decision-making tasks due to misalignment of the knowledge in LLMs with environments.
We propose TWOSOME, a novel framework that deploys LLMs as decision-making agents to efficiently interact and align with embodied environments via RL.
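A hedged sketch of one ingredient such a framework might use (an assumption on our part, not necessarily TWOSOME's exact scheme): turn the LLM's token log-likelihoods of valid action strings into a normalized action distribution that RL can then optimize.

```python
# Sketch: score each valid action by the LLM's (length-normalized) log-likelihood of its
# text given the observation, then normalize into a policy. `token_logprob` is a
# hypothetical helper: token_logprob(prompt, continuation) -> list of per-token log-probs.
import math

def action_policy(observation: str, actions: list[str], token_logprob) -> list[float]:
    scores = []
    for a in actions:
        logps = token_logprob(f"Observation: {observation}\nAction:", f" {a}")
        scores.append(sum(logps) / len(logps))      # length-normalized action log-likelihood
    z = max(scores)
    probs = [math.exp(s - z) for s in scores]
    total = sum(probs)
    return [p / total for p in probs]               # normalized action distribution (the policy)
```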
arXiv Detail & Related papers (2024-01-25T13:03:20Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
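A hedged sketch of the general recipe the summary hints at (our reading, not the paper's method): expose a small supervised model's prediction and confidence inside the prompt so the LLM's in-context answer can draw on task-specific discriminative knowledge.

```python
# Illustrative prompt builder; the exact prompt format and fields are assumptions.
def build_prompt(question: str, classifier_label: str, classifier_confidence: float,
                 few_shot_examples: list[str]) -> str:
    demo_block = "\n\n".join(few_shot_examples)
    return (
        f"{demo_block}\n\n"
        f"Question: {question}\n"
        f"A task-specific classifier predicts: {classifier_label} "
        f"(confidence {classifier_confidence:.2f}).\n"
        "Considering both the question and the classifier's prediction, answer:"
    )
```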
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- LgTS: Dynamic Task Sampling using LLM-generated sub-goals for Reinforcement Learning Agents [10.936460061405157]
We propose LgTS (LLM-guided Teacher-Student learning), a novel approach that explores the planning abilities of LLMs.
Our approach does not assume access to a proprietary or fine-tuned LLM, nor does it require pre-trained policies that achieve the sub-goals proposed by the LLM.
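An illustrative sketch (not LgTS's exact teacher) of dynamic task sampling over LLM-proposed sub-goals: the teacher favors sub-goals whose recent success rate is changing fastest, i.e. where the student RL agent shows the most learning progress.

```python
# Sketch of a learning-progress teacher; sub-goals, window size, and epsilon are assumptions.
import random

def sample_subgoal(subgoals, success_history, epsilon=0.1, window=20):
    """success_history[g] is a list of recent 0/1 outcomes for sub-goal g."""
    if random.random() < epsilon:
        return random.choice(subgoals)            # keep exploring all sub-goals
    def progress(g):
        hist = success_history.get(g, [])
        if len(hist) < 2 * window:
            return float("inf")                   # not enough data yet: prioritize
        old = sum(hist[-2 * window:-window]) / window
        new = sum(hist[-window:]) / window
        return abs(new - old)                     # absolute learning progress
    return max(subgoals, key=progress)
```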
arXiv Detail & Related papers (2023-10-14T00:07:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.