Guiding Pretraining in Reinforcement Learning with Large Language Models
- URL: http://arxiv.org/abs/2302.06692v2
- Date: Fri, 15 Sep 2023 02:42:40 GMT
- Title: Guiding Pretraining in Reinforcement Learning with Large Language Models
- Authors: Yuqing Du, Olivia Watkins, Zihan Wang, Cédric Colas, Trevor Darrell,
Pieter Abbeel, Abhishek Gupta, Jacob Andreas
- Abstract summary: We describe a method that uses background knowledge from text corpora to shape exploration.
This method, called ELLM, rewards an agent for achieving goals suggested by a language model.
By leveraging large-scale language model pretraining, ELLM guides agents toward human-meaningful and plausibly useful behaviors without requiring a human in the loop.
- Score: 133.32146904055233
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning algorithms typically struggle in the absence of a
dense, well-shaped reward function. Intrinsically motivated exploration methods
address this limitation by rewarding agents for visiting novel states or
transitions, but these methods offer limited benefits in large environments
where most discovered novelty is irrelevant for downstream tasks. We describe a
method that uses background knowledge from text corpora to shape exploration.
goals suggested by a language model prompted with a description of the agent's
current state. This method, called ELLM (Exploring with LLMs), rewards an agent for achieving
current state. By leveraging large-scale language model pretraining, ELLM
guides agents toward human-meaningful and plausibly useful behaviors without
requiring a human in the loop. We evaluate ELLM in the Crafter game environment
and the Housekeep robotic simulator, showing that ELLM-trained agents have
better coverage of common-sense behaviors during pretraining and usually match
or improve performance on a range of downstream tasks. Code available at
https://github.com/yuqingd/ellm.
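The abstract above outlines ELLM's core mechanism: caption the agent's current state, prompt a language model for suggested goals, and reward the agent when a captioned transition is semantically close to one of those suggestions. The following is a minimal sketch of that loop under assumed interfaces; the `llm` and `embed` callables, the prompt wording, and the 0.9 similarity threshold are placeholders, not the authors' implementation (the real code is in the repository linked above).

```python
# Minimal, hypothetical sketch of an ELLM-style intrinsic reward.
# Assumptions: `llm` is any text-completion callable and `embed` is any
# sentence-embedding callable returning a 1-D numpy vector.
from typing import Callable
import numpy as np


def _cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def suggest_goals(llm: Callable[[str], str], state_caption: str, k: int = 5) -> list[str]:
    """Prompt a language model with a description of the agent's current state
    and parse up to k suggested goals from its completion."""
    prompt = (
        f"Current state: {state_caption}\n"
        f"Suggest {k} useful things to do next, one per line:"
    )
    lines = llm(prompt).splitlines()
    return [ln.strip("-* ").strip() for ln in lines if ln.strip()][:k]


def ellm_reward(embed: Callable[[str], np.ndarray],
                transition_caption: str,
                goals: list[str],
                threshold: float = 0.9) -> float:
    """Reward a transition whose caption is semantically close to any suggested
    goal; otherwise return zero (the threshold value is an assumption)."""
    if not goals:
        return 0.0
    t = embed(transition_caption)
    best = max(_cosine(t, embed(goal)) for goal in goals)
    return best if best > threshold else 0.0
```

In a pretraining loop, the suggested goals would be refreshed as the state caption changes, and `ellm_reward` would serve as (or be added to) the intrinsic reward driving exploration.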
Related papers
- zsLLMCode: An Effective Approach for Functional Code Embedding via LLM with Zero-Shot Learning [6.976968804436321]
Large language models (LLMs) are capable of zero-shot learning, which requires no training or fine-tuning.
We propose zsLLMCode, a novel approach that generates functional code embeddings using LLMs.
arXiv Detail & Related papers (2024-09-23T01:03:15Z) - StateAct: State Tracking and Reasoning for Acting and Planning with Large Language Models [10.359008237358603]
Planning and acting to solve 'real' tasks using large language models (LLMs) in interactive environments has become a new frontier for AI methods.
We propose a simple method based on few-shot in-context learning alone to enhance 'chain-of-thought' with state-tracking for planning and acting with LLMs.
arXiv Detail & Related papers (2024-09-21T05:54:35Z) - EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents [65.38474102119181]
We propose EnvGen, a framework to adaptively create training environments.
We train a small RL agent in a mixture of the original and LLM-generated environments.
We find that a small RL agent trained with EnvGen can outperform SOTA methods, including a GPT-4 agent, and learns long-horizon tasks significantly faster.
arXiv Detail & Related papers (2024-03-18T17:51:16Z) - Large Language Models as Generalizable Policies for Embodied Tasks [50.870491905776305]
We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks.
Our approach, called Large LAnguage model Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take as input text instructions and visual egocentric observations and output actions directly in the environment.
arXiv Detail & Related papers (2023-10-26T18:32:05Z) - Language Reward Modulation for Pretraining Reinforcement Learning [61.76572261146311]
We propose leveraging the capabilities of learned reward functions (LRFs) as a pretraining signal for reinforcement learning.
Our VLM pretraining approach, which is a departure from previous attempts to use LRFs, can warm-start sample-efficient learning on robot manipulation tasks.
arXiv Detail & Related papers (2023-08-23T17:37:51Z) - Selective Perception: Optimizing State Descriptions with Reinforcement Learning for Language Model Actors [40.18762220245365]
Large language models (LLMs) are being applied as actors for sequential decision making tasks in domains such as robotics and games.
Previous work does little to explore what environment state information is provided to LLM actors via language.
We propose Brief Language INputs for DEcision-making Responses (BLINDER), a method for automatically selecting concise state descriptions.
arXiv Detail & Related papers (2023-07-21T22:02:50Z) - Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z) - Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents [111.33545170562337]
We investigate the possibility of grounding high-level tasks, expressed in natural language, to a chosen set of actionable steps.
We find that if pre-trained LMs are large enough and prompted appropriately, they can effectively decompose high-level tasks into low-level plans.
We propose a procedure that conditions on existing demonstrations and semantically translates the plans to admissible actions (see the sketch following this entry).
arXiv Detail & Related papers (2022-01-18T18:59:45Z)
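As a rough illustration of the semantic-translation step mentioned in the zero-shot planners entry above, the sketch below maps a free-form plan step onto the nearest admissible action by embedding similarity; the `embed` callable and the cosine matching are assumptions standing in for that paper's actual translation model.

```python
# Rough sketch: map a free-form plan step generated by a language model onto
# the most similar admissible environment action (illustrative assumptions only).
from typing import Callable
import numpy as np


def translate_step(embed: Callable[[str], np.ndarray],
                   generated_step: str,
                   admissible_actions: list[str]) -> str:
    """Return the admissible action whose embedding is closest (by cosine
    similarity) to the step the language model generated."""
    g = embed(generated_step)

    def cos(text: str) -> float:
        v = embed(text)
        return float(g @ v / (np.linalg.norm(g) * np.linalg.norm(v) + 1e-8))

    return max(admissible_actions, key=cos)
```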