Guiding Pretraining in Reinforcement Learning with Large Language Models
- URL: http://arxiv.org/abs/2302.06692v2
- Date: Fri, 15 Sep 2023 02:42:40 GMT
- Title: Guiding Pretraining in Reinforcement Learning with Large Language Models
- Authors: Yuqing Du, Olivia Watkins, Zihan Wang, Cédric Colas, Trevor Darrell,
Pieter Abbeel, Abhishek Gupta, Jacob Andreas
- Abstract summary: We describe a method that uses background knowledge from text corpora to shape exploration.
This method, called ELLM, rewards an agent for achieving goals suggested by a language model.
By leveraging large-scale language model pretraining, ELLM guides agents toward human-meaningful and plausibly useful behaviors without requiring a human in the loop.
- Score: 133.32146904055233
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning algorithms typically struggle in the absence of a
dense, well-shaped reward function. Intrinsically motivated exploration methods
address this limitation by rewarding agents for visiting novel states or
transitions, but these methods offer limited benefits in large environments
where most discovered novelty is irrelevant for downstream tasks. We describe a
method that uses background knowledge from text corpora to shape exploration.
This method, called ELLM (Exploring with LLMs), rewards an agent for achieving
goals suggested by a language model prompted with a description of the agent's
current state. By leveraging large-scale language model pretraining, ELLM
guides agents toward human-meaningful and plausibly useful behaviors without
requiring a human in the loop. We evaluate ELLM in the Crafter game environment
and the Housekeep robotic simulator, showing that ELLM-trained agents have
better coverage of common-sense behaviors during pretraining and usually match
or improve performance on a range of downstream tasks. Code available at
https://github.com/yuqingd/ellm.
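The abstract describes a concrete loop: caption the agent's current state, prompt an LLM for candidate goals, and reward the agent when the transition it achieves matches one of those goals. The sketch below is a minimal, hedged illustration of that loop rather than the authors' implementation (see the linked repository for that); query_llm_for_goals and embed are hypothetical stand-ins for a prompted language model and a sentence-embedding model, and the similarity threshold is a tunable placeholder.

```python
import random
from typing import List, Sequence

# Hypothetical stand-ins so the sketch runs on its own: in a real system these
# would be a prompted LLM and a trained text encoder, not toy functions.
def query_llm_for_goals(state_caption: str, k: int = 5) -> List[str]:
    """Ask a language model: given this state description, what should the agent try next?"""
    return ["chop a tree", "collect water", "craft a wooden pickaxe"][:k]

def embed(text: str) -> List[float]:
    """Placeholder text embedding (a real sentence encoder would go here)."""
    rng = random.Random(hash(text) % (2 ** 32))
    return [rng.random() for _ in range(16)]

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv + 1e-8)

def ellm_style_reward(state_caption: str, transition_caption: str,
                      threshold: float = 0.5) -> float:
    """Reward the agent when what it just did is close to an LLM-suggested goal.
    The threshold here is an illustrative placeholder, not the paper's setting."""
    goals = query_llm_for_goals(state_caption)
    best = max(cosine(embed(transition_caption), embed(g)) for g in goals)
    return best if best >= threshold else 0.0

# One pretraining step: the intrinsic reward shapes exploration toward suggested goals.
r_int = ellm_style_reward(
    state_caption="You see a tree. Your inventory is empty.",
    transition_caption="You chop down the tree.",
)
print(f"intrinsic reward: {r_int:.2f}")
```

The full method includes details omitted here, such as filtering suggested goals against ones the agent has already achieved in the episode.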
Related papers
- Should You Use Your Large Language Model to Explore or Exploit? [55.562545113247666]
We evaluate the ability of large language models to help a decision-making agent facing an exploration-exploitation tradeoff.
We find that while current LLMs often struggle to exploit, in-context mitigations can substantially improve performance on small-scale tasks.
arXiv Detail & Related papers (2025-01-31T23:42:53Z)
- Training Agents with Weakly Supervised Feedback from Large Language Models [19.216542820742607]
This paper introduces a novel training method for LLM-based agents using weakly supervised signals from a critic LLM.
Our agents are trained in an iterative manner, where they initially generate trajectories through environmental interaction.
Tests on the API-bank dataset show consistent improvement in our agents' capabilities and comparable performance to GPT-4.
arXiv Detail & Related papers (2024-11-29T08:47:04Z)
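The entry above sketches an outer loop: the agent rolls out trajectories, a critic LLM supplies a weak quality signal, and the agent is updated on the trajectories the critic favors. The code below is a hedged illustration of that loop only; rollout, critic_score, and finetune are hypothetical placeholders rather than the paper's API, and the toy usage exists just so the sketch runs.

```python
from typing import Any, Callable, List, Tuple

Trajectory = List[Tuple[str, str]]  # (observation, action) pairs


def iterative_weakly_supervised_training(
    agent: Any,
    rollout: Callable[[Any], Trajectory],          # agent interacts with the environment
    critic_score: Callable[[Trajectory], float],   # weak signal, e.g. from a critic LLM
    finetune: Callable[[Any, List[Trajectory]], Any],
    iterations: int = 3,
    rollouts_per_iter: int = 8,
    keep_threshold: float = 0.5,
) -> Any:
    """Generate -> critique -> filter -> update, repeated for a few iterations."""
    for _ in range(iterations):
        trajectories = [rollout(agent) for _ in range(rollouts_per_iter)]
        kept = [t for t in trajectories if critic_score(t) >= keep_threshold]
        if kept:
            agent = finetune(agent, kept)
    return agent


# Toy usage with stub components.
agent = {"version": 0}
result = iterative_weakly_supervised_training(
    agent,
    rollout=lambda a: [("obs", "call_api")],
    critic_score=lambda t: 0.8,                      # a real critic would prompt an LLM
    finetune=lambda a, trajs: {"version": a["version"] + 1},
)
print(result)   # {'version': 3}
```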
- StateAct: State Tracking and Reasoning for Acting and Planning with Large Language Models [10.359008237358603]
Planning and acting to solve 'real' tasks using large language models (LLMs) in interactive environments has become a new frontier for AI methods.
We propose a simple method based on few-shot in-context learning alone to enhance 'chain-of-thought' with state-tracking.
We show that our method establishes a new state-of-the-art on ALFWorld for in-context learning methods.
arXiv Detail & Related papers (2024-09-21T05:54:35Z)
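StateAct, as summarized above, augments chain-of-thought-style acting prompts with an explicitly restated state using only few-shot in-context examples. Below is a hedged sketch of assembling such a prompt; the field names (goal, location, inventory) and the formatting are illustrative assumptions, not the paper's exact template.

```python
from typing import Dict, List


def format_step(state: Dict[str, str], thought: str, action: str) -> str:
    """Each step restates the tracked state before the usual thought/action pair."""
    state_line = ", ".join(f"{k}: {v}" for k, v in state.items())
    return f"state: {state_line}\nthought: {thought}\naction: {action}"


def build_prompt(few_shot_examples: List[str], task: str, history: List[str]) -> str:
    """Few-shot examples, then the current task, then the steps taken so far."""
    return "\n\n".join(few_shot_examples + [f"Your task is to: {task}"] + history)


# Illustrative ALFWorld-style example step.
example = format_step(
    state={"goal": "put a clean mug on the desk", "location": "kitchen",
           "inventory": "nothing"},
    thought="Mugs are often in cabinets, so I should look there first.",
    action="open cabinet 1",
)
print(build_prompt([example], "put a clean mug on the desk", history=[]))
```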
- EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents [65.38474102119181]
We propose EnvGen, a framework to adaptively create training environments.
We train a small RL agent in a mixture of the original and LLM-generated environments.
We find that a small RL agent trained with EnvGen can outperform SOTA methods, including a GPT-4 agent, and learns long-horizon tasks significantly faster.
arXiv Detail & Related papers (2024-03-18T17:51:16Z)
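The EnvGen entry above describes a feedback cycle: an LLM proposes training-environment configurations, a small RL agent trains on a mixture of the original and generated environments, and the agent's measured weaknesses inform the next batch of environments. The sketch below illustrates that cycle with hypothetical placeholders (llm_propose_env_configs, train, evaluate_skills); it is not the EnvGen codebase, and the skill names are illustrative.

```python
import random
from typing import Any, Dict, List


def llm_propose_env_configs(weaknesses: Dict[str, float], n: int = 4) -> List[Dict]:
    """Placeholder for prompting an LLM with the agent's weakest skills and asking
    for environment configurations that exercise them."""
    worst = sorted(weaknesses, key=weaknesses.get)[:2]
    return [{"emphasize_skill": s, "difficulty": round(random.random(), 2)}
            for s in worst for _ in range(n // 2)]


def train(agent: Dict[str, Any], envs: List[Dict], steps: int = 1000) -> Dict[str, Any]:
    """Placeholder RL update on a mixture of environments."""
    return {**agent, "steps": agent.get("steps", 0) + steps}


def evaluate_skills(agent: Dict[str, Any], skills: List[str]) -> Dict[str, float]:
    """Placeholder per-skill success rates measured in the original environment."""
    return {s: random.random() for s in skills}


skills = ["collect_wood", "make_pickaxe", "collect_diamond"]
original_env = {"name": "original"}
agent: Dict[str, Any] = {}
weaknesses = {s: 0.0 for s in skills}

for cycle in range(3):                                   # a few adaptation cycles
    generated = llm_propose_env_configs(weaknesses)
    agent = train(agent, [original_env] + generated)     # mixture of original + generated
    weaknesses = evaluate_skills(agent, skills)          # feedback for the next cycle

print(agent, weaknesses)
```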
- Language Reward Modulation for Pretraining Reinforcement Learning [61.76572261146311]
We propose leveraging the capabilities of LRFs as a pretraining signal for reinforcement learning.
Our VLM pretraining approach, which is a departure from previous attempts to use LRFs, can warm-start sample-efficient learning on robot manipulation tasks.
arXiv Detail & Related papers (2023-08-23T17:37:51Z)
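The Language Reward Modulation entry above treats a vision-language model's image-text alignment score as a noisy pretraining reward rather than as the task reward. The sketch below is a hedged illustration of that idea: vlm_alignment_score is a hypothetical stand-in for a real VLM similarity score, and the instruction set and sampling scheme are assumptions for illustration.

```python
import random
from typing import List, Sequence


def vlm_alignment_score(image: Sequence[float], text: str) -> float:
    """Hypothetical stand-in for a VLM image-text similarity (a real scorer would go here)."""
    rng = random.Random((hash(text) ^ int(sum(image) * 1e6)) % (2 ** 32))
    return rng.random()


def language_modulated_pretraining_reward(image: Sequence[float],
                                          instructions: List[str]) -> float:
    """Pretraining reward: how well the current observation matches a sampled language
    instruction, used as an exploration/shaping signal rather than a task reward."""
    instruction = random.choice(instructions)
    return vlm_alignment_score(image, instruction)


observation = [0.1] * 8                                  # placeholder image features
prompts = ["pick up the block", "open the drawer", "push the button"]
print(round(language_modulated_pretraining_reward(observation, prompts), 3))
```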
- Selective Perception: Optimizing State Descriptions with Reinforcement Learning for Language Model Actors [40.18762220245365]
Large language models (LLMs) are being applied as actors for sequential decision making tasks in domains such as robotics and games.
Previous work does little to explore what environment state information is provided to LLM actors via language.
We propose Brief Language INputs for DEcision-making Responses (BLINDER), a method for automatically selecting concise state descriptions.
arXiv Detail & Related papers (2023-07-21T22:02:50Z)
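BLINDER, summarized above, learns to pick a concise subset of the available state-description candidates to feed an LLM actor. The sketch below shows one plausible greedy selection loop under a learned value function; the scoring function, the greedy procedure, and the line budget are assumptions for illustration, not BLINDER's exact method.

```python
from typing import Callable, List


def select_state_description(candidates: List[str],
                             value: Callable[[List[str]], float],
                             max_lines: int = 3) -> List[str]:
    """Greedily add the candidate line that most improves the scored description."""
    chosen: List[str] = []
    remaining = list(candidates)
    while remaining and len(chosen) < max_lines:
        best = max(remaining, key=lambda line: value(chosen + [line]))
        if value(chosen + [best]) <= value(chosen):
            break                                        # no remaining line helps
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy scorer standing in for a learned value function: prefers task-relevant
# lines and lightly penalizes length (illustrative only).
def toy_value(lines: List[str]) -> float:
    relevance = sum(("mug" in line or "desk" in line) for line in lines)
    return relevance - 0.1 * len(lines)


full_state = ["You are in the kitchen.", "A mug is on the counter.",
              "The window is open.", "The desk is in the office."]
print(select_state_description(full_state, toy_value))
# ['A mug is on the counter.', 'The desk is in the office.']
```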
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents [111.33545170562337]
We investigate the possibility of grounding high-level tasks, expressed in natural language, to a chosen set of actionable steps.
We find that if pre-trained LMs are large enough and prompted appropriately, they can effectively decompose high-level tasks into low-level plans.
We propose a procedure that conditions on existing demonstrations and semantically translates the plans to admissible actions.
arXiv Detail & Related papers (2022-01-18T18:59:45Z)
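The last entry above grounds free-form LLM plan steps by semantically translating each generated step into the closest admissible action. The sketch below shows that translation step with a crude word-overlap similarity standing in for the sentence embeddings the paper uses; generate_plan_step and the action list are hypothetical placeholders.

```python
from typing import List


def similarity(a: str, b: str) -> float:
    """Crude word-overlap stand-in for an embedding-based sentence similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)


def translate_to_admissible(step: str, admissible_actions: List[str]) -> str:
    """Map a free-form generated plan step to the closest executable action."""
    return max(admissible_actions, key=lambda action: similarity(step, action))


def generate_plan_step(task: str, history: List[str]) -> str:
    """Hypothetical placeholder for an LLM prompted with demonstrations plus the task."""
    return "walk over to the refrigerator and open it"


admissible = ["open refrigerator", "grab milk", "walk to kitchen", "close refrigerator"]
step = generate_plan_step("get a glass of milk", history=[])
print(translate_to_admissible(step, admissible))         # -> "open refrigerator"
```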