Guiding Pretraining in Reinforcement Learning with Large Language Models
- URL: http://arxiv.org/abs/2302.06692v2
- Date: Fri, 15 Sep 2023 02:42:40 GMT
- Title: Guiding Pretraining in Reinforcement Learning with Large Language Models
- Authors: Yuqing Du, Olivia Watkins, Zihan Wang, Cédric Colas, Trevor Darrell,
Pieter Abbeel, Abhishek Gupta, Jacob Andreas
- Abstract summary: We describe a method that uses background knowledge from text corpora to shape exploration.
This method, called ELLM, rewards an agent for achieving goals suggested by a language model.
By leveraging large-scale language model pretraining, ELLM guides agents toward human-meaningful and plausibly useful behaviors without requiring a human in the loop.
- Score: 133.32146904055233
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning algorithms typically struggle in the absence of a
dense, well-shaped reward function. Intrinsically motivated exploration methods
address this limitation by rewarding agents for visiting novel states or
transitions, but these methods offer limited benefits in large environments
where most discovered novelty is irrelevant for downstream tasks. We describe a
method that uses background knowledge from text corpora to shape exploration.
This method, called ELLM (Exploring with LLMs), rewards an agent for achieving
goals suggested by a language model prompted with a description of the agent's
current state. By leveraging large-scale language model pretraining, ELLM
guides agents toward human-meaningful and plausibly useful behaviors without
requiring a human in the loop. We evaluate ELLM in the Crafter game environment
and the Housekeep robotic simulator, showing that ELLM-trained agents have
better coverage of common-sense behaviors during pretraining and usually match
or improve performance on a range of downstream tasks. Code available at
https://github.com/yuqingd/ellm.
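The abstract describes a concrete loop: caption the agent's current state, prompt an LLM for candidate goals, and reward the agent when the transition it achieves matches one of those goals. The sketch below is a minimal, hedged illustration of that loop rather than the authors' implementation (see the linked repository for that); query_llm_for_goals and embed are hypothetical stand-ins for a prompted language model and a sentence-embedding model, and the similarity threshold is a tunable placeholder.

```python
import random
from typing import List, Sequence

# Hypothetical stand-ins so the sketch runs on its own: in a real system these
# would be a prompted LLM and a trained text encoder, not toy functions.
def query_llm_for_goals(state_caption: str, k: int = 5) -> List[str]:
    """Ask a language model: given this state description, what should the agent try next?"""
    return ["chop a tree", "collect water", "craft a wooden pickaxe"][:k]

def embed(text: str) -> List[float]:
    """Placeholder text embedding (a real sentence encoder would go here)."""
    rng = random.Random(hash(text) % (2 ** 32))
    return [rng.random() for _ in range(16)]

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv + 1e-8)

def ellm_style_reward(state_caption: str, transition_caption: str,
                      threshold: float = 0.5) -> float:
    """Reward the agent when what it just did is close to an LLM-suggested goal.
    The threshold here is an illustrative placeholder, not the paper's setting."""
    goals = query_llm_for_goals(state_caption)
    best = max(cosine(embed(transition_caption), embed(g)) for g in goals)
    return best if best >= threshold else 0.0

# One pretraining step: the intrinsic reward shapes exploration toward suggested goals.
r_int = ellm_style_reward(
    state_caption="You see a tree. Your inventory is empty.",
    transition_caption="You chop down the tree.",
)
print(f"intrinsic reward: {r_int:.2f}")
```

The full method includes details omitted here, such as filtering suggested goals against ones the agent has already achieved in the episode.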
Related papers
- Should You Use Your Large Language Model to Explore or Exploit? [55.562545113247666]
We evaluate the ability of large language models to help a decision-making agent facing an exploration-exploitation tradeoff.
We find that while current LLMs often struggle to exploit, in-context mitigations can substantially improve performance on small-scale tasks.
arXiv Detail & Related papers (2025-01-31T23:42:53Z)
- Training Agents with Weakly Supervised Feedback from Large Language Models [19.216542820742607]
This paper introduces a novel training method for LLM-based agents using weakly supervised signals from a critic LLM.
Our agents are trained in an iterative manner, where they initially generate trajectories through environmental interaction.
Tests on the API-bank dataset show consistent improvement in our agents' capabilities and comparable performance to GPT-4.
arXiv Detail & Related papers (2024-11-29T08:47:04Z)
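The entry above sketches an outer loop: the agent rolls out trajectories, a critic LLM supplies a weak quality signal, and the agent is updated on the trajectories the critic favors. The code below is a hedged illustration of that loop only; rollout, critic_score, and finetune are hypothetical placeholders rather than the paper's API, and the toy usage exists just so the sketch runs.

```python
from typing import Any, Callable, List, Tuple

Trajectory = List[Tuple[str, str]]  # (observation, action) pairs


def iterative_weakly_supervised_training(
    agent: Any,
    rollout: Callable[[Any], Trajectory],          # agent interacts with the environment
    critic_score: Callable[[Trajectory], float],   # weak signal, e.g. from a critic LLM
    finetune: Callable[[Any, List[Trajectory]], Any],
    iterations: int = 3,
    rollouts_per_iter: int = 8,
    keep_threshold: float = 0.5,
) -> Any:
    """Generate -> critique -> filter -> update, repeated for a few iterations."""
    for _ in range(iterations):
        trajectories = [rollout(agent) for _ in range(rollouts_per_iter)]
        kept = [t for t in trajectories if critic_score(t) >= keep_threshold]
        if kept:
            agent = finetune(agent, kept)
    return agent


# Toy usage with stub components.
agent = {"version": 0}
result = iterative_weakly_supervised_training(
    agent,
    rollout=lambda a: [("obs", "call_api")],
    critic_score=lambda t: 0.8,                      # a real critic would prompt an LLM
    finetune=lambda a, trajs: {"version": a["version"] + 1},
)
print(result)   # {'version': 3}
```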
- StateAct: State Tracking and Reasoning for Acting and Planning with Large Language Models [10.359008237358603]
Planning and acting to solve 'real' tasks using large language models (LLMs) in interactive environments has become a new frontier for AI methods.
We propose a simple method based on few-shot in-context learning alone to enhance 'chain-of-thought' with state-tracking.
We show that our method establishes a new state-of-the-art on ALFWorld for in-context learning methods.
arXiv Detail & Related papers (2024-09-21T05:54:35Z)
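StateAct, as summarized above, augments chain-of-thought-style acting prompts with an explicitly restated state using only few-shot in-context examples. Below is a hedged sketch of assembling such a prompt; the field names (goal, location, inventory) and the formatting are illustrative assumptions, not the paper's exact template.

```python
from typing import Dict, List


def format_step(state: Dict[str, str], thought: str, action: str) -> str:
    """Each step restates the tracked state before the usual thought/action pair."""
    state_line = ", ".join(f"{k}: {v}" for k, v in state.items())
    return f"state: {state_line}\nthought: {thought}\naction: {action}"


def build_prompt(few_shot_examples: List[str], task: str, history: List[str]) -> str:
    """Few-shot examples, then the current task, then the steps taken so far."""
    return "\n\n".join(few_shot_examples + [f"Your task is to: {task}"] + history)


# Illustrative ALFWorld-style example step.
example = format_step(
    state={"goal": "put a clean mug on the desk", "location": "kitchen",
           "inventory": "nothing"},
    thought="Mugs are often in cabinets, so I should look there first.",
    action="open cabinet 1",
)
print(build_prompt([example], "put a clean mug on the desk", history=[]))
```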
- EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents [65.38474102119181]
We propose EnvGen, a framework to adaptively create training environments.
We train a small RL agent in a mixture of the original and LLM-generated environments.
We find that a small RL agent trained with EnvGen can outperform SOTA methods, including a GPT-4 agent, and learns long-horizon tasks significantly faster.
arXiv Detail & Related papers (2024-03-18T17:51:16Z)
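The EnvGen entry above describes a feedback cycle: an LLM proposes training-environment configurations, a small RL agent trains on a mixture of the original and generated environments, and the agent's measured weaknesses inform the next batch of environments. The sketch below illustrates that cycle with hypothetical placeholders (llm_propose_env_configs, train, evaluate_skills); it is not the EnvGen codebase, and the skill names are illustrative.

```python
import random
from typing import Any, Dict, List


def llm_propose_env_configs(weaknesses: Dict[str, float], n: int = 4) -> List[Dict]:
    """Placeholder for prompting an LLM with the agent's weakest skills and asking
    for environment configurations that exercise them."""
    worst = sorted(weaknesses, key=weaknesses.get)[:2]
    return [{"emphasize_skill": s, "difficulty": round(random.random(), 2)}
            for s in worst for _ in range(n // 2)]


def train(agent: Dict[str, Any], envs: List[Dict], steps: int = 1000) -> Dict[str, Any]:
    """Placeholder RL update on a mixture of environments."""
    return {**agent, "steps": agent.get("steps", 0) + steps}


def evaluate_skills(agent: Dict[str, Any], skills: List[str]) -> Dict[str, float]:
    """Placeholder per-skill success rates measured in the original environment."""
    return {s: random.random() for s in skills}


skills = ["collect_wood", "make_pickaxe", "collect_diamond"]
original_env = {"name": "original"}
agent: Dict[str, Any] = {}
weaknesses = {s: 0.0 for s in skills}

for cycle in range(3):                                   # a few adaptation cycles
    generated = llm_propose_env_configs(weaknesses)
    agent = train(agent, [original_env] + generated)     # mixture of original + generated
    weaknesses = evaluate_skills(agent, skills)          # feedback for the next cycle

print(agent, weaknesses)
```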
- Language Reward Modulation for Pretraining Reinforcement Learning [61.76572261146311]
We propose leveraging the capabilities of LRFs as a pretraining signal for reinforcement learning.
Our VLM pretraining approach, which is a departure from previous attempts to use LRFs, can warm-start sample-efficient learning on robot manipulation tasks.
arXiv Detail & Related papers (2023-08-23T17:37:51Z)
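The Language Reward Modulation entry above treats a vision-language model's image-text alignment score as a noisy pretraining reward rather than as the task reward. The sketch below is a hedged illustration of that idea: vlm_alignment_score is a hypothetical stand-in for a real VLM similarity score, and the instruction set and sampling scheme are assumptions for illustration.

```python
import random
from typing import List, Sequence


def vlm_alignment_score(image: Sequence[float], text: str) -> float:
    """Hypothetical stand-in for a VLM image-text similarity (a real scorer would go here)."""
    rng = random.Random((hash(text) ^ int(sum(image) * 1e6)) % (2 ** 32))
    return rng.random()


def language_modulated_pretraining_reward(image: Sequence[float],
                                          instructions: List[str]) -> float:
    """Pretraining reward: how well the current observation matches a sampled language
    instruction, used as an exploration/shaping signal rather than a task reward."""
    instruction = random.choice(instructions)
    return vlm_alignment_score(image, instruction)


observation = [0.1] * 8                                  # placeholder image features
prompts = ["pick up the block", "open the drawer", "push the button"]
print(round(language_modulated_pretraining_reward(observation, prompts), 3))
```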
- Selective Perception: Optimizing State Descriptions with Reinforcement Learning for Language Model Actors [40.18762220245365]
Large language models (LLMs) are being applied as actors for sequential decision making tasks in domains such as robotics and games.
Previous work does little to explore what environment state information is provided to LLM actors via language.
We propose Brief Language INputs for DEcision-making Responses (BLINDER), a method for automatically selecting concise state descriptions.
arXiv Detail & Related papers (2023-07-21T22:02:50Z)
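BLINDER, summarized above, learns to pick a concise subset of the available state-description candidates to feed an LLM actor. The sketch below shows one plausible greedy selection loop under a learned value function; the scoring function, the greedy procedure, and the line budget are assumptions for illustration, not BLINDER's exact method.

```python
from typing import Callable, List


def select_state_description(candidates: List[str],
                             value: Callable[[List[str]], float],
                             max_lines: int = 3) -> List[str]:
    """Greedily add the candidate line that most improves the scored description."""
    chosen: List[str] = []
    remaining = list(candidates)
    while remaining and len(chosen) < max_lines:
        best = max(remaining, key=lambda line: value(chosen + [line]))
        if value(chosen + [best]) <= value(chosen):
            break                                        # no remaining line helps
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy scorer standing in for a learned value function: prefers task-relevant
# lines and lightly penalizes length (illustrative only).
def toy_value(lines: List[str]) -> float:
    relevance = sum(("mug" in line or "desk" in line) for line in lines)
    return relevance - 0.1 * len(lines)


full_state = ["You are in the kitchen.", "A mug is on the counter.",
              "The window is open.", "The desk is in the office."]
print(select_state_description(full_state, toy_value))
# ['A mug is on the counter.', 'The desk is in the office.']
```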
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents [111.33545170562337]
We investigate the possibility of grounding high-level tasks, expressed in natural language, to a chosen set of actionable steps.
We find that if pre-trained LMs are large enough and prompted appropriately, they can effectively decompose high-level tasks into low-level plans.
We propose a procedure that conditions on existing demonstrations and semantically translates the plans to admissible actions.
arXiv Detail & Related papers (2022-01-18T18:59:45Z)
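The last entry above grounds free-form LLM plan steps by semantically translating each generated step into the closest admissible action. The sketch below shows that translation step with a crude word-overlap similarity standing in for the sentence embeddings the paper uses; generate_plan_step and the action list are hypothetical placeholders.

```python
from typing import List


def similarity(a: str, b: str) -> float:
    """Crude word-overlap stand-in for an embedding-based sentence similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)


def translate_to_admissible(step: str, admissible_actions: List[str]) -> str:
    """Map a free-form generated plan step to the closest executable action."""
    return max(admissible_actions, key=lambda action: similarity(step, action))


def generate_plan_step(task: str, history: List[str]) -> str:
    """Hypothetical placeholder for an LLM prompted with demonstrations plus the task."""
    return "walk over to the refrigerator and open it"


admissible = ["open refrigerator", "grab milk", "walk to kitchen", "close refrigerator"]
step = generate_plan_step("get a glass of milk", history=[])
print(translate_to_admissible(step, admissible))         # -> "open refrigerator"
```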