Language Reward Modulation for Pretraining Reinforcement Learning
- URL: http://arxiv.org/abs/2308.12270v1
- Date: Wed, 23 Aug 2023 17:37:51 GMT
- Title: Language Reward Modulation for Pretraining Reinforcement Learning
- Authors: Ademi Adeniji, Amber Xie, Carmelo Sferrazza, Younggyo Seo, Stephen
James, Pieter Abbeel
- Abstract summary: We propose leveraging the capabilities of LRFs as a pretraining signal for reinforcement learning.
Our VLM pretraining approach, which is a departure from previous attempts to use LRFs, can warmstart sample-efficient learning on robot manipulation tasks.
- Score: 61.76572261146311
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Using learned reward functions (LRFs) as a means to solve sparse-reward
reinforcement learning (RL) tasks has yielded some steady progress in
task-complexity through the years. In this work, we question whether today's
LRFs are best-suited as a direct replacement for task rewards. Instead, we
propose leveraging the capabilities of LRFs as a pretraining signal for RL.
Concretely, we propose $\textbf{LA}$nguage Reward $\textbf{M}$odulated
$\textbf{P}$retraining (LAMP) which leverages the zero-shot capabilities of
Vision-Language Models (VLMs) as a $\textit{pretraining}$ utility for RL as
opposed to a downstream task reward. LAMP uses a frozen, pretrained VLM to
scalably generate noisy, albeit shaped exploration rewards by computing the
contrastive alignment between a highly diverse collection of language
instructions and the image observations of an agent in its pretraining
environment. LAMP optimizes these rewards in conjunction with standard
novelty-seeking exploration rewards with reinforcement learning to acquire a
language-conditioned, pretrained policy. Our VLM pretraining approach, which is
a departure from previous attempts to use LRFs, can warmstart sample-efficient
learning on robot manipulation tasks in RLBench.
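The abstract describes the reward recipe only at a high level, so below is a minimal, hypothetical sketch of how a LAMP-style pretraining reward could be computed. It assumes a public CLIP checkpoint as a stand-in for the paper's VLM, a short instruction list as a stand-in for the "highly diverse collection of language instructions", and a stubbed `novelty_bonus` plus `alpha`/`beta` mixing weights as illustrative placeholders rather than the paper's actual choices.
```python
# Hypothetical sketch of a LAMP-style pretraining reward: a frozen CLIP model scores
# the contrastive alignment between language instructions and an image observation,
# and that score is combined with a generic novelty-seeking exploration bonus.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen, pretrained VLM (assumption: a CLIP checkpoint stands in for the paper's VLM).
vlm = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(DEVICE).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Tiny stand-in for the "highly diverse collection of language instructions".
INSTRUCTIONS = [
    "pick up the red block",
    "open the drawer",
    "push the button",
    "place the cup on the shelf",
]


@torch.no_grad()
def language_alignment_reward(obs_image: Image.Image, instruction_idx: int) -> float:
    """Noisy but shaped reward: contrastive (cosine) alignment between the image
    observation and one sampled instruction, scored by the frozen VLM."""
    inputs = processor(
        text=INSTRUCTIONS, images=obs_image, return_tensors="pt", padding=True
    ).to(DEVICE)
    img = vlm.get_image_features(pixel_values=inputs["pixel_values"])
    txt = vlm.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    sims = (img @ txt.T).squeeze(0)  # (num_instructions,) cosine similarities
    return sims[instruction_idx].item()


def novelty_bonus(obs_image: Image.Image) -> float:
    """Placeholder for a novelty-seeking exploration bonus (e.g. an RND-style term);
    the summary above does not specify which exploration objective LAMP uses."""
    return 0.0


def pretraining_reward(
    obs_image: Image.Image, instruction_idx: int, alpha: float = 1.0, beta: float = 1.0
) -> float:
    """Combine the VLM alignment reward with the exploration bonus; alpha and beta
    are illustrative mixing weights, not values from the paper."""
    return alpha * language_alignment_reward(obs_image, instruction_idx) + beta * novelty_bonus(obs_image)
```
In this sketch the alignment term is simply the cosine similarity between the VLM's image and text embeddings, matching the "contrastive alignment" phrasing above; the specific VLM, instruction set, and exploration objective used by LAMP are defined in the paper itself.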
Related papers
- Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards [49.7719149179179]
This paper investigates the feasibility of using PPO for reinforcement learning (RL) from explicitly programmed reward signals.
We focus on tasks expressed through formal languages, such as programming, where explicit reward functions can be programmed to automatically assess the quality of generated outputs.
Our results show that pure RL-based training for the two formal language tasks is challenging, with success being limited even for the simple arithmetic task.
arXiv Detail & Related papers (2024-10-22T15:59:58Z)
- FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning [18.60627708199452]
We investigate how to leverage pre-trained visual-language models (VLM) for online Reinforcement Learning (RL).
We first identify the problem of reward misalignment when applying a VLM as a reward in RL tasks.
We introduce a lightweight fine-tuning method, named Fuzzy VLM reward-aided RL (FuRL).
arXiv Detail & Related papers (2024-06-02T07:20:08Z)
- Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning [49.87923965553233]
Reinforcement Learning can lead to reward over-optimization (ROO) in large language models.
We introduce Reward Calibration from Demonstration (RCfD) to recalibrate the reward objective.
We show that RCfD achieves comparable performance to carefully tuned baselines while mitigating ROO.
arXiv Detail & Related papers (2024-04-30T09:57:21Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- Code as Reward: Empowering Reinforcement Learning with VLMs [37.862999288331906]
We propose a framework named Code as Reward (VLM-CaR) to produce dense reward functions from pre-trained Vision-Language Models.
VLM-CaR significantly reduces the computational burden of querying the VLM directly.
We show that the dense rewards generated through our approach are very accurate across a diverse set of discrete and continuous environments.
arXiv Detail & Related papers (2024-02-07T11:27:45Z)
- LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient Querying [71.86163159193327]
Large language models (LLMs) have recently demonstrated their impressive ability to provide context-aware responses via text.
This ability could potentially be used to predict plausible solutions in sequential decision making tasks pertaining to pattern completion.
We introduce LaGR, which uses this predictive ability of LLMs to propose solutions to tasks that have been partially completed by a primary reinforcement learning (RL) agent.
arXiv Detail & Related papers (2023-08-21T02:07:35Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as rewards.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.