A Reminder of its Brittleness: Language Reward Shaping May Hinder
Learning for Instruction Following Agents
- URL: http://arxiv.org/abs/2305.16621v2
- Date: Thu, 17 Aug 2023 06:11:14 GMT
- Title: A Reminder of its Brittleness: Language Reward Shaping May Hinder
Learning for Instruction Following Agents
- Authors: Sukai Huang, Nir Lipovetzky and Trevor Cohn
- Abstract summary: We argue that the apparent success of LRS is brittle, and prior positive findings can be attributed to weak RL baselines.
We provided theoretical and empirical evidence that agents trained using LRS rewards converge more slowly than pure RL agents.
- Score: 38.928166383780535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Teaching agents to follow complex written instructions has been an important
yet elusive goal. One technique for enhancing learning efficiency is language
reward shaping (LRS). Within a reinforcement learning (RL) framework, LRS
involves training a reward function that rewards behaviours precisely aligned
with given language instructions. We argue that the apparent success of LRS is
brittle, and prior positive findings can be attributed to weak RL baselines.
Specifically, we identified suboptimal LRS designs that reward partially
matched trajectories, and we characterised a novel reward perturbation to
capture this issue using the concept of loosening task constraints. We provided
theoretical and empirical evidence that agents trained using LRS rewards
converge more slowly than pure RL agents. Our work highlights the
brittleness of existing LRS methods, which has been overlooked in previous
studies.
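To make the "loosening of task constraints" perturbation concrete, here is a minimal sketch (not the authors' code; the sub-goals and payouts are hypothetical) contrasting a strict task reward with an LRS-style shaped reward that credits partially matched trajectories:

```python
# Minimal sketch (not the paper's code): a strict task reward versus an
# LRS-style shaped reward that credits partially matched trajectories.
from typing import List

SUBGOALS = ["open_door", "pick_key", "unlock_chest"]  # hypothetical ordered instruction

def strict_reward(trajectory: List[str]) -> float:
    """Pay 1.0 only if the sub-goals appear in the required order."""
    it = iter(trajectory)
    return 1.0 if all(any(g == step for step in it) for g in SUBGOALS) else 0.0

def loosened_lrs_reward(trajectory: List[str]) -> float:
    """LRS-style shaping that rewards each matched sub-goal regardless of
    order -- the loosening of task constraints the paper warns about."""
    return sum(0.33 for g in SUBGOALS if g in trajectory)

# An out-of-order trajectory earns nearly full shaped reward but zero task
# reward, so the shaped signal can pull learning away from the true objective.
bad = ["unlock_chest", "open_door", "pick_key"]
print(strict_reward(bad), round(loosened_lrs_reward(bad), 2))  # 0.0 0.99
```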
Related papers
- Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning [23.99454995087634]
We explore the potential of rule-based reinforcement learning in large reasoning models.
We use synthetic logic puzzles as training data due to their controllable complexity and straightforward answer verification.
Our 7B model develops advanced reasoning skills, such as reflection, verification, and summarization, that are absent from the logic corpus.
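As an illustration of why logic puzzles suit rule-based RL, the sketch below (assumed, not Logic-RL's actual reward code) verifies a generated answer programmatically and pays a rule-based reward:

```python
# Minimal sketch (assumed, not Logic-RL's code): a rule-based reward for
# synthetic logic puzzles whose answers can be checked programmatically.
import re

def rule_based_reward(model_output: str, gold_answer: dict) -> float:
    """Format check plus answer check, in the spirit of rule-based RL."""
    match = re.search(r"<answer>(.*?)</answer>", model_output, re.DOTALL)
    if match is None:
        return -1.0  # malformed output: no parsable answer tag
    predicted = dict(
        pair.split("=") for pair in match.group(1).strip().split(",") if "=" in pair
    )
    return 1.0 if predicted == gold_answer else -0.5

# e.g. a knights-and-knaves style puzzle with a verifiable assignment
gold = {"alice": "knight", "bob": "knave"}
print(rule_based_reward("<answer>alice=knight,bob=knave</answer>", gold))  # 1.0
```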
arXiv Detail & Related papers (2025-02-20T17:49:26Z)
- Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning [45.30569353687124]
We introduce LaRe, a novel LLM-empowered symbolic-based decision-making framework to improve credit assignment.
Key to LaRe is the concept of the Latent Reward, which works as a multi-dimensional performance evaluation.
LaRe (i) achieves temporal credit assignment superior to SOTA methods, (ii) excels in allocating contributions among multiple agents, and (iii) outperforms policies trained with ground-truth rewards on certain tasks.
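A minimal sketch of the latent-reward idea, with hypothetical dimension names and a stubbed-out LLM scorer standing in for LaRe's actual pipeline:

```python
# Minimal sketch (hypothetical, not LaRe's code): an LLM-derived,
# multi-dimensional "latent reward" per step, projected to a scalar
# training signal for credit assignment.
import numpy as np

def latent_reward(step_summary: str) -> np.ndarray:
    """Stand-in for an LLM scoring a step along several semantic dimensions
    (e.g. progress, safety, cooperation). Here: a fixed stub for illustration."""
    scores = {"progress": 0.8, "safety": 1.0, "cooperation": 0.5}
    return np.array(list(scores.values()))

def scalarize(latent: np.ndarray, weights: np.ndarray) -> float:
    """Project the latent reward vector to a scalar reward."""
    return float(latent @ weights)

r = scalarize(latent_reward("agent passed the key to its teammate"),
              np.array([0.5, 0.3, 0.2]))
print(round(r, 2))  # 0.8
```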
arXiv Detail & Related papers (2024-12-15T08:51:14Z)
- Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards [49.7719149179179]
This paper investigates the feasibility of using Proximal Policy Optimization (PPO) for reinforcement learning (RL) from explicitly programmed reward signals.
We focus on tasks expressed through formal languages, such as programming, where explicit reward functions can be programmed to automatically assess the quality of generated outputs.
Our results show that pure RL-based training for the two formal language tasks is challenging, with success being limited even for the simple arithmetic task.
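A minimal sketch of an explicitly programmed reward for the arithmetic case (interface names assumed, not the paper's code):

```python
# Minimal sketch (assumed interface): an explicitly programmed reward for a
# formal-language task -- here, checking a generated arithmetic answer.
def programmed_reward(prompt: str, completion: str) -> float:
    """Reward computed by a program rather than a learned reward model."""
    try:
        expression = prompt.rstrip("= ")
        expected = eval(expression, {"__builtins__": {}})  # trusted, synthetic input
        return 1.0 if int(completion.strip()) == expected else 0.0
    except (ValueError, SyntaxError):
        return 0.0  # unparsable output earns nothing

# Such a reward can be plugged into a PPO trainer as the sequence-level return.
print(programmed_reward("12 + 30 =", "42"))  # 1.0
```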
arXiv Detail & Related papers (2024-10-22T15:59:58Z)
- Towards Learning Abductive Reasoning using VSA Distributed Representations [56.31867341825068]
We introduce the Abductive Rule Learner with Context-awareness (ARLC) model.
ARLC features a novel and more broadly applicable training objective for abductive reasoning.
We show ARLC's robustness to post-programming training by incrementally learning from examples on top of programmed knowledge.
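For background on the distributed representations involved (this is textbook VSA, not ARLC itself), a sketch of the two core operations, binding and bundling, over bipolar hypervectors:

```python
# Minimal sketch (VSA background, not ARLC's code): binding (element-wise
# multiply) and bundling (majority sum) over bipolar distributed
# representations, on which abductive rule learning can be built.
import numpy as np

rng = np.random.default_rng(0)
D = 1024  # dimensionality of the hypervectors

def random_hv() -> np.ndarray:
    return rng.choice([-1, 1], size=D)

def bind(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    return a * b  # invertible: bind(bind(a, b), b) == a, since b * b == 1

def bundle(*hvs: np.ndarray) -> np.ndarray:
    return np.sign(np.sum(hvs, axis=0))  # superposition of several items

color, shape = random_hv(), random_hv()
scene = bundle(bind(color, shape), random_hv())  # composite scene vector
# Unbinding recovers a noisy version of `shape`, detectable by similarity.
recovered = bind(scene, color)
print(recovered @ shape / D)  # well above 0 => shape present in the scene
```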
arXiv Detail & Related papers (2024-06-27T12:05:55Z)
- RLSF: Reinforcement Learning via Symbolic Feedback [11.407319705797242]
We propose a new fine-tuning paradigm we refer to as Reinforcement Learning via Symbolic Feedback (RLSF).
In RLSF, the LLM being fine-tuned is considered an RL agent, while the environment is allowed access to reasoning or domain knowledge tools.
We show that our RLSF-based fine-tuning of LLMs outperforms traditional approaches on five different applications.
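A minimal sketch of the shape of such a loop, with a toy syntax checker standing in for the symbolic tools (all names hypothetical, not the paper's implementation):

```python
# Minimal sketch (hypothetical names): an RLSF-style environment step in which
# a symbolic tool (here, a toy syntax checker) produces a certificate that is
# converted into a reward for the LLM agent.
import ast

def symbolic_feedback(generated_code: str) -> tuple[float, str]:
    """A reasoning tool that returns (reward, certificate)."""
    try:
        ast.parse(generated_code)
        return 1.0, "parse-ok"
    except SyntaxError as err:
        # the certificate localises the error, enabling finer-grained credit
        return -1.0, f"syntax-error at line {err.lineno}"

reward, certificate = symbolic_feedback("def f(x): return x + 1")
print(reward, certificate)  # 1.0 parse-ok
```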
arXiv Detail & Related papers (2024-05-26T18:49:59Z)
- Recall, Retrieve and Reason: Towards Better In-Context Relation Extraction [11.535892987373947]
Relation extraction (RE) aims to identify relations between entities mentioned in texts.
Large language models (LLMs) have demonstrated impressive in-context learning abilities in various tasks.
In RE, however, LLMs still perform poorly compared to most supervised fine-tuned methods.
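A minimal sketch of retrieve-then-reason prompt assembly for in-context RE (toy lexical retriever; not the paper's implementation):

```python
# Minimal sketch (hypothetical retrieval): building an in-context RE prompt
# from retrieved labelled demonstrations.
def retrieve(query: str, corpus: list[tuple[str, str]], k: int = 2):
    """Toy lexical retriever: rank labelled examples by word overlap."""
    score = lambda text: len(set(query.lower().split()) & set(text.lower().split()))
    return sorted(corpus, key=lambda ex: score(ex[0]), reverse=True)[:k]

corpus = [
    ("Marie Curie was born in Warsaw.", "born_in(Marie Curie, Warsaw)"),
    ("Apple acquired Beats in 2014.", "acquired(Apple, Beats)"),
    ("Turing was born in London.", "born_in(Turing, London)"),
]
query = "Chopin was born in Zelazowa Wola."
demos = retrieve(query, corpus)
prompt = "\n".join(f"Text: {t}\nRelation: {r}" for t, r in demos)
prompt += f"\nText: {query}\nRelation:"  # the LLM completes the relation
print(prompt)
```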
arXiv Detail & Related papers (2024-04-27T07:12:52Z)
- Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint [104.53687944498155]
Reinforcement learning (RL) has been widely used in training large language models (LLMs).
We propose a new RL method named RLMEC that incorporates a generative model as the reward model.
Based on the generative reward model, we design a token-level RL objective for training and an imitation-based regularization for stabilizing the RL process.
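A minimal sketch of token-level rewards under a minimum-editing constraint (using difflib as a stand-in for the generative reward model; not RLMEC's implementation):

```python
# Minimal sketch (not RLMEC's code): deriving token-level rewards from a
# minimum-edit comparison between the model's solution and a corrected
# reference, so only the erroneous tokens are penalised.
import difflib

def token_level_rewards(generated: list[str], corrected: list[str]) -> list[float]:
    rewards = [0.1] * len(generated)  # small default reward per token
    matcher = difflib.SequenceMatcher(a=generated, b=corrected)
    for op, i1, i2, _, _ in matcher.get_opcodes():
        if op in ("replace", "delete"):  # tokens the minimal edit removes
            for i in range(i1, i2):
                rewards[i] = -1.0
    return rewards

gen = "2 plus 2 equals 5".split()
ref = "2 plus 2 equals 4".split()
print(token_level_rewards(gen, ref))  # [0.1, 0.1, 0.1, 0.1, -1.0]
```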
arXiv Detail & Related papers (2024-01-11T17:58:41Z)
- Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning [69.19840497497503]
It is argued that the commonly used action-matching principle explains the deep neural networks (DNNs) rather than interpreting the RL agents themselves.
We therefore propose to treat rewards, the essential objective of RL agents, as the objective for interpreting RL agents as well.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
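A minimal sketch of reward-based feature saliency (illustrative only; the paper's method is more involved):

```python
# Minimal sketch (illustrative only): scoring input features by reward
# consistency -- a feature matters if masking it changes the agent's
# predicted reward, not merely its action logits.
import numpy as np

def reward_saliency(reward_model, observation: np.ndarray, mask_value=0.0):
    """Per-feature saliency: |r(obs) - r(obs with feature i masked)|."""
    base = reward_model(observation)
    saliency = np.zeros_like(observation, dtype=float)
    for i in range(observation.size):
        masked = observation.copy()
        masked.flat[i] = mask_value
        saliency.flat[i] = abs(base - reward_model(masked))
    return saliency

toy_reward = lambda obs: 2.0 * obs[0] + 0.0 * obs[1]  # only feature 0 matters
print(reward_saliency(toy_reward, np.array([1.0, 5.0])))  # [2. 0.]
```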
arXiv Detail & Related papers (2023-09-04T09:09:54Z)
- Language Reward Modulation for Pretraining Reinforcement Learning [61.76572261146311]
We propose leveraging the capabilities of language reward functions (LRFs) as a pretraining signal for reinforcement learning.
Our VLM pretraining approach, a departure from previous attempts to use LRFs, can warm-start sample-efficient learning on robot manipulation tasks.
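A minimal sketch of an LRF-style pretraining reward derived from a frozen VLM's image-text similarity (encoder names hypothetical; not the paper's code):

```python
# Minimal sketch (hypothetical encoders): using a frozen VLM's image-text
# similarity as a noisy pretraining reward, before fine-tuning on the true
# task reward.
import numpy as np

def vlm_reward(image_embedding: np.ndarray, text_embedding: np.ndarray) -> float:
    """Cosine similarity between the observation and a language instruction."""
    cos = image_embedding @ text_embedding / (
        np.linalg.norm(image_embedding) * np.linalg.norm(text_embedding))
    return float(cos)

# During pretraining, r_t = vlm_reward(encode(obs_t), encode("pick up the mug"))
# would stand in for the sparse task reward; fine-tuning then switches to the
# true reward, which is why LRFs serve as a warm start rather than a substitute.
obs_emb, instr_emb = np.random.randn(512), np.random.randn(512)
print(vlm_reward(obs_emb, instr_emb))
```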
arXiv Detail & Related papers (2023-08-23T17:37:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.