Reinforcement Learning from LLM Feedback to Counteract Goal
Misgeneralization
- URL: http://arxiv.org/abs/2401.07181v1
- Date: Sun, 14 Jan 2024 01:09:48 GMT
- Title: Reinforcement Learning from LLM Feedback to Counteract Goal
Misgeneralization
- Authors: Houda Nait El Barj, Theophile Sautory
- Abstract summary: We introduce a method to address goal misgeneralization in reinforcement learning (RL).
Goal misgeneralization occurs when an agent retains its capabilities out-of-distribution yet pursues a proxy goal rather than the intended one.
This study demonstrates how a Large Language Model (LLM) can efficiently supervise RL agents.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a method to address goal misgeneralization in reinforcement
learning (RL), leveraging Large Language Model (LLM) feedback during training.
Goal misgeneralization, a type of robustness failure in RL, occurs when an agent
retains its capabilities out-of-distribution yet pursues a proxy goal rather than
the intended one. Our approach utilizes LLMs to analyze an RL agent's policies
during training and identify potential failure scenarios. The RL agent is then
deployed in these scenarios, and a reward model is learned from the LLM's
preferences and feedback. This LLM-informed reward model is then used to further
train the RL agent on the original dataset. We apply our method to a maze
navigation task, and show marked improvements in goal generalization,
especially in cases where true and proxy goals are somewhat distinguishable and
behavioral biases are pronounced. This study demonstrates how an LLM, despite
its lack of task proficiency, can efficiently supervise RL agents, providing
scalable oversight and valuable insights for enhancing goal-directed learning
in RL.
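Since the abstract only names the pipeline steps (LLM-identified failure scenarios, LLM preferences over rollouts, a learned reward model, further RL training), a minimal sketch may help make the preference-to-reward step concrete. The code below is an illustration, not the authors' implementation: the MLP reward model, the Bradley-Terry pairwise loss, and the train_on_llm_preferences helper are assumptions, and the LLM querying and PPO retraining stages are only indicated in comments.

```python
# Minimal sketch of an LLM-preference-driven reward model, assuming a small MLP
# over state features and a Bradley-Terry loss on pairwise trajectory preferences.
# The LLM query, feature encoding, and PPO retraining details are placeholders,
# not the paper's exact implementation.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Per-step reward; trajectory return is the sum over steps.
        return self.net(obs).squeeze(-1)

def trajectory_return(model: RewardModel, traj: torch.Tensor) -> torch.Tensor:
    # traj: (T, obs_dim) tensor of observations from one rollout.
    return model(traj).sum()

def train_on_llm_preferences(model, preference_data, epochs=10, lr=1e-3):
    """preference_data: list of (traj_a, traj_b, label) where label=1.0 means the
    LLM preferred trajectory A (e.g. it pursues the intended goal, not the proxy)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for traj_a, traj_b, label in preference_data:
            # Bradley-Terry: preference probability is a logistic function of
            # the difference in predicted trajectory returns.
            logit = trajectory_return(model, traj_a) - trajectory_return(model, traj_b)
            loss = bce(logit, torch.tensor(label))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

# Usage sketch: collect rollouts in LLM-identified failure scenarios, ask the LLM
# which rollout of each pair better pursues the intended goal, fit the reward
# model, then continue PPO training on the original environments with model(obs)
# supplying the reward signal.
obs_dim = 8  # hypothetical feature size for the maze task
rm = RewardModel(obs_dim)
fake_pair = (torch.randn(20, obs_dim), torch.randn(20, obs_dim), 1.0)
rm = train_on_llm_preferences(rm, [fake_pair], epochs=1)
```

In this framing, the learned reward model replaces or augments the environment reward when the agent is further trained on the original distribution, which is how the LLM's judgment about intended versus proxy goals is folded back into standard RL training.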
Related papers
- Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL [7.988692259455583]
Large language models (LLMs) trained with Reinforcement Learning from Human Feedback have demonstrated remarkable capabilities, but their underlying reward functions and decision-making processes remain opaque.
This paper introduces a novel approach to interpreting LLMs by applying inverse reinforcement learning (IRL) to recover their implicit reward functions.
We conduct experiments on toxicity-aligned LLMs of varying sizes, extracting reward models that achieve up to 80.40% accuracy in predicting human preferences.
arXiv Detail & Related papers (2024-10-16T12:14:25Z)
- SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling [29.29604779151457]
This paper presents and studies an adaptation of Soft Actor-Critic and hindsight relabeling to LLM agents.
Our method paves the way towards autotelic LLM agents that learn online but can also outperform on-policy methods in more classic multi-goal RL environments.
arXiv Detail & Related papers (2024-10-16T11:59:27Z)
- VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment [66.80143024475635]
We propose VinePPO, a straightforward approach to compute unbiased Monte Carlo-based estimates.
We show that VinePPO consistently outperforms PPO and other RL-free baselines across MATH and GSM8K datasets.
arXiv Detail & Related papers (2024-10-02T15:49:30Z)
- Assessing the Zero-Shot Capabilities of LLMs for Action Evaluation in RL [14.091146805312636]
The credit assignment problem is a central challenge in Reinforcement Learning (RL).
Credit Assignment with Language Models (CALM) is a novel approach to automate credit assignment via reward shaping and options discovery.
Preliminary results indicate that the knowledge of Large Language Models is a promising prior for credit assignment in RL.
arXiv Detail & Related papers (2024-09-19T14:08:09Z)
- From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM) empowered agents are able to solve decision-making problems in the physical world.
Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting.
We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z)
- Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration [70.09561665520043]
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans.
We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems.
Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents.
arXiv Detail & Related papers (2024-05-23T08:33:19Z)
- ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning (a toy sketch of this regularization idea appears after this list).
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- Mutual Enhancement of Large Language and Reinforcement Learning Models through Bi-Directional Feedback Mechanisms: A Case Study [1.3597551064547502]
We employ a teacher-student learning framework to address the respective shortcomings of Large Language Models (LLMs) and reinforcement learning (RL) models.
Within this framework, the LLM acts as a teacher, while the RL model acts as a student.
We propose a practical algorithm to address the problem and conduct empirical experiments to evaluate the effectiveness of our method.
arXiv Detail & Related papers (2024-01-12T14:35:57Z)
- LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient Querying [71.86163159193327]
Large language models (LLMs) have recently demonstrated their impressive ability to provide context-aware responses via text.
This ability could potentially be used to predict plausible solutions in sequential decision making tasks pertaining to pattern completion.
We introduce LaGR, which uses this predictive ability of LLMs to propose solutions to tasks that have been partially completed by a primary reinforcement learning (RL) agent.
arXiv Detail & Related papers (2023-08-21T02:07:35Z)
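As referenced in the LINVIT entry above, one recurring pattern in this list is to use LLM guidance as a regularizer for value-based RL. The sketch below is not LINVIT itself; it is a toy, tabular illustration of that general idea, under the assumption that a stubbed llm_prior function stands in for LLM-suggested action preferences and that a KL-regularized (soft) Q-learning backup keeps the learned policy close to that prior.

```python
# Toy illustration (not the LINVIT algorithm) of using an LLM-suggested policy as
# a regularizer in value-based RL: actions are sampled from a KL-regularized
# policy pi(a|s) proportional to llm_prior(a|s) * exp(Q(s,a)/lam), so the agent
# stays close to the LLM prior while still learning from environment reward.
import numpy as np

n_states, n_actions, lam, gamma, alpha = 5, 2, 0.5, 0.9, 0.1
Q = np.zeros((n_states, n_actions))

def llm_prior(state: int) -> np.ndarray:
    # Stub for "ask the LLM which action looks right here"; it mildly prefers
    # action 1 in every state.  A real system would prompt an LLM instead.
    return np.array([0.3, 0.7])

def policy(state: int) -> np.ndarray:
    prior = llm_prior(state)
    logits = np.log(prior) + Q[state] / lam
    logits -= logits.max()            # numerical stability before exponentiating
    p = np.exp(logits)
    return p / p.sum()

def step(state: int, action: int):
    # Placeholder dynamics for a 5-state chain: action 1 moves right, action 0
    # moves left; ending a step in the last state pays +1.
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

rng = np.random.default_rng(0)
for _ in range(200):
    s = 0
    for _ in range(20):
        a = rng.choice(n_actions, p=policy(s))
        s2, r = step(s, a)
        # Soft (KL-regularized) backup: next-state value is a log-sum-exp of Q
        # weighted by the LLM prior, instead of a plain max over actions.
        v2 = lam * np.log(np.sum(llm_prior(s2) * np.exp(Q[s2] / lam)))
        Q[s, a] += alpha * (r + gamma * v2 - Q[s, a])
        s = s2
```

The temperature lam controls how strongly the agent is pulled toward the LLM prior: as lam approaches 0 the update recovers standard greedy Q-learning, while a large lam keeps the policy close to the prior even when the reward disagrees with it.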