Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents
- URL: http://arxiv.org/abs/2311.13373v6
- Date: Mon, 27 May 2024 14:00:23 GMT
- Title: Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents
- Authors: Zihao Zhou, Bin Hu, Chenyang Zhao, Pu Zhang, Bin Liu,
- Abstract summary: Large Language Models (LLMs) can address sequential decision-making tasks through the provision of high-level instructions.
LLMs lack specialization in tackling specific target problems, particularly in real-time dynamic environments.
We introduce a novel framework that addresses these challenges by training a smaller, specialized student RL agent using instructions from an LLM-based teacher agent.
- Score: 16.24662355253529
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent studies have uncovered the potential of Large Language Models (LLMs) in addressing complex sequential decision-making tasks through the provision of high-level instructions. However, LLM-based agents lack specialization in tackling specific target problems, particularly in real-time dynamic environments. Additionally, deploying an LLM-based agent in practical scenarios can be both costly and time-consuming. On the other hand, reinforcement learning (RL) approaches train agents that specialize in the target task but often suffer from low sampling efficiency and high exploration costs. In this paper, we introduce a novel framework that addresses these challenges by training a smaller, specialized student RL agent using instructions from an LLM-based teacher agent. By incorporating the guidance from the teacher agent, the student agent can distill the prior knowledge of the LLM into its own model. Consequently, the student agent can be trained with significantly less data. Moreover, through further training with environment feedback, the student agent surpasses the capabilities of its teacher for completing the target task. We conducted experiments on challenging MiniGrid and Habitat environments, specifically designed for embodied AI research, to evaluate the effectiveness of our framework. The results clearly demonstrate that our approach achieves superior performance compared to strong baseline methods. Our code is available at https://github.com/ZJLAB-AMMI/LLM4Teach.
Related papers
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z) - MALT: Improving Reasoning with Multi-Agent LLM Training [64.13803241218886]
We present a first step toward "Multi-agent LLM training" (MALT) on reasoning problems.
Our approach employs a sequential multi-agent setup with heterogeneous LLMs assigned specialized roles.
We evaluate our approach across MATH, GSM8k, and CQA, where MALT on Llama 3.1 8B models achieves relative improvements of 14.14%, 7.12%, and 9.40% respectively.
arXiv Detail & Related papers (2024-12-02T19:30:36Z) - Training Agents with Weakly Supervised Feedback from Large Language Models [19.216542820742607]
This paper introduces a novel training method for LLM-based agents using weakly supervised signals from a critic LLM.
Our agents are trained in iterative manner, where they initially generate trajectories through environmental interaction.
Tests on the API-bank dataset show consistent improvement in our agents' capabilities and comparable performance to GPT-4.
arXiv Detail & Related papers (2024-11-29T08:47:04Z) - Words as Beacons: Guiding RL Agents with High-Level Language Prompts [6.7236795813629]
Large Language Models (LLMs) as "teachers" guide the agent's learning process by decomposing complex tasks into subgoals.
LLMs can provide subgoals to accomplish the task defined for the environment in a similar fashion to how a human would do.
It is possible to query the LLM only during the training phase, enabling agents to operate within the environment without any LLM intervention.
arXiv Detail & Related papers (2024-10-11T08:54:45Z) - From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM) empowered agents are able to solve decision-making problems in the physical world.
Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting.
We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z) - Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning [12.651588927599441]
Instruction tuning aims to align large language models with open-domain instructions and human-preferred responses.
We introduce Task-Aware Curriculum Planning for Instruction Refinement (TAPIR) to select instructions that are difficult for a student LLM to follow.
To balance the student's capabilities, task distributions in training sets are adjusted with responses automatically refined according to their corresponding tasks.
arXiv Detail & Related papers (2024-05-22T08:38:26Z) - Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning [56.82041895921434]
Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities.
When used as agents for dealing with complex problems in the real world, their performance is far inferior to large commercial models such as ChatGPT and GPT-4.
arXiv Detail & Related papers (2024-03-29T03:48:12Z) - EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents [65.38474102119181]
We propose EnvGen, a framework to adaptively create training environments.
We train a small RL agent in a mixture of the original and LLM-generated environments.
We find that a small RL agent trained with EnvGen can outperform SOTA methods, including a GPT-4 agent, and learns long-horizon tasks significantly faster.
arXiv Detail & Related papers (2024-03-18T17:51:16Z) - AgentBench: Evaluating LLMs as Agents [88.45506148281379]
Large Language Models (LLMs) are becoming increasingly smart and autonomous, targeting real-world pragmatic missions beyond traditional NLP tasks.
We present AgentBench, a benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities.
arXiv Detail & Related papers (2023-08-07T16:08:11Z) - Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach [31.6589518077397]
Large language models (LLMs) encode a vast amount of world knowledge acquired from massive text datasets.
LLMs can assist an embodied agent in solving complex sequential decision making tasks by providing high-level instructions.
We propose When2Ask, a reinforcement learning based approach that learns when it is necessary to query LLMs for high-level instructions.
arXiv Detail & Related papers (2023-06-06T11:49:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.