The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement
Learning and Large Language Models
- URL: http://arxiv.org/abs/2402.01874v1
- Date: Fri, 2 Feb 2024 20:01:15 GMT
- Title: The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement
Learning and Large Language Models
- Authors: Moschoula Pternea, Prerna Singh, Abir Chakraborty, Yagna Oruganti,
Mirco Milletari, Sayli Bapat, Kebei Jiang
- Abstract summary: We review research studies that combine Reinforcement Learning (RL) and Large Language Models (LLMs).
We propose a novel taxonomy of three main classes based on the way that the two model types interact with each other.
- Score: 2.5721733711031978
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we review research studies that combine Reinforcement Learning
(RL) and Large Language Models (LLMs), two areas that owe their momentum to the
development of deep neural networks. We propose a novel taxonomy of three main
classes based on the way that the two model types interact with each other. The
first class, RL4LLM, includes studies where RL is leveraged to improve the
performance of LLMs on tasks related to Natural Language Processing. RL4LLM is
divided into two sub-categories depending on whether RL is used to directly
fine-tune an existing LLM or to improve the prompt of the LLM. In the second
class, LLM4RL, an LLM assists the training of an RL model that performs a task
that is not inherently related to natural language. We further break down
LLM4RL based on the component of the RL training framework that the LLM assists
or replaces, namely reward shaping, goal generation, and policy function.
Finally, in the third class, RL+LLM, an LLM and an RL agent are embedded in a
common planning framework without either of them contributing to training or
fine-tuning of the other. We further branch this class to distinguish between
studies with and without natural language feedback. We use this taxonomy to
explore the motivations behind the synergy of LLMs and RL and explain the
reasons for its success, while pinpointing potential shortcomings and areas
where further research is needed, as well as alternative methodologies that
serve the same goal.
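To make the three classes concrete, the taxonomy can be rendered as a small tree-shaped data structure. The sketch below is our own illustrative encoding of the branches named in the abstract; it is not an artifact released with the paper.

```python
# Illustrative sketch of the RL/LLM Taxonomy Tree described above.
# The class and branch names follow the abstract; the data structure itself
# is our own rendering, not something released with the paper.

TAXONOMY = {
    "RL4LLM": {  # RL improves an LLM on NLP-related tasks
        "fine-tuning": "RL directly fine-tunes an existing LLM",
        "prompting": "RL improves or optimizes the prompt fed to the LLM",
    },
    "LLM4RL": {  # an LLM assists training of an RL agent on a non-NLP task
        "reward shaping": "the LLM designs or shapes the reward signal",
        "goal generation": "the LLM proposes goals or subgoals for the agent",
        "policy function": "the LLM acts as, or assists, the policy",
    },
    "RL+LLM": {  # both models cooperate at planning time; neither trains the other
        "with language feedback": "the planner exchanges natural-language feedback",
        "without language feedback": "the two models are combined without such feedback",
    },
}

def classify(rl_improves_llm: bool, llm_assists_rl_training: bool) -> str:
    """Toy helper mapping the two questions posed in the abstract to a class name."""
    if rl_improves_llm:
        return "RL4LLM"
    if llm_assists_rl_training:
        return "LLM4RL"
    return "RL+LLM"
```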
Related papers
- From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM) empowered agents are able to solve decision-making problems in the physical world.
Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting.
We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z)
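The planner described above lends itself to a simple control loop: the LLM repeatedly turns the observation history into a language subgoal, and a low-level controller grounds that subgoal into actions. The sketch below is a hedged illustration of that loop; `llm_propose_subgoal`, `low_level_policy`, and the environment interface are hypothetical placeholders, and the paper itself contributes a theoretical (BAIL) analysis rather than this API.

```python
# Minimal sketch of an LLM Planner loop over a POMDP, as summarized above.
# `env`, `llm_propose_subgoal`, and `low_level_policy` are hypothetical
# placeholders; they are not the paper's interface.

def run_episode(env, llm_propose_subgoal, low_level_policy, max_steps=100):
    observation = env.reset()
    history = []                       # in-context trajectory fed back to the LLM
    for _ in range(max_steps):
        # The LLM sees the partial-observation history and emits a language subgoal.
        subgoal = llm_propose_subgoal(history, observation)
        # A low-level controller grounds the subgoal into a primitive action.
        action = low_level_policy(observation, subgoal)
        observation, reward, done, _ = env.step(action)
        history.append((observation, subgoal, action, reward))
        if done:
            break
    return history
```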
- LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions [8.55917897789612]
We focus on the cooperative tasks of multiple agents with a common goal and communication among them.
We also consider human-in/on-the-loop scenarios enabled by the language component in the framework.
arXiv Detail & Related papers (2024-05-17T22:10:23Z)
- Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods [18.771658054884693]
Large language models (LLMs) emerge as a promising avenue to augment reinforcement learning (RL) in aspects such as multi-task learning, sample efficiency, and high-level task planning.
We propose a structured taxonomy to systematically categorize LLMs' functionalities in RL, including four roles: information processor, reward designer, decision-maker, and generator.
arXiv Detail & Related papers (2024-03-30T08:28:08Z)
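The four roles identified by the survey correspond to four places where an LLM can be attached to a standard RL loop. The class below is a purely illustrative sketch of those attachment points; every method name is hypothetical and not taken from the survey.

```python
# Illustration of the four LLM roles named in the survey; every callable here
# is a hypothetical placeholder showing *where* an LLM could plug into RL.

class LLMEnhancedAgent:
    def __init__(self, llm, policy):
        self.llm = llm
        self.policy = policy

    def process_information(self, raw_observation):
        # Role 1: information processor -- compress or translate raw observations.
        return self.llm.summarize(raw_observation)

    def design_reward(self, transition):
        # Role 2: reward designer -- score a transition against a task description.
        return self.llm.score(transition)

    def decide(self, state):
        # Role 3: decision-maker -- the LLM proposes (or directly selects) actions.
        return self.llm.choose_action(state)

    def generate(self, state):
        # Role 4: generator -- e.g., imagine synthetic trajectories or explanations.
        return self.llm.rollout(state)
```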
- ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
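A rough picture of the hierarchy mentioned above: one learner reasons over whole utterances (turn level) while a second learner optimizes the tokens inside each utterance, with the turn-level value serving as the token-level learning signal. The following sketch is schematic and does not reproduce ArCHer's exact losses or interfaces.

```python
# Schematic of a two-level (hierarchical) RL setup for multi-turn LLM agents,
# in the spirit of the summary above. Both update calls are placeholders and
# do not reproduce ArCHer's actual algorithm.

def train_step(replay_buffer, high_level_critic, low_level_policy):
    turn = replay_buffer.sample()      # one multi-turn interaction segment

    # High level: value learning over whole utterances (turn-level transitions).
    utterance_value = high_level_critic.update(
        state=turn.dialogue_so_far,
        utterance=turn.agent_utterance,
        reward=turn.turn_reward,
        next_state=turn.next_dialogue,
    )

    # Low level: policy-gradient over the tokens inside the utterance,
    # using the high-level value estimate as the learning signal.
    low_level_policy.update(
        prompt=turn.dialogue_so_far,
        tokens=turn.agent_utterance_tokens,
        advantage=utterance_value,
    )
```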
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard for sequential decision-making problems, improving acting policies through feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
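The idea of using LLM guidance as a regularizer in value-based RL can be illustrated with a tabular update whose target is biased toward actions the LLM prefers. The snippet below is only a schematic stand-in for that idea; LINVIT's actual objective and analysis differ.

```python
# Sketch of LLM guidance as a regularizer in value-based RL, following the idea
# summarized above. This is a schematic tabular Q-learning step, not LINVIT itself.

import numpy as np

def regularized_q_update(Q, s, a, r, s_next, llm_action_probs,
                         alpha=0.1, gamma=0.99, lam=0.5):
    """One Q-learning step with a bonus toward LLM-preferred actions.

    llm_action_probs: probability the guidance LLM assigns to each action in
    s_next (a hypothetical interface to the language model).
    """
    # Regularized target: greedy value plus a term rewarding agreement with the LLM.
    target = r + gamma * np.max(Q[s_next] + lam * np.log(llm_action_probs + 1e-8))
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```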
- ARL2: Aligning Retrievers for Black-box Large Language Models via Self-guided Adaptive Relevance Labeling [20.022332182475672]
ARL2 is a retriever learning technique that harnesses large language models as labelers.
ARL2 uses an adaptive self-training strategy for curating high-quality and diverse relevance data.
Experiments demonstrate the effectiveness of ARL2, achieving accuracy improvements of 5.4% on NQ and 4.6% on MMLU.
arXiv Detail & Related papers (2024-02-21T05:41:34Z)
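The labeling-plus-self-training loop summarized above can be sketched as: score candidate passages with the LLM, keep only confidently labeled positives and negatives, and update the retriever contrastively. The functions and threshold below are our own hypothetical stand-ins, not ARL2's exact recipe.

```python
# Sketch of the "LLM as relevance labeler" idea summarized above. All function
# names and the filtering rule are hypothetical, not ARL2's implementation.

def build_training_pairs(queries, candidate_passages, llm_relevance_score,
                         keep_threshold=0.8):
    pairs = []
    for query in queries:
        scored = [(p, llm_relevance_score(query, p)) for p in candidate_passages[query]]
        positives = [p for p, s in scored if s >= keep_threshold]
        negatives = [p for p, s in scored if s <= 1 - keep_threshold]
        if positives and negatives:      # keep only confidently labeled pairs
            pairs.append((query, positives, negatives))
    return pairs

def self_training_round(retriever, pairs):
    # Contrastive-style update: pull positives toward the query, push negatives away.
    for query, positives, negatives in pairs:
        retriever.update(query, positives, negatives)
```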
- Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs).
We externalize the collective knowledge and unique strengths of several source LLMs, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
arXiv Detail & Related papers (2024-01-19T05:02:46Z)
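One way to picture knowledge fusion is as distillation from a combination of the source models' token distributions into the target model. The loss below is a schematic illustration under that reading; it omits the vocabulary alignment and fusion weighting a real implementation would need.

```python
# Sketch of "knowledge fusion": source LLMs' token distributions are combined
# into a soft target for training a single target model. Schematic only.

import torch
import torch.nn.functional as F

def fusion_loss(target_logits, source_logit_list, temperature=1.0):
    """KL between the target model and an average of the source distributions."""
    with torch.no_grad():
        source_probs = torch.stack(
            [F.softmax(logits / temperature, dim=-1) for logits in source_logit_list]
        ).mean(dim=0)                                   # fused soft target
    log_probs = F.log_softmax(target_logits / temperature, dim=-1)
    return F.kl_div(log_probs, source_probs, reduction="batchmean")
```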
- Mutual Enhancement of Large Language and Reinforcement Learning Models through Bi-Directional Feedback Mechanisms: A Case Study [1.3597551064547502]
We employ a teacher-student learning framework to tackle problems of Large Language Models (LLMs) and reinforcement learning (RL) models.
Within this framework, the LLM acts as a teacher, while the RL model acts as a student.
We propose a practical algorithm to address the problem and conduct empirical experiments to evaluate the effectiveness of our method.
arXiv Detail & Related papers (2024-01-12T14:35:57Z)
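The teacher-student arrangement above can be sketched as a loop in which the LLM teacher suggests a hint, the RL student acts and learns from the environment, and the outcome is fed back to the teacher's prompt. This is a hedged, generic rendering, not the paper's specific algorithm.

```python
# Generic teacher-student sketch: the LLM "teacher" offers a hint, the RL
# "student" acts and learns, and the result is fed back to the teacher.
# All interfaces here are hypothetical placeholders.

def teacher_student_step(env, state, llm_teacher, rl_student, feedback_log):
    hint = llm_teacher.suggest(state, feedback_log)      # teacher guidance
    action = rl_student.act(state, hint)                 # student may follow or ignore it
    next_state, reward, done, _ = env.step(action)
    rl_student.learn(state, action, reward, next_state)  # standard RL update
    feedback_log.append((state, hint, action, reward))   # bi-directional feedback
    return next_state, done
```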
- LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient Querying [71.86163159193327]
Large language models (LLMs) have recently demonstrated their impressive ability to provide context-aware responses via text.
This ability could potentially be used to predict plausible solutions in sequential decision making tasks pertaining to pattern completion.
We introduce LaGR, which uses this predictive ability of LLMs to propose solutions to tasks that have been partially completed by a primary reinforcement learning (RL) agent.
arXiv Detail & Related papers (2023-08-21T02:07:35Z)
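The division of labor described above can be sketched as a primary RL agent producing a partial solution and an LLM being queried, a few times at most, to complete the pattern. All names in the snippet below are hypothetical placeholders rather than LaGR's interface.

```python
# Sketch of the division of labor summarized above: a primary RL agent partially
# completes a task, and an LLM is queried sparingly to complete the pattern.
# `rl_agent`, `llm_complete`, and `evaluate` are hypothetical placeholders.

def solve_with_llm_suggestions(env, rl_agent, llm_complete, evaluate, budget=10):
    partial_solution = rl_agent.act_until_confident(env)   # RL handles the early steps
    best = partial_solution
    for _ in range(budget):                                 # sample-efficient querying:
        candidate = llm_complete(partial_solution)          # only a few LLM calls
        if evaluate(env, candidate) > evaluate(env, best):
            best = candidate
    return best
```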
- Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization [73.74371798168642]
We introduce an open-source modular library, RL4LMs, for optimizing language generators with reinforcement learning.
Next, we present the GRUE benchmark, a set of 6 language generation tasks which are supervised not by target strings, but by reward functions.
Finally, we introduce an easy-to-use, performant RL algorithm, NLPO, that learns to effectively reduce the action space in language generation.
arXiv Detail & Related papers (2022-10-03T21:38:29Z)
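NLPO's action-space reduction can be pictured as restricting each generation step to the top-p nucleus of a masking policy. The snippet below is a schematic illustration of that idea, not the RL4LMs implementation.

```python
# Schematic of restricting generation to the top-p nucleus of a masking policy,
# in the spirit of the action-space reduction summarized above.

import torch

def masked_sampling_step(policy_logits, mask_policy_logits, top_p=0.9):
    """Sample a token from the policy, restricted to the mask policy's nucleus."""
    probs = torch.softmax(mask_policy_logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    keep = torch.cumsum(sorted_probs, dim=-1) <= top_p
    keep[..., 0] = True                         # always keep the most likely token
    allowed = torch.zeros_like(probs, dtype=torch.bool)
    allowed.scatter_(-1, sorted_idx, keep)

    masked_logits = policy_logits.masked_fill(~allowed, float("-inf"))
    return torch.distributions.Categorical(logits=masked_logits).sample()
```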