Efficient Reinforcement Learning with Large Language Model Priors
- URL: http://arxiv.org/abs/2410.07927v1
- Date: Thu, 10 Oct 2024 13:54:11 GMT
- Title: Efficient Reinforcement Learning with Large Language Model Priors
- Authors: Xue Yan, Yan Song, Xidong Feng, Mengyue Yang, Haifeng Zhang, Haitham Bou Ammar, Jun Wang,
- Abstract summary: Large language models (LLMs) have recently emerged as powerful general-purpose tools.
We propose treating LLMs as prior action distributions and integrating them into RL frameworks.
We show that incorporating LLM-based action priors significantly reduces exploration and complexity optimization.
- Score: 18.72288751305885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In sequential decision-making (SDM) tasks, methods like reinforcement learning (RL) and heuristic search have made notable advances in specific cases. However, they often require extensive exploration and face challenges in generalizing across diverse environments due to their limited grasp of the underlying decision dynamics. In contrast, large language models (LLMs) have recently emerged as powerful general-purpose tools, due to their capacity to maintain vast amounts of domain-specific knowledge. To harness this rich prior knowledge for efficiently solving complex SDM tasks, we propose treating LLMs as prior action distributions and integrating them into RL frameworks through Bayesian inference methods, making use of variational inference and direct posterior sampling. The proposed approaches facilitate the seamless incorporation of fixed LLM priors into both policy-based and value-based RL frameworks. Our experiments show that incorporating LLM-based action priors significantly reduces exploration and optimization complexity, substantially improving sample efficiency compared to traditional RL techniques, e.g., using LLM priors decreases the number of required samples by over 90% in offline learning scenarios.
Related papers
- Toward Efficient Exploration by Large Language Model Agents [14.712532175418884]
Large language models (LLMs) can be used to explicitly implement an existing reinforcement learning algorithm.
We show how our LLM-based implementation of a known, data-efficient RL algorithm can be considerably more effective in natural language tasks.
arXiv Detail & Related papers (2025-04-29T17:59:48Z) - Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models [53.4530106173067]
Large language models (LLMs) with reinforcement learning (RL) have shown promising improvements in complex reasoning tasks.
RL remains challenging for tiny LLMs with 1 billion parameters or fewer because they lack the necessary pretraining strength to explore effectively.
This work introduces a novel intrinsic motivation approach that leverages episodic memory to address this challenge.
arXiv Detail & Related papers (2025-04-03T04:46:17Z) - Option Discovery Using LLM-guided Semantic Hierarchical Reinforcement Learning [16.654435148168172]
Large Language Models (LLMs) have shown remarkable promise in reasoning and decision-making.
We propose an LLM-guided hierarchical RL framework, termed LDSC, to enhance sample efficiency, generalization, and multi-task adaptability.
arXiv Detail & Related papers (2025-03-24T15:49:56Z) - R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
textbfR1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models.
Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start.
Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z) - EVOLvE: Evaluating and Optimizing LLMs For Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty.
We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications.
Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z) - LMGT: Optimizing Exploration-Exploitation Balance in Reinforcement Learning through Language Model Guided Trade-offs [27.014415210732103]
We introduce textbfLanguage textbfModel textbfGuided textbfTrade-offs (i.e., textbfLMGT), a novel, sample-efficient framework for Reinforcement Learning.
arXiv Detail & Related papers (2024-09-07T07:40:43Z) - Empowering Few-Shot Relation Extraction with The Integration of Traditional RE Methods and Large Language Models [48.846159555253834]
Few-Shot Relation Extraction (FSRE) appeals to more researchers in Natural Language Processing (NLP)
Recent emergence of Large Language Models (LLMs) has prompted numerous researchers to explore FSRE through In-Context Learning (ICL)
arXiv Detail & Related papers (2024-07-12T03:31:11Z) - Efficient Sequential Decision Making with Large Language Models [19.083642464977224]
This paper focuses on extending the success of large language models (LLMs) to sequential decision making.
Existing efforts either (i) re-train or finetune LLMs for decision making, or (ii) design prompts for pretrained LLMs.
We propose a new approach that leverages online model selection algorithms to efficiently incorporate LLMs agents into sequential decision making.
arXiv Detail & Related papers (2024-06-17T22:13:22Z) - Improve Temporal Awareness of LLMs for Sequential Recommendation [61.723928508200196]
Large language models (LLMs) have demonstrated impressive zero-shot abilities in solving a wide range of general-purpose tasks.
LLMs fall short in recognizing and utilizing temporal information, rendering poor performance in tasks that require an understanding of sequential data.
We propose three prompting strategies to exploit temporal information within historical interactions for LLM-based sequential recommendation.
arXiv Detail & Related papers (2024-05-05T00:21:26Z) - Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z) - How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z) - Take the Bull by the Horns: Hard Sample-Reweighted Continual Training
Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.