LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient
Querying
- URL: http://arxiv.org/abs/2308.13542v1
- Date: Mon, 21 Aug 2023 02:07:35 GMT
- Title: LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient
Querying
- Authors: Thommen George Karimpanal, Laknath Buddhika Semage, Santu Rana, Hung
Le, Truyen Tran, Sunil Gupta and Svetha Venkatesh
- Abstract summary: Large language models (LLMs) have recently demonstrated their impressive ability to provide context-aware responses via text.
This ability could potentially be used to predict plausible solutions in sequential decision-making tasks pertaining to pattern completion.
We introduce LaGR, which uses this predictive ability of LLMs to propose solutions to tasks that have been partially completed by a primary reinforcement learning (RL) agent.
- Score: 71.86163159193327
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have recently demonstrated their impressive
ability to provide context-aware responses via text. This ability could
potentially be used to predict plausible solutions in sequential
decision-making tasks pertaining to pattern completion. For example, by
observing a
partial stack of cubes, LLMs can predict the correct sequence in which the
remaining cubes should be stacked by extrapolating the observed patterns (e.g.,
cube sizes, colors or other attributes) in the partial stack. In this work, we
introduce LaGR (Language-Guided Reinforcement learning), which uses this
predictive ability of LLMs to propose solutions to tasks that have been
partially completed by a primary reinforcement learning (RL) agent, in order to
subsequently guide the latter's training. However, as RL training is generally
not sample-efficient, deploying this approach would require the LLM to be
repeatedly queried for solutions, a process that can be expensive and
infeasible. To address this issue, we introduce SEQ (sample-efficient
querying), in which we simultaneously train a secondary RL agent to decide when
the LLM should be queried for solutions. Specifically, we use the quality of
the solutions emanating from the LLM as the reward to train this agent. We show
that our proposed framework LaGR-SEQ enables more efficient primary RL
training, while simultaneously minimizing the number of queries to the LLM. We
demonstrate our approach on a series of tasks and highlight its advantages,
along with its limitations and potential future research directions.
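For concreteness, the following is a minimal Python sketch of the control loop the abstract describes, not the authors' implementation. Every interface in it (QueryAgent, llm.complete, env.partial_solution, env.evaluate, primary_agent.add_demonstration) is a hypothetical placeholder, and SEQ is reduced to a simple epsilon-greedy value learner whose reward is the quality of the LLM's proposed solution, the signal the abstract names.

```python
import random


class QueryAgent:
    """Secondary (SEQ) agent sketch: learns, per state key (assumed hashable),
    whether querying the LLM is worthwhile, via epsilon-greedy value estimates."""

    def __init__(self, epsilon=0.1, lr=0.1):
        self.values = {}        # state key -> estimated value of querying
        self.epsilon = epsilon  # exploration rate
        self.lr = lr            # value-update step size

    def should_query(self, state):
        if random.random() < self.epsilon:
            return random.random() < 0.5   # explore: query at random
        return self.values.get(state, 0.0) > 0.0

    def update(self, state, reward):
        # Per the abstract, the reward is the quality of the LLM's solution.
        old = self.values.get(state, 0.0)
        self.values[state] = old + self.lr * (reward - old)


def lagr_seq_episode(env, primary_agent, query_agent, llm):
    """One training episode: the primary RL agent acts as usual, while the
    SEQ agent decides when to ask the LLM to complete the partial solution."""
    state = env.reset()
    done = False
    while not done:
        if query_agent.should_query(state):
            proposal = llm.complete(env.partial_solution())  # hypothetical API
            quality = env.evaluate(proposal)                 # solution quality
            query_agent.update(state, quality)               # SEQ's reward
            if quality > 0:
                # Good LLM proposals guide the primary agent's training.
                primary_agent.add_demonstration(proposal)
        action = primary_agent.act(state)
        next_state, reward, done = env.step(action)
        primary_agent.update(state, action, reward, next_state)
        state = next_state
```

The point of the sketch is the division of labor: the secondary agent learns in which states LLM queries pay off, so expensive queries concentrate where the primary agent's partial solutions make LLM extrapolation informative.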
Related papers
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding the decoding process of LLMs with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
- From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM)-empowered agents are able to solve decision-making problems in the physical world.
Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting.
We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z)
- Extracting Heuristics from Large Language Models for Reward Shaping in Reinforcement Learning [28.077228879886402]
Reinforcement Learning (RL) suffers from sample inefficiency in sparse-reward domains, and the problem is further pronounced when transitions are stochastic.
To improve the sample efficiency, reward shaping is a well-studied approach to introduce intrinsic rewards that can help the RL agent converge to an optimal policy faster.
arXiv Detail & Related papers (2024-05-24T03:53:57Z)
- Reinforcement Learning Problem Solving with Large Language Models [0.0]
Large Language Models (LLMs) have an extensive amount of world knowledge, and this has enabled their application in various domains to improve the performance of Natural Language Processing (NLP) tasks.
This has also facilitated a more accessible paradigm of conversation-based interactions between humans and AI systems to solve intended problems.
We show the practicality of our approach through two detailed case studies for "Research Scientist" and "Legal Matter Intake".
arXiv Detail & Related papers (2024-04-29T12:16:08Z)
- ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- Language Reward Modulation for Pretraining Reinforcement Learning [61.76572261146311]
We propose leveraging the capabilities of learned reward functions (LRFs) as a pretraining signal for reinforcement learning.
Our VLM pretraining approach, which is a departure from previous attempts to use LRFs, can warmstart sample-efficient learning on robot manipulation tasks.
arXiv Detail & Related papers (2023-08-23T17:37:51Z)
- Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline [22.08897444328099]
Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks.
In this paper, we propose an efficient LLM inference pipeline that uses the LLM itself to perceive response lengths and schedule sequences accordingly.
arXiv Detail & Related papers (2023-05-22T15:36:06Z)