LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
- URL: http://arxiv.org/abs/2311.18232v1
- Date: Thu, 30 Nov 2023 03:59:31 GMT
- Title: LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
- Authors: Marwa Abdulhai and Isadora White and Charlie Snell and Charles Sun and Joey Hong and Yuexiang Zhai and Kelvin Xu and Sergey Levine
- Abstract summary: This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs).
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
- Score: 56.25156596019168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) provide excellent text-generation capabilities,
but standard prompting and generation methods generally do not lead to
intentional or goal-directed agents and might necessitate considerable prompt
tuning. This becomes particularly apparent in multi-turn conversations: even
the best current LLMs rarely ask clarifying questions, engage in explicit
information gathering, or take actions now that lead to better decisions after
multiple turns. Reinforcement learning has the potential to leverage the
powerful modeling capabilities of LLMs, as well as their internal
representation of textual interactions, to create capable goal-directed
language agents. This can enable intentional and temporally extended
interactions, such as with humans, through coordinated persuasion and carefully
crafted questions, or in goal-directed play through text games to bring about
desired final outcomes. However, enabling this requires the community to
develop stable and reliable reinforcement learning algorithms that can
effectively train LLMs. Developing such algorithms requires tasks that can
gauge progress on algorithm design, provide accessible and reproducible
evaluations for multi-turn interactions, and cover a range of task properties
and challenges in improving reinforcement learning algorithms. Our paper
introduces the LMRL-Gym benchmark for evaluating multi-turn RL for LLMs,
together with an open-source research framework containing a basic toolkit for
getting started on multi-turn RL with offline value-based and policy-based RL
methods. Our benchmark consists of 8 different language tasks, which require
multiple rounds of language interaction and cover a range of tasks in
open-ended dialogue and text games.
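To make the multi-turn setting concrete, the sketch below shows a toy Gym-style interaction loop for a text game: a policy emits an utterance each turn, the environment returns an observation and a scalar reward, and the episode ends once the goal is reached or the turn budget is exhausted. This is a minimal sketch under stated assumptions: the TextEnv class, its reset/step methods, and the placeholder policy function are illustrative and not the actual LMRL-Gym API.

```python
# Hypothetical multi-turn text-game loop; interface names are illustrative
# assumptions, not the LMRL-Gym API.
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TextEnv:
    """Toy guessing game: the agent must name the secret word within max_turns."""
    secret: str = "apple"
    max_turns: int = 5

    def reset(self) -> str:
        # Start a new episode and return the initial observation.
        self.turn = 0
        return "Game start. Guess the secret fruit."

    def step(self, utterance: str) -> Tuple[str, float, bool]:
        # Consume one agent utterance; return observation, reward, and done flag.
        self.turn += 1
        correct = self.secret in utterance.lower()
        reward = 1.0 if correct else 0.0
        done = correct or self.turn >= self.max_turns
        observation = "Correct!" if correct else "No, keep guessing."
        return observation, reward, done


def policy(observation: str) -> str:
    # Placeholder for an LLM policy; a real agent would condition on the full
    # dialogue history and be trained with offline value-based or policy-based RL.
    return "Is it an apple?"


if __name__ == "__main__":
    env = TextEnv()
    obs, total_return, done = env.reset(), 0.0, False
    while not done:
        action = policy(obs)
        obs, reward, done = env.step(action)
        total_return += reward
    print(f"Episode return: {total_return}")
```

Logged (observation, action, reward) tuples from such rollouts are the kind of data that offline value-based and policy-based RL methods, like those in the paper's toolkit, would train on.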
Related papers
- RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
- Reinforcement Learning Problem Solving with Large Language Models [0.0]
Large Language Models (LLMs) have an extensive amount of world knowledge, and this has enabled their application in various domains to improve the performance of Natural Language Processing (NLP) tasks.
This has also facilitated a more accessible paradigm of conversation-based interactions between humans and AI systems to solve intended problems.
We show the practicality of our approach through two detailed case studies for "Research Scientist" and "Legal Matter Intake".
arXiv Detail & Related papers (2024-04-29T12:16:08Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code).
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
- Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z)
- On the Performance of Multimodal Language Models [4.677125897916577]
This study conducts a comparative analysis of different multimodal instruction tuning approaches.
We reveal key insights for guiding architectural choices when incorporating multimodal capabilities into large language models.
arXiv Detail & Related papers (2023-10-04T23:33:36Z)
- MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback [78.60644407028022]
We introduce MINT, a benchmark that evaluates large language models' ability to solve tasks with multi-turn interactions.
LLMs generally benefit from tools and language feedback, with performance gains of 1-8% for each turn of tool use.
Among the LLMs evaluated, supervised instruction-finetuning (SIFT) and reinforcement learning from human feedback (RLHF) generally hurt multi-turn capabilities.
arXiv Detail & Related papers (2023-09-19T15:25:42Z)