Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
- URL: http://arxiv.org/abs/2311.05584v1
- Date: Thu, 9 Nov 2023 18:45:16 GMT
- Title: Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
- Authors: Joey Hong and Sergey Levine and Anca Dragan
- Abstract summary: Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
- Score: 70.7884839812069
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have emerged as powerful and general solutions
to many natural language tasks. However, many of the most important
applications of language generation are interactive, where an agent has to talk
to a person to reach a desired outcome. For example, a teacher might try to
understand their student's current comprehension level to tailor their
instruction accordingly, and a travel agent might ask questions of their
customer to understand their preferences in order to recommend activities they
might enjoy. LLMs trained with supervised fine-tuning or "single-step" RL, as
with standard RLHF, might struggle which tasks that require such goal-directed
behavior, since they are not trained to optimize for overall conversational
outcomes after multiple turns of interaction. In this work, we explore a new
method for adapting LLMs with RL for such goal-directed dialogue. Our key
insight is that, though LLMs might not effectively solve goal-directed dialogue
tasks out of the box, they can provide useful data for solving such tasks by
simulating suboptimal but human-like behaviors. Given a textual description of
a goal-directed dialogue task, we leverage LLMs to sample diverse synthetic
rollouts of hypothetical in-domain human-human interactions. Our algorithm then
utilizes this dataset with offline reinforcement learning to train an
interactive conversational agent that can optimize goal-directed objectives
over multiple turns. In effect, the LLM produces examples of possible
interactions, and RL then processes these examples to learn to perform more
optimal interactions. Empirically, we show that our proposed approach achieves
state-of-the-art performance in various goal-directed dialogue tasks that
include teaching and preference elicitation.
Related papers
- Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training [33.57497419019826]
Action-Based Contrastive Self-Training allows for sample-efficient dialogue policy learning in multi-turn conversation.
ACT demonstrates substantial conversation modeling improvements over standard approaches to supervised fine-tuning and DPO.
arXiv Detail & Related papers (2024-05-31T22:44:48Z) - Reasoning in Conversation: Solving Subjective Tasks through Dialogue
Simulation for Large Language Models [56.93074140619464]
We propose RiC (Reasoning in Conversation), a method that focuses on solving subjective tasks through dialogue simulation.
The motivation of RiC is to mine useful contextual information by simulating dialogues instead of supplying chain-of-thought style rationales.
We evaluate both API-based and open-source LLMs including GPT-4, ChatGPT, and OpenChat across twelve tasks.
arXiv Detail & Related papers (2024-02-27T05:37:10Z) - Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk [11.706292228586332]
Large language models (LLMs) are powerful dialogue agents, but specializing them towards fulfilling a specific function can be challenging.
We propose a more effective method for data collection through LLMs engaging in a conversation in various roles.
This approach generates a training data via "self-talk" of LLMs that can be refined and utilized for supervised fine-tuning.
arXiv Detail & Related papers (2024-01-10T09:49:10Z) - LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language
Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs)
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z) - Self-Explanation Prompting Improves Dialogue Understanding in Large
Language Models [52.24756457516834]
We propose a novel "Self-Explanation" prompting strategy to enhance the comprehension abilities of Large Language Models (LLMs)
This task-agnostic approach requires the model to analyze each dialogue utterance before task execution, thereby improving performance across various dialogue-centric tasks.
Experimental results from six benchmark datasets confirm that our method consistently outperforms other zero-shot prompts and matches or exceeds the efficacy of few-shot prompts.
arXiv Detail & Related papers (2023-09-22T15:41:34Z) - Frugal Prompting for Dialog Models [17.048111072193933]
This study examines different approaches for building dialog systems using large language models (LLMs)
As part of prompt tuning, we experiment with various ways of providing instructions, exemplars, current query and additional context.
The research also analyzes the representations of dialog history that have the optimal usable-information density.
arXiv Detail & Related papers (2023-05-24T09:06:49Z) - A Mixture-of-Expert Approach to RL-based Dialogue Management [56.08449336469477]
We use reinforcement learning to develop a dialogue agent that avoids being short-sighted (outputting generic utterances) and maximizes overall user satisfaction.
Most existing RL approaches to DM train the agent at the word-level, and thus, have to deal with aly complex action space even for a medium-size vocabulary.
We develop a RL-based DM using a novel mixture of expert language model (MoE-LM) that consists of (i) a LM capable of learning diverse semantics for conversation histories, (ii) a number of specialized LMs (or experts) capable of generating utterances corresponding to a
arXiv Detail & Related papers (2022-05-31T19:00:41Z) - CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement
Learning [85.3987745097806]
offline reinforcement learning can be used to train dialogue agents entirely using static datasets collected from human speakers.
Experiments show that recently developed offline RL methods can be combined with language models to yield realistic dialogue agents.
arXiv Detail & Related papers (2022-04-18T17:43:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.