Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
- URL: http://arxiv.org/abs/2311.05584v1
- Date: Thu, 9 Nov 2023 18:45:16 GMT
- Title: Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
- Authors: Joey Hong and Sergey Levine and Anca Dragan
- Abstract summary: Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
- Score: 70.7884839812069
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have emerged as powerful and general solutions
to many natural language tasks. However, many of the most important
applications of language generation are interactive, where an agent has to talk
to a person to reach a desired outcome. For example, a teacher might try to
understand their student's current comprehension level to tailor their
instruction accordingly, and a travel agent might ask questions of their
customer to understand their preferences in order to recommend activities they
might enjoy. LLMs trained with supervised fine-tuning or "single-step" RL, as
with standard RLHF, might struggle with tasks that require such goal-directed
behavior, since they are not trained to optimize for overall conversational
outcomes after multiple turns of interaction. In this work, we explore a new
method for adapting LLMs with RL for such goal-directed dialogue. Our key
insight is that, though LLMs might not effectively solve goal-directed dialogue
tasks out of the box, they can provide useful data for solving such tasks by
simulating suboptimal but human-like behaviors. Given a textual description of
a goal-directed dialogue task, we leverage LLMs to sample diverse synthetic
rollouts of hypothetical in-domain human-human interactions. Our algorithm then
utilizes this dataset with offline reinforcement learning to train an
interactive conversational agent that can optimize goal-directed objectives
over multiple turns. In effect, the LLM produces examples of possible
interactions, and RL then processes these examples to learn to perform more
optimal interactions. Empirically, we show that our proposed approach achieves
state-of-the-art performance in various goal-directed dialogue tasks that
include teaching and preference elicitation.
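As a rough illustration of the pipeline the abstract describes, the sketch below imagines synthetic dialogues with an LLM, scores each conversation with a goal-level reward, and hands the resulting dataset to an offline RL trainer. The helper names (`sample_llm`, `score_outcome`, `offline_rl_update`), the reward scheme, and the training loop are placeholder assumptions for illustration, not the paper's exact implementation.

```python
# Hedged sketch of "imagine rollouts with an LLM, then run offline RL" as described
# in the abstract above. All helpers are placeholders, not the authors' code.
from dataclasses import dataclass, field

@dataclass
class Rollout:
    turns: list[str] = field(default_factory=list)   # alternating agent/human utterances
    reward: float = 0.0                               # outcome-level reward for the dialogue

def sample_llm(prompt: str) -> str:
    """Hypothetical text-completion call; plug in any LLM API here."""
    return "(imagined utterance)"

def imagine_rollout(task_description: str, n_turns: int = 6) -> Rollout:
    """Step 1: the LLM plays BOTH speakers, producing a suboptimal but human-like dialogue."""
    rollout = Rollout()
    for t in range(n_turns):
        speaker = "Agent" if t % 2 == 0 else "Human"
        prompt = task_description + "\n" + "\n".join(rollout.turns) + f"\n{speaker}:"
        rollout.turns.append(f"{speaker}: {sample_llm(prompt)}")
    return rollout

def score_outcome(rollout: Rollout) -> float:
    """Step 2: assign a goal-directed reward (e.g. did the agent elicit the preference)."""
    return 0.0  # placeholder scorer

def offline_rl_update(dataset: list[Rollout]) -> None:
    """Step 3: fit an offline RL method (e.g. a value-based algorithm) on the imagined
    data so the policy optimizes multi-turn outcomes rather than single responses."""
    for rollout in dataset:
        pass  # e.g. learn values over (dialogue history, utterance) pairs

task = ("A travel agent asks questions to learn a customer's preferences, "
        "then recommends an activity they might enjoy.")
dataset = []
for _ in range(100):            # diverse synthetic rollouts of human-human interaction
    r = imagine_rollout(task)
    r.reward = score_outcome(r)
    dataset.append(r)
offline_rl_update(dataset)      # train the interactive agent entirely offline
```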
Related papers
- Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations [58.65755268815283]
Many real dialogues are interactive, meaning an agent's utterances will influence their conversational partner, elicit information, or change their opinion.
We use this fact to rewrite and augment existing suboptimal data, and train via offline reinforcement learning (RL) an agent that outperforms both prompting and learning from unaltered human demonstrations.
Our results in a user study with real humans show that our approach greatly outperforms existing state-of-the-art dialogue agents.
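A minimal sketch of the data-rewriting step that summary describes: given a logged, suboptimal dialogue and its eventual outcome, an LLM regenerates agent turns in hindsight and the rewrites are added to the offline RL dataset. `rewrite_with_llm` and the outcome string are hypothetical stand-ins, not the paper's prompts or API.

```python
# Hedged sketch of hindsight regeneration: rewrite suboptimal agent turns using
# knowledge of how the conversation ended, then add the rewrites to the dataset.

def rewrite_with_llm(history: list[str], outcome: str) -> str:
    """Hypothetical LLM call proposing a better agent utterance given the outcome."""
    return "(regenerated agent utterance)"

def hindsight_augment(dialogue: list[str], outcome: str) -> list[list[str]]:
    """For each agent turn, emit an alternative prefix with that turn regenerated."""
    augmented = []
    for i, turn in enumerate(dialogue):
        if turn.startswith("Agent:"):
            augmented.append(dialogue[:i] + [f"Agent: {rewrite_with_llm(dialogue[:i], outcome)}"])
    return augmented

logged = ["Agent: Hi, where would you like to travel?",
          "Human: Somewhere warm, I think.",
          "Agent: Okay, have a good day."]          # suboptimal: never elicited details
extra = hindsight_augment(logged, outcome="user left without a recommendation")
# `extra`, together with the original logs, would feed the offline RL trainer.
```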
arXiv Detail & Related papers (2024-11-07T21:37:51Z)
- Goal Inference from Open-Ended Dialog [6.21910767424247]
We present an online method for embodied agents to learn and accomplish diverse user goals.
We extract natural language goal representations from conversations with Large Language Models.
As a result, our method can represent uncertainty over complex goals based on unrestricted dialog.
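One way to picture the uncertainty over goals mentioned above is a belief over candidate goals that is re-weighted as the dialog unfolds. The sketch below does this with a placeholder scorer; `goal_likelihood` is an assumption (e.g. an LLM log-probability), not the paper's model.

```python
# Hedged sketch: maintain a distribution over candidate natural-language goals and
# re-normalize it after each user turn. The scorer is a stand-in, not the paper's.

def goal_likelihood(dialogue: list[str], goal: str) -> float:
    """Hypothetical scorer of how consistent the dialogue so far is with a goal."""
    return 1.0  # placeholder; could be an LLM log-probability

def update_belief(dialogue: list[str], candidate_goals: list[str]) -> dict[str, float]:
    scores = {g: goal_likelihood(dialogue, g) for g in candidate_goals}
    total = sum(scores.values()) or 1.0
    return {g: s / total for g, s in scores.items()}   # uncertainty over complex goals

belief = update_belief(
    ["Human: Can you tidy up before my guests arrive?"],
    candidate_goals=["clean the kitchen", "set the table", "vacuum the living room"],
)
```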
arXiv Detail & Related papers (2024-10-17T18:30:52Z)
- Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models [56.93074140619464]
We propose RiC (Reasoning in Conversation), a method that focuses on solving subjective tasks through dialogue simulation.
The motivation of RiC is to mine useful contextual information by simulating dialogues instead of supplying chain-of-thought style rationales.
We evaluate both API-based and open-source LLMs including GPT-4, ChatGPT, and OpenChat across twelve tasks.
arXiv Detail & Related papers (2024-02-27T05:37:10Z)
- Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk [11.706292228586332]
Large language models (LLMs) are powerful dialogue agents, but specializing them towards fulfilling a specific function can be challenging.
We propose a more effective method for data collection through LLMs engaging in a conversation in various roles.
This approach generates training data via "self-talk" of LLMs that can be refined and utilized for supervised fine-tuning.
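As a hedged sketch of that self-talk idea, the snippet below has an LLM converse with itself in two roles and keeps only dialogues passing a quality filter; `chat` and the filter are placeholders, not the paper's pipeline.

```python
# Hedged sketch of self-talk bootstrapping: the LLM plays both the client and the
# task-oriented agent, and the filtered dialogues become supervised fine-tuning data.

def chat(role: str, history: list[str]) -> str:
    """Hypothetical LLM call that replies in character for `role`."""
    return f"({role} reply)"

def self_talk(agent_role: str, client_role: str, n_turns: int = 4) -> list[str]:
    history: list[str] = []
    for t in range(n_turns):
        role = client_role if t % 2 == 0 else agent_role
        history.append(f"{role}: {chat(role, history)}")
    return history

def keep(dialogue: list[str]) -> bool:
    """Placeholder refinement/quality filter applied before fine-tuning."""
    return len(dialogue) >= 4

corpus = [d for d in (self_talk("booking assistant", "customer") for _ in range(50)) if keep(d)]
# `corpus` would then be split into (context, response) pairs for supervised fine-tuning.
```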
arXiv Detail & Related papers (2024-01-10T09:49:10Z)
- LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs).
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z)
- Frugal Prompting for Dialog Models [17.048111072193933]
This study examines different approaches for building dialog systems using large language models (LLMs).
As part of prompt tuning, we experiment with various ways of providing instructions, exemplars, the current query, and additional context.
The research also analyzes the representations of dialog history that have the optimal usable-information density.
arXiv Detail & Related papers (2023-05-24T09:06:49Z)
- A Mixture-of-Expert Approach to RL-based Dialogue Management [56.08449336469477]
We use reinforcement learning to develop a dialogue agent that avoids being short-sighted (outputting generic utterances) and maximizes overall user satisfaction.
Most existing RL approaches to DM train the agent at the word level and thus have to deal with a combinatorially complex action space even for a medium-size vocabulary.
We develop an RL-based DM using a novel mixture-of-expert language model (MoE-LM) that consists of (i) a LM capable of learning diverse semantics for conversation histories, (ii) a number of specialized LMs (or experts) capable of generating utterances corresponding to a ...
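A rough sketch of how such a mixture-of-experts setup shrinks the action space: each expert proposes a whole candidate utterance, and the RL dialogue manager only chooses among the candidates rather than acting word by word. The expert names and the Q-function below are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch of MoE-style dialogue management: expert LMs generate candidate
# utterances; an RL policy (here a placeholder Q-function) picks one per turn.
import random

EXPERTS = ["empathetic", "inquisitive", "informative"]   # illustrative expert set

def expert_utterance(expert: str, history: list[str]) -> str:
    """Hypothetical specialized LM generating an utterance in one style/intent."""
    return f"({expert} candidate reply)"

def q_value(history: list[str], utterance: str) -> float:
    """Placeholder learned Q-function over (dialogue history, candidate utterance)."""
    return random.random()

def dialogue_manager(history: list[str]) -> str:
    candidates = [expert_utterance(e, history) for e in EXPERTS]
    return max(candidates, key=lambda u: q_value(history, u))  # action = one utterance, not one word

print(dialogue_manager(["User: I can't decide where to eat tonight."]))
```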
arXiv Detail & Related papers (2022-05-31T19:00:41Z)
- CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning [85.3987745097806]
Offline reinforcement learning can be used to train dialogue agents entirely using static datasets collected from human speakers.
Experiments show that recently developed offline RL methods can be combined with language models to yield realistic dialogue agents.
arXiv Detail & Related papers (2022-04-18T17:43:21Z)