Related papers: Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations

Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations

URL: http://arxiv.org/abs/2311.05584v1
Date: Thu, 9 Nov 2023 18:45:16 GMT
Title: Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
Authors: Joey Hong and Sergey Levine and Anca Dragan
Abstract summary: Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks. However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome. In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
Score: 70.7884839812069
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks. However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome. For example, a teacher might try to understand their student's current comprehension level to tailor their instruction accordingly, and a travel agent might ask questions of their customer to understand their preferences in order to recommend activities they might enjoy. LLMs trained with supervised fine-tuning or "single-step" RL, as with standard RLHF, might struggle which tasks that require such goal-directed behavior, since they are not trained to optimize for overall conversational outcomes after multiple turns of interaction. In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue. Our key insight is that, though LLMs might not effectively solve goal-directed dialogue tasks out of the box, they can provide useful data for solving such tasks by simulating suboptimal but human-like behaviors. Given a textual description of a goal-directed dialogue task, we leverage LLMs to sample diverse synthetic rollouts of hypothetical in-domain human-human interactions. Our algorithm then utilizes this dataset with offline reinforcement learning to train an interactive conversational agent that can optimize goal-directed objectives over multiple turns. In effect, the LLM produces examples of possible interactions, and RL then processes these examples to learn to perform more optimal interactions. Empirically, we show that our proposed approach achieves state-of-the-art performance in various goal-directed dialogue tasks that include teaching and preference elicitation.

Related papers

Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL [62.984693936073974]
Large language models (LLMs) excel in tasks like question answering and dialogue.<n>Complex tasks requiring interaction, such as negotiation and persuasion, require additional long-horizon reasoning and planning.<n>We propose a novel approach that uses goal-conditioned value functions to guide the reasoning of LLM agents.
arXiv Detail & Related papers (2025-05-23T16:51:54Z)
Towards Anthropomorphic Conversational AI Part I: A Practical Framework [49.62013440962072]
We introduce a multi- module framework designed to replicate the key aspects of human intelligence involved in conversations. In the second stage of our approach, these conversational data, after filtering and labeling, can serve as training and testing data for reinforcement learning.
arXiv Detail & Related papers (2025-02-28T03:18:39Z)
Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations [58.65755268815283]
Many real dialogues are interactive, meaning an agent's utterances will influence their conversational partner, elicit information, or change their opinion. We use this fact to rewrite and augment existing suboptimal data, and train via offline reinforcement learning (RL) an agent that outperforms both prompting and learning from unaltered human demonstrations. Our results in a user study with real humans show that our approach greatly outperforms existing state-of-the-art dialogue agents.
arXiv Detail & Related papers (2024-11-07T21:37:51Z)
Goal Inference from Open-Ended Dialog [6.21910767424247]
We present an online method for embodied agents to learn and accomplish diverse user goals. We extract natural language goal representations from conversations with Large Language Models. As a result, our method can represent uncertainty over complex goals based on unrestricted dialog.
arXiv Detail & Related papers (2024-10-17T18:30:52Z)
Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training [33.57497419019826]
Action-Based Contrastive Self-Training enables data-efficient dialogue policy learning in multi-turn conversation modeling.<n>We demonstrate ACT's efficacy under in data-efficient tuning scenarios, even when there is no action label available.<n>We also propose evaluating LLMs' ability to function as conversational agents by examining whether they can implicitly recognize and reason about ambiguity in conversation.
arXiv Detail & Related papers (2024-05-31T22:44:48Z)
Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models [56.93074140619464]
We propose RiC (Reasoning in Conversation), a method that focuses on solving subjective tasks through dialogue simulation. The motivation of RiC is to mine useful contextual information by simulating dialogues instead of supplying chain-of-thought style rationales. We evaluate both API-based and open-source LLMs including GPT-4, ChatGPT, and OpenChat across twelve tasks.
arXiv Detail & Related papers (2024-02-27T05:37:10Z)
Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk [11.706292228586332]
Large language models (LLMs) are powerful dialogue agents, but specializing them towards fulfilling a specific function can be challenging. We propose a more effective method for data collection through LLMs engaging in a conversation in various roles. This approach generates a training data via "self-talk" of LLMs that can be refined and utilized for supervised fine-tuning.
arXiv Detail & Related papers (2024-01-10T09:49:10Z)
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs) Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z)
Frugal Prompting for Dialog Models [17.048111072193933]
This study examines different approaches for building dialog systems using large language models (LLMs) As part of prompt tuning, we experiment with various ways of providing instructions, exemplars, current query and additional context. The research also analyzes the representations of dialog history that have the optimal usable-information density.
arXiv Detail & Related papers (2023-05-24T09:06:49Z)
A Mixture-of-Expert Approach to RL-based Dialogue Management [56.08449336469477]
We use reinforcement learning to develop a dialogue agent that avoids being short-sighted (outputting generic utterances) and maximizes overall user satisfaction. Most existing RL approaches to DM train the agent at the word-level, and thus, have to deal with aly complex action space even for a medium-size vocabulary. We develop a RL-based DM using a novel mixture of expert language model (MoE-LM) that consists of (i) a LM capable of learning diverse semantics for conversation histories, (ii) a number of specialized LMs (or experts) capable of generating utterances corresponding to a
arXiv Detail & Related papers (2022-05-31T19:00:41Z)
CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning [85.3987745097806]
offline reinforcement learning can be used to train dialogue agents entirely using static datasets collected from human speakers. Experiments show that recently developed offline RL methods can be combined with language models to yield realistic dialogue agents.
arXiv Detail & Related papers (2022-04-18T17:43:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.