Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management
- URL: http://arxiv.org/abs/2302.10850v2
- Date: Sun, 29 Oct 2023 13:05:52 GMT
- Title: Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management
- Authors: Dhawal Gupta, Yinlam Chow, Aza Tulepbergenov, Mohammad Ghavamzadeh,
Craig Boutilier
- Abstract summary: Reinforcement learning (RL) has shown great promise for developing dialogue management (DM) agents that are non-myopic.
We develop a variety of RL algorithms, specialized to dialogue planning, that leverage recent Mixture-of-Expert Language Models (MoE-LMs).
By exploiting MoE-LM structure, our methods significantly reduce the size of the action space and improve the efficacy of RL-based DM.
- Score: 36.254564021059515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) has shown great promise for developing dialogue
management (DM) agents that are non-myopic, conduct rich conversations, and
maximize overall user satisfaction. Despite recent developments in RL and
language models (LMs), using RL to power conversational chatbots remains
challenging, in part because RL requires online exploration to learn
effectively, whereas collecting novel human-bot interactions can be expensive
and unsafe. This issue is exacerbated by the combinatorial action spaces facing
these algorithms, as most LM agents generate responses at the word level. We
develop a variety of RL algorithms, specialized to dialogue planning, that
leverage recent Mixture-of-Expert Language Models (MoE-LMs) -- models that
capture diverse semantics, generate utterances reflecting different intents,
and are amenable for multi-turn DM. By exploiting MoE-LM structure, our methods
significantly reduce the size of the action space and improve the efficacy of
RL-based DM. We evaluate our methods in open-domain dialogue to demonstrate
their effectiveness w.r.t. the diversity of intent in generated utterances and
overall DM performance.
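
To make the action-space reduction concrete, the sketch below is a minimal illustration, not the authors' implementation: a dialogue manager scores a small set of expert-generated candidate utterances with a learned Q-network and picks one, instead of generating a response token by token. The names `CandidateQNet`, `select_response`, and the `embed` callable are hypothetical placeholders standing in for the shared MoE-LM encoder and planner.

```python
# Minimal sketch of candidate-selection dialogue management (assumption:
# experts have already proposed a handful of utterances for the current turn).
import torch
import torch.nn as nn

class CandidateQNet(nn.Module):
    """Scores (dialogue-history, candidate-utterance) embedding pairs."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, history_emb: torch.Tensor, cand_embs: torch.Tensor) -> torch.Tensor:
        # history_emb: (dim,); cand_embs: (num_candidates, dim)
        h = history_emb.expand(cand_embs.size(0), -1)
        return self.scorer(torch.cat([h, cand_embs], dim=-1)).squeeze(-1)

def select_response(qnet, history_emb, candidate_utterances, embed):
    """Pick the candidate utterance with the highest Q-value.

    `embed` maps an utterance string to a fixed-size tensor; in the paper this
    role would be played by the shared MoE-LM encoder (an assumption here).
    """
    cand_embs = torch.stack([embed(u) for u in candidate_utterances])
    with torch.no_grad():
        q_values = qnet(history_emb, cand_embs)
    return candidate_utterances[int(q_values.argmax())]
```

The point of the sketch is that the argmax ranges over a handful of expert proposals rather than a vocabulary-sized token space, which is what makes value-based, offline training of the DM tractable.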
Related papers
- Hello Again! LLM-powered Personalized Agent for Long-term Dialogue [63.65128176360345]
We introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent).
It incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation.
The effectiveness, generality, and cross-domain capabilities of LD-Agent are empirically demonstrated.
arXiv Detail & Related papers (2024-06-09T21:58:32Z) - LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language
Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs).
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z) - Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z) - Self-Explanation Prompting Improves Dialogue Understanding in Large
Language Models [52.24756457516834]
We propose a novel "Self-Explanation" prompting strategy to enhance the comprehension abilities of Large Language Models (LLMs).
This task-agnostic approach requires the model to analyze each dialogue utterance before task execution, thereby improving performance across various dialogue-centric tasks.
Experimental results from six benchmark datasets confirm that our method consistently outperforms other zero-shot prompts and matches or exceeds the efficacy of few-shot prompts. (A minimal sketch of such a prompt wrapper appears after this list.)
arXiv Detail & Related papers (2023-09-22T15:41:34Z) - A Mixture-of-Expert Approach to RL-based Dialogue Management [56.08449336469477]
We use reinforcement learning to develop a dialogue agent that avoids being short-sighted (outputting generic utterances) and maximizes overall user satisfaction.
Most existing RL approaches to DM train the agent at the word level, and thus have to deal with a combinatorially complex action space even for a medium-size vocabulary.
We develop an RL-based DM using a novel mixture-of-expert language model (MoE-LM) that consists of (i) an LM capable of learning diverse semantics for conversation histories, (ii) a number of specialized LMs (or experts) capable of generating utterances corresponding to a particular attribute or personality, and (iii) an RL-based DM that performs dialogue planning with the utterances generated by the experts.
arXiv Detail & Related papers (2022-05-31T19:00:41Z) - CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement
Learning [85.3987745097806]
Offline reinforcement learning can be used to train dialogue agents entirely using static datasets collected from human speakers.
Experiments show that recently developed offline RL methods can be combined with language models to yield realistic dialogue agents.
arXiv Detail & Related papers (2022-04-18T17:43:21Z) - High-Quality Diversification for Task-Oriented Dialogue Systems [18.455916009255485]
Training DRL agents with diverse dialogue trajectories prepares them well for rare user requests and unseen situations.
One effective diversification method is to let the agent interact with a diverse set of learned user models.
We propose a novel dialogue diversification method for task-oriented dialogue systems trained in simulators.
arXiv Detail & Related papers (2021-06-02T02:10:07Z)
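
As referenced in the Self-Explanation entry above, here is a minimal, hypothetical sketch of what such a prompt wrapper might look like; the prompt wording and the `ask_llm` callable are assumptions for illustration, not taken from that paper.

```python
# Hypothetical illustration of "Self-Explanation" prompting: the model is asked
# to explain each dialogue utterance before carrying out the downstream task.
def build_self_explanation_prompt(dialogue_turns, task_instruction):
    lines = ["Dialogue:"]
    lines += [f"  {i + 1}. {turn}" for i, turn in enumerate(dialogue_turns)]
    lines.append("First, briefly explain what each utterance above means and implies.")
    lines.append(f"Then, {task_instruction}")
    return "\n".join(lines)

def answer_with_self_explanation(ask_llm, dialogue_turns, task_instruction):
    """`ask_llm` is any callable mapping a prompt string to a model completion."""
    prompt = build_self_explanation_prompt(dialogue_turns, task_instruction)
    return ask_llm(prompt)
```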
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.