Replicating Complex Dialogue Policy of Humans via Offline Imitation
Learning with Supervised Regularization
- URL: http://arxiv.org/abs/2305.03987v1
- Date: Sat, 6 May 2023 09:27:58 GMT
- Title: Replicating Complex Dialogue Policy of Humans via Offline Imitation
Learning with Supervised Regularization
- Authors: Zhoujian Sun, Chenyang Zhao, Zhengxing Huang, Nai Ding
- Abstract summary: Policy learning (PL) is a module of a task-oriented dialogue system that trains an agent to make actions in each dialogue turn.
Neither supervised learning (SL) nor reinforcement learning (RL) frameworks imitate humans well.
This study proposed an offline imitation learning model that learns policy from real dialogue datasets.
- Score: 7.151589223349882
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Policy learning (PL) is a module of a task-oriented dialogue system that
trains an agent to make actions in each dialogue turn. Imitating human actions
is a fundamental problem of PL. However, neither the supervised learning (SL)
nor the reinforcement learning (RL) framework imitates humans well. Training RL
models requires online interaction with user simulators, and simulating
complex human policy is hard. The performance of SL-based models is restricted
by the covariate shift problem. Specifically, a dialogue is a sequential
decision-making process in which slight differences in current utterances and
actions cause significant differences in subsequent utterances. Therefore, the
generalization ability of SL models is restricted because the statistical
characteristics of training and testing dialogue data gradually diverge. This
study proposed an offline imitation learning model that learns policy from real
dialogue datasets and does not require a user simulator. It also utilizes
state-transition information, which alleviates the influence of the covariate
shift problem. We introduced a regularization trick that enables our model to
be optimized effectively. We investigated the performance of our model on four
independent public dialogue datasets, and the experimental results showed that
our model performed better on the action prediction task.
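The abstract does not include code, so the following is only a minimal, hedged sketch of what an offline imitation learning update with a supervised regularization term could look like for dialogue action prediction. The network layout, the surrogate reward of 1 for the logged human action, the TD-style transition term, and the `reg_weight` weighting are all illustrative assumptions, not the authors' actual formulation.

```python
# Illustrative sketch (not the authors' code): an offline imitation learner whose
# loss couples a state-transition (TD-style) term with a supervised
# cross-entropy regularizer on logged human actions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DialoguePolicy(nn.Module):
    """Predicts a dialogue action from a dialogue-state feature vector."""
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.action_head = nn.Linear(hidden, num_actions)  # pi(a | s)
        self.q_head = nn.Linear(hidden, num_actions)       # Q(s, a) for the transition term

    def forward(self, state):
        h = self.backbone(state)
        return self.action_head(h), self.q_head(h)

def offline_il_loss(model, batch, gamma=0.99, reg_weight=1.0):
    """One offline update on (state, action, next_state) transitions taken from
    logged human dialogues; no user simulator is involved."""
    state, action, next_state = batch["state"], batch["action"], batch["next_state"]
    logits, q = model(state)

    with torch.no_grad():
        _, q_next = model(next_state)
        # Surrogate reward of 1 for the logged human action (an assumed, illustrative choice).
        target = 1.0 + gamma * q_next.max(dim=-1).values

    # Transition-aware imitation term: TD-style regression on the taken action.
    q_taken = q.gather(-1, action.unsqueeze(-1)).squeeze(-1)
    il_loss = F.mse_loss(q_taken, target)

    # Supervised regularization: cross-entropy on the human-chosen action keeps
    # the policy anchored to logged behaviour and stabilizes optimization.
    sl_loss = F.cross_entropy(logits, action)
    return il_loss + reg_weight * sl_loss

# Hypothetical usage on a batch of logged transitions.
model = DialoguePolicy(state_dim=128, num_actions=20)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
batch = {
    "state": torch.randn(32, 128),
    "action": torch.randint(0, 20, (32,)),
    "next_state": torch.randn(32, 128),
}
optimizer.zero_grad()
loss = offline_il_loss(model, batch)
loss.backward()
optimizer.step()
```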
Related papers
- Enabling Real-Time Conversations with Minimal Training Costs [61.80370154101649]
This paper presents a new duplex decoding approach that enhances large language models with duplex ability, requiring minimal training.
Experimental results indicate that our proposed method significantly enhances the naturalness and human-likeness of user-AI interactions with minimal training costs.
arXiv Detail & Related papers (2024-09-18T06:27:26Z)
- Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner [51.77263363285369]
We present an approach called Dialogue Action Tokens that adapts language model agents to plan goal-directed dialogues.
The core idea is to treat each utterance as an action, thereby converting dialogues into games where existing approaches such as reinforcement learning can be applied.
arXiv Detail & Related papers (2024-06-17T18:01:32Z)
- Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z)
- Finetuning Offline World Models in the Real World [13.46766121896684]
Reinforcement Learning (RL) is notoriously data-inefficient, which makes training on a real robot difficult.
Offline RL has been proposed as a framework for training RL policies on pre-existing datasets without any online interaction.
In this work, we consider the problem of pretraining a world model with offline data collected on a real robot, and then finetuning the model on online data collected by planning with the learned model.
arXiv Detail & Related papers (2023-10-24T17:46:12Z)
- Aligning Language Models with Offline Learning from Human Feedback [5.539080592071948]
We propose an offline learning from human feedback framework to align language models without interacting with environments.
Specifically, we explore filtering alignment (FA), reward-weighted regression (RWR), and conditional alignment (CA) to align language models to human preferences.
arXiv Detail & Related papers (2023-08-23T10:41:07Z)
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- Stabilized In-Context Learning with Pre-trained Language Models for Few Shot Dialogue State Tracking [57.92608483099916]
Large pre-trained language models (PLMs) have shown impressive unaided performance across many NLP tasks.
For more complex tasks such as dialogue state tracking (DST), designing prompts that reliably convey the desired intent is nontrivial.
We introduce a saliency model to limit dialogue text length, allowing us to include more exemplars per query.
arXiv Detail & Related papers (2023-02-12T15:05:10Z)
- CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning [85.3987745097806]
Offline reinforcement learning can be used to train dialogue agents entirely from static datasets collected from human speakers.
Experiments show that recently developed offline RL methods can be combined with language models to yield realistic dialogue agents.
arXiv Detail & Related papers (2022-04-18T17:43:21Z)
- Adaptive Dialog Policy Learning with Hindsight and User Modeling [10.088347529930129]
We develop the algorithm LHUA that, for the first time, enables dialog agents to adaptively learn with hindsight from both simulated and real users.
Experimental results suggest that, in success rate and policy quality, LHUA outperforms competitive baselines from the literature.
arXiv Detail & Related papers (2020-05-07T07:43:43Z)