Are LLMs All You Need for Task-Oriented Dialogue?
- URL: http://arxiv.org/abs/2304.06556v2
- Date: Thu, 3 Aug 2023 15:31:50 GMT
- Title: Are LLMs All You Need for Task-Oriented Dialogue?
- Authors: Vojtěch Hudeček and Ondřej Dušek
- Abstract summary: Instruction-tuned Large Language Models (LLMs) have recently gained huge popularity thanks to their ability to interact with users through conversation.
In this work we aim to evaluate their ability to complete multi-turn tasks and interact with external databases in the context of established task-oriented dialogue benchmarks.
- Score: 3.42658286826597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instruction-tuned Large Language Models (LLMs) have recently gained huge
popularity thanks to their ability to interact with users through conversation.
In this work we aim to evaluate their ability to complete multi-turn tasks and
interact with external databases in the context of established task-oriented
dialogue benchmarks. We show that for explicit belief state tracking, LLMs
underperform compared to specialized task-specific models. Nevertheless, they
show the ability to guide the dialogue to a successful ending if given correct slot
values. Furthermore, this ability improves with access to true belief state
distribution or in-domain examples.
Related papers
- MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions [58.57255822646756]
This paper introduces MathChat, a benchmark designed to evaluate large language models (LLMs) across a broader spectrum of mathematical tasks.
We evaluate the performance of various SOTA LLMs on the MathChat benchmark, and we observe that while these models excel in single-turn question answering, they significantly underperform in more complex scenarios.
We develop MathChat sync, a synthetic dialogue-based math dataset for LLM fine-tuning, focusing on improving models' interaction and instruction-following capabilities in conversations.
arXiv Detail & Related papers (2024-05-29T18:45:55Z)
- Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models [56.93074140619464]
We propose RiC (Reasoning in Conversation), a method that focuses on solving subjective tasks through dialogue simulation.
The motivation of RiC is to mine useful contextual information by simulating dialogues instead of supplying chain-of-thought style rationales.
We evaluate both API-based and open-source LLMs including GPT-4, ChatGPT, and OpenChat across twelve tasks.
arXiv Detail & Related papers (2024-02-27T05:37:10Z)
- Are LLMs Robust for Spoken Dialogues? [10.855403629160921]
Large Pre-Trained Language Models have demonstrated state-of-the-art performance in different downstream tasks.
Most of the publicly available datasets and benchmarks on task-oriented dialogues focus on written conversations.
We have evaluated the performance of LLMs for spoken task-oriented dialogues on the DSTC11 test sets.
arXiv Detail & Related papers (2024-01-04T14:36:38Z)
- Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z)
- BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues [72.65163468440434]
This report provides a preliminary evaluation of existing large language models for human-style multi-turn chatting.
We prompt large language models (LLMs) to generate a full multi-turn dialogue based on the ChatSEED, utterance by utterance.
We find that GPT-4 can generate human-style multi-turn dialogues with impressive quality, significantly outperforming its counterparts.
arXiv Detail & Related papers (2023-10-20T16:53:51Z)
- Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs [59.74002011562726]
We propose a novel linguistic cue-based chain-of-thoughts (Cue-CoT) to provide a more personalized and engaging response.
We build a benchmark with in-depth dialogue questions, consisting of 6 datasets in both Chinese and English.
Empirical results demonstrate our proposed Cue-CoT method outperforms standard prompting methods in terms of both helpfulness and acceptability on all datasets.
arXiv Detail & Related papers (2023-05-19T16:27:43Z)
- Understanding the Effectiveness of Very Large Language Models on Dialog Evaluation [20.18656308749408]
Large language models (LLMs) have been used for generation and can now output human-like text.
This paper investigates how the number of examples in the prompt and the type of example selection used affect the model's performance.
arXiv Detail & Related papers (2023-01-27T22:02:27Z)
- DialogZoo: Large-Scale Dialog-Oriented Task Learning [52.18193690394549]
We aim to build a unified foundation model which can solve massive diverse dialogue tasks.
To achieve this goal, we first collect a large-scale well-labeled dialogue dataset from 73 publicly available datasets.
arXiv Detail & Related papers (2022-05-25T11:17:16Z)
- The Interplay of Task Success and Dialogue Quality: An in-depth Evaluation in Task-Oriented Visual Dialogues [6.02280861819024]
We show that in the popular end-to-end approach, this choice prevents the model from learning to generate linguistically richer dialogues.
We show that in GuessWhat, models could increase their accuracy if they also learn to ground, encode, and decode words that do not occur frequently in the training set.
arXiv Detail & Related papers (2021-03-20T10:13:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.