TicketTalk: Toward human-level performance with end-to-end,
transaction-based dialog systems
- URL: http://arxiv.org/abs/2012.12458v2
- Date: Sun, 27 Dec 2020 20:51:17 GMT
- Authors: Bill Byrne, Karthik Krishnamoorthi, Saravanan Ganesh, Mihir Sanjay
Kale
- Abstract summary: We present a data-driven, end-to-end approach to transaction-based dialog systems.
We show that the system performs at near-human levels in terms of verbal response quality and factual grounding accuracy.
We introduce TicketTalk, a movie ticketing dialog dataset with 23,789 annotated conversations.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present a data-driven, end-to-end approach to transaction-based dialog
systems that performs at near-human levels in terms of verbal response quality
and factual grounding accuracy. We show that two essential components of the
system produce these results: a sufficiently large and diverse, in-domain
labeled dataset, and a neural network-based, pre-trained model that generates
both verbal responses and API call predictions. In terms of data, we introduce
TicketTalk, a movie ticketing dialog dataset with 23,789 annotated
conversations. The movie ticketing conversations range from completely
open-ended and unrestricted to more structured, in terms of their knowledge
base, discourse features, and number of turns. In qualitative human
evaluations, model-generated responses trained on just 10,000 TicketTalk
dialogs were rated to "make sense" 86.5 percent of the time, almost the same as
human responses in the same contexts. Our simple, API-focused annotation schema
makes labeling much easier, faster, and more cost-effective. It is also the key
component for predicting API calls accurately. We handle factual grounding by
incorporating API calls in the training data, allowing our model to learn which
actions to take and when.
Trained on the same 10,000-dialog set, the model's API call predictions were
rated to be correct 93.9 percent of the time in our evaluations, surpassing the
ratings for the corresponding human labels. We show how API prediction and
response generation scores improve as the dataset size incrementally increases
from 5,000 to 21,000 dialogs. Our analysis also clearly illustrates the benefits
of pre-training. We are publicly releasing the TicketTalk dataset with this
paper to facilitate future work on transaction-based dialogs.
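The abstract describes an API-focused annotation schema in which dialog turns and API calls appear together in the training data, so a text-to-text model can learn when to emit a verbal response and when to emit an API call. The following is a minimal sketch of that idea, not the paper's actual schema: the field names, the `find_showtimes` API, and the serialization format are all illustrative assumptions.

```python
# Sketch (illustrative, not the authors' schema) of serializing a movie-ticketing
# turn annotated with an API call into a text-to-text training pair.

def serialize_example(history, api_call=None, response=None):
    """Flatten the dialog history into a source string; the target is either
    an API call (when one is annotated) or the system's verbal response."""
    source = " ".join(f"{speaker}: {text}" for speaker, text in history)
    if api_call is not None:
        # Sort arguments for a deterministic, learnable target string.
        args = ", ".join(f"{k}={v!r}" for k, v in sorted(api_call["args"].items()))
        target = f"API_CALL {api_call['name']}({args})"
    else:
        target = f"SYSTEM: {response}"
    return source, target

history = [
    ("USER", "Two tickets for Dune tonight, please."),
    ("SYSTEM", "Sure, which theater?"),
    ("USER", "The AMC downtown."),
]
src, tgt = serialize_example(
    history,
    api_call={"name": "find_showtimes",
              "args": {"movie": "Dune", "theater": "AMC downtown", "date": "tonight"}},
)
print(src)
print(tgt)
```

A model trained on pairs like these sees API calls and their results as ordinary turns in the sequence, which is one plausible way to realize the paper's claim that the model "learns which actions to take and when."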
Related papers
- ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models
We release ToolDial, a dataset comprising 11,111 multi-turn dialogues, with an average of 8.95 turns per dialogue, based on APIs from RapidAPI.
We simulate dialogues where the system requests necessary information from the user based on API documentation and seeks additional APIs if the user fails to provide the required information.
We evaluate a suite of language models on their ability to predict correct actions and extract input parameter values for API calls from the dialogue history.
arXiv Detail & Related papers (2025-03-01T17:23:51Z)
- CoPrUS: Consistency Preserving Utterance Synthesis towards more realistic benchmark dialogues
We investigate the creation of synthetic communication errors in an automatic pipeline.
We focus on three types of miscommunications that could happen in real-world dialogues but are underrepresented in the benchmark dataset.
Our two-step approach uses a state-of-the-art Large Language Model (LLM) to first create the error and secondly the repairing utterance.
arXiv Detail & Related papers (2024-12-10T13:51:55Z)
- PerSHOP -- A Persian dataset for shopping dialogue systems modeling
We developed a dataset of dialogues in the Persian language through crowd-sourcing.
This dataset contains nearly 22k utterances in 15 different domains and 1061 dialogues.
We proposed some baseline models for natural language understanding tasks.
arXiv Detail & Related papers (2024-01-01T16:42:56Z)
- Weakly Supervised Data Augmentation Through Prompting for Dialogue Understanding
We present a novel approach that iterates on augmentation quality by applying weakly-supervised filters.
We evaluate our methods on the emotion and act classification tasks in DailyDialog and the intent classification task in Facebook Multilingual Task-Oriented Dialogue.
For DailyDialog specifically, using 10% of the ground truth data we outperform the current state-of-the-art model which uses 100% of the data.
arXiv Detail & Related papers (2022-10-25T17:01:30Z)
- Controllable Dialogue Simulation with In-Context Learning
Dialogic is a dialogue simulation method based on in-context learning with large language models.
Our method can rapidly expand a small set of dialogue data with minimum or zero human involvement.
Our simulated dialogues have near-human fluency and annotation accuracy.
arXiv Detail & Related papers (2022-10-09T06:32:58Z)
- SPACE-2: Tree-Structured Semi-Supervised Contrastive Pre-training for Task-Oriented Dialog Understanding
We propose a tree-structured pre-trained conversation model, which learns dialog representations from limited labeled dialogs and large-scale unlabeled dialog corpora.
Our method can achieve new state-of-the-art results on the DialoGLUE benchmark consisting of seven datasets and four popular dialog understanding tasks.
arXiv Detail & Related papers (2022-09-14T13:42:50Z)
- GODEL: Large-Scale Pre-Training for Goal-Directed Dialog
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
arXiv Detail & Related papers (2022-06-22T18:19:32Z)
- Dialog Inpainting: Turning Documents into Dialogs
We produce two datasets totalling 19 million diverse information-seeking dialogs.
Human raters judge the answer adequacy and conversationality of WikiDialog to be as good or better than existing manually-collected datasets.
arXiv Detail & Related papers (2022-05-18T16:58:50Z)
- What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation
We propose to use information that can be automatically extracted from the next user utterance as a proxy to measure the quality of the previous system response.
Our model generalizes across both spoken and written open-domain dialog corpora collected from real and paid users.
arXiv Detail & Related papers (2022-03-25T22:09:52Z)
- Dialog Simulation with Realistic Variations for Training Goal-Oriented Conversational Systems
Goal-oriented dialog systems enable users to complete specific goals like requesting information about a movie or booking a ticket.
We propose an approach for automatically creating a large corpus of annotated dialogs from a few thoroughly annotated sample dialogs and the dialog schema.
We achieve an 18-50% relative accuracy improvement on a held-out test set compared to a baseline dialog generation approach.
arXiv Detail & Related papers (2020-11-16T19:39:15Z)
- Pchatbot: A Large-Scale Dataset for Personalized Chatbot
We introduce Pchatbot, a large-scale dialogue dataset that contains two subsets collected from Weibo and Judicial forums respectively.
To adapt the raw dataset to dialogue systems, we elaborately normalize the raw dataset via processes such as anonymization.
The scale of Pchatbot is significantly larger than existing Chinese datasets, which might benefit the data-driven models.
arXiv Detail & Related papers (2020-09-28T12:49:07Z)
- A Large-Scale Chinese Short-Text Conversation Dataset
We present a large-scale cleaned Chinese conversation dataset, LCCC, which contains a base version (6.8 million dialogues) and a large version (12.0 million dialogues).
The quality of our dataset is ensured by a rigorous data cleaning pipeline, which is built based on a set of rules.
We also release pre-training dialogue models which are trained on LCCC-base and LCCC-large respectively.
arXiv Detail & Related papers (2020-08-10T08:12:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.