CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset
- URL: http://arxiv.org/abs/2002.11893v2
- Date: Fri, 28 Feb 2020 06:04:14 GMT
- Title: CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset
- Authors: Qi Zhu, Kaili Huang, Zheng Zhang, Xiaoyan Zhu, Minlie Huang
- Abstract summary: CrossWOZ is the first large-scale Chinese Cross-Domain Wizard-of-Oz task-oriented dataset.
It contains 6K dialogue sessions and 102K utterances for 5 domains, including hotel, restaurant, attraction, metro, and taxi.
- Score: 58.910961297314415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To advance multi-domain (cross-domain) dialogue modeling as well as alleviate
the shortage of Chinese task-oriented datasets, we propose CrossWOZ, the first
large-scale Chinese Cross-Domain Wizard-of-Oz task-oriented dataset. It
contains 6K dialogue sessions and 102K utterances for 5 domains, including
hotel, restaurant, attraction, metro, and taxi. Moreover, the corpus contains
rich annotation of dialogue states and dialogue acts at both user and system
sides. About 60% of the dialogues have cross-domain user goals that favor
inter-domain dependency and encourage natural transition across domains in
conversation. We also provide a user simulator and several benchmark models for
pipelined task-oriented dialogue systems, which will help researchers
compare and evaluate their models on this corpus. The large size and rich
annotation of CrossWOZ make it suitable for investigating a variety of tasks in
cross-domain dialogue modeling, such as dialogue state tracking, policy
learning, and user simulation.
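As a rough illustration of the statistics quoted in the abstract, the sketch below computes the average number of utterances per session and the share of dialogues whose user goal contains an inter-domain dependency. The `SubGoal` and `Dialogue` classes, the `depends_on` field, and the toy records are hypothetical and do not reflect the released CrossWOZ file format; only the five domain names come from the abstract. For reference, the abstract's rounded figures (102K utterances over 6K sessions) imply about 17 utterances per session.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The five domains named in the abstract.
DOMAINS = {"hotel", "restaurant", "attraction", "metro", "taxi"}


@dataclass
class SubGoal:
    """One part of a user goal (hypothetical schema, not the dataset's own)."""
    domain: str
    constraints: Dict[str, str] = field(default_factory=dict)
    # Domain of another sub-goal this one depends on, e.g. a restaurant
    # near the attraction chosen earlier. None means no dependency.
    depends_on: Optional[str] = None


@dataclass
class Dialogue:
    sub_goals: List[SubGoal]
    utterances: List[str]

    def has_cross_domain_goal(self) -> bool:
        # Assumption: a goal counts as cross-domain when at least one
        # sub-goal depends on a choice made in another domain.
        return any(g.depends_on is not None for g in self.sub_goals)


def corpus_stats(dialogues: List[Dialogue]) -> Dict[str, float]:
    """Return dialogue count, mean utterances per dialogue, cross-domain share."""
    n = len(dialogues)
    if n == 0:
        return {"dialogues": 0, "avg_utterances": 0.0, "cross_domain_ratio": 0.0}
    total_utterances = sum(len(d.utterances) for d in dialogues)
    cross = sum(d.has_cross_domain_goal() for d in dialogues)
    return {
        "dialogues": n,
        "avg_utterances": total_utterances / n,
        "cross_domain_ratio": cross / n,
    }


# Toy records for illustration only; they are not taken from the corpus.
toy = [
    Dialogue(
        sub_goals=[
            SubGoal("attraction", {"rating": "4.5+"}),
            SubGoal("restaurant", {"area": "nearby"}, depends_on="attraction"),
        ],
        utterances=["u1", "s1", "u2", "s2"],
    ),
    Dialogue(
        sub_goals=[SubGoal("hotel", {"price": "cheap"})],
        utterances=["u1", "s1"],
    ),
]

stats = corpus_stats(toy)
print(stats)  # {'dialogues': 2, 'avg_utterances': 3.0, 'cross_domain_ratio': 0.5}
```

Treating a dependency between sub-goals as the marker of a cross-domain goal is one plausible reading of "inter-domain dependency"; the paper's own goal taxonomy may draw the line differently.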
Related papers
- Multi-turn Dialogue Comprehension from a Topic-aware Perspective [70.37126956655985]
This paper proposes to model multi-turn dialogues from a topic-aware perspective.
We use a dialogue segmentation algorithm to split a dialogue passage into topic-concentrated fragments in an unsupervised way.
We also present a novel model, Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements.
(2023-09-18T11:03:55Z)
- SuperDialseg: A Large-scale Dataset for Supervised Dialogue Segmentation [55.82577086422923]
We provide a feasible definition of dialogue segmentation points with the help of document-grounded dialogues.
We release a large-scale supervised dataset called SuperDialseg, containing 9,478 dialogues.
We also provide a benchmark including 18 models across five categories for the dialogue segmentation task.
(2023-05-15T06:08:01Z)
- Manual-Guided Dialogue for Flexible Conversational Agents [84.46598430403886]
How to build and use dialogue data efficiently, and how to deploy models in different domains at scale can be critical issues in building a task-oriented dialogue system.
We propose a novel manual-guided dialogue scheme, where the agent learns the tasks from both dialogue and manuals.
Our proposed scheme reduces the dependence of dialogue models on fine-grained domain ontology, and makes them more flexible to adapt to various domains.
(2022-08-16T08:21:12Z)
- A Slot Is Not Built in One Utterance: Spoken Language Dialogs with Sub-Slots [67.69407159704328]
This paper defines a new task named Sub-Slot based Task-Oriented Dialog (SSTOD).
The dataset includes a total of 40K dialogs and 500K utterances from four different domains: Chinese names, phone numbers, ID numbers and license plate numbers.
We find some new linguistic phenomena and interactive manners in SSTOD which raise critical challenges of building dialog agents for the task.
(2022-03-21T07:10:19Z)
- What Did You Say? Task-Oriented Dialog Datasets Are Not Conversational!? [4.022057598291766]
We outline a taxonomy of conversational and contextual effects, which we use to examine MultiWOZ, SGD and SMCalFlow.
We find that less than 4% of MultiWOZ's turns and 10% of SGD's turns are conversational, while SMCalFlow is not conversational at all in its current release.
(2022-03-07T14:26:23Z)
- RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling [35.75880078666584]
RiSAWOZ is a large-scale multi-domain Chinese Wizard-of-Oz dataset with Rich Semantic Annotations.
It contains 11.2K human-to-human (H2H) multi-turn semantically annotated dialogues, with more than 150K utterances spanning over 12 domains.
(2020-10-17T08:18:59Z)
- UniConv: A Unified Conversational Neural Architecture for Multi-domain Task-oriented Dialogues [101.96097419995556]
"UniConv" is a novel unified neural architecture for end-to-end conversational systems in task-oriented dialogues.
We conduct comprehensive experiments in dialogue state tracking, context-to-text, and end-to-end settings on the MultiWOZ2.1 benchmark.
(2020-04-29T16:28:22Z)
- Interview: A Large-Scale Open-Source Corpus of Media Dialog [11.28504775964698]
We introduce 'Interview': a large-scale (105K conversations) media dialog dataset collected from news interview transcripts.
Compared to existing large-scale proxies for conversational data, language models trained on our dataset exhibit better zero-shot out-of-domain performance.
'Interview' contains speaker role annotations for each turn, facilitating the development of engaging, responsive dialog systems.
(2020-04-07T02:44:50Z)
- MA-DST: Multi-Attention Based Scalable Dialog State Tracking [13.358314140896937]
Task-oriented dialog agents provide a natural language interface for users to complete their goals.
To enable accurate multi-domain DST, the model needs to encode dependencies between past utterances and slot semantics.
We introduce a novel architecture for this task to encode the conversation history and slot semantics.
(2020-02-07T05:34:58Z)