MultiWOZ 2.4: A Multi-Domain Task-Oriented Dialogue Dataset with
Essential Annotation Corrections to Improve State Tracking Evaluation
- URL: http://arxiv.org/abs/2104.00773v1
- Date: Thu, 1 Apr 2021 21:31:48 GMT
- Title: MultiWOZ 2.4: A Multi-Domain Task-Oriented Dialogue Dataset with
Essential Annotation Corrections to Improve State Tracking Evaluation
- Authors: Fanghua Ye, Jarana Manotumruksa, Emine Yilmaz
- Abstract summary: This work introduces MultiWOZ 2.4, in which we refine all annotations in the validation set and test set on top of MultiWOZ 2.1.
The annotations in the training set remain unchanged to encourage robust and noise-resilient model training.
We further benchmark 8 state-of-the-art dialogue state tracking models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The MultiWOZ 2.0 dataset was released in 2018. It consists of more than
10,000 task-oriented dialogues spanning 7 domains, and has greatly stimulated
the research of task-oriented dialogue systems. However, there is substantial
noise in the state annotations, which hinders a proper evaluation of dialogue
state tracking models. To tackle this issue, massive efforts have been devoted
to correcting the annotations, resulting in 3 improved versions of this dataset
(i.e., MultiWOZ 2.1-2.3). Even so, there are still lots of incorrect and
inconsistent annotations. This work introduces MultiWOZ 2.4, in which we refine
all annotations in the validation set and test set on top of MultiWOZ 2.1. The
annotations in the training set remain unchanged to encourage robust and
noise-resilient model training. We further benchmark 8 state-of-the-art
dialogue state tracking models. All these models achieve much higher
performance on MultiWOZ 2.4 than on MultiWOZ 2.1.
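State tracking results on MultiWOZ are commonly reported as joint goal accuracy (JGA). Under JGA, a turn counts as correct only if the predicted dialogue state matches the annotated state on every slot. The minimal Python sketch below shows why noisy annotations distort evaluation under this metric. The state representation (a dict from "domain-slot" names to values), the function name `joint_goal_accuracy`, the rule that drops empty or "none" values, and the example turns are all illustrative assumptions. It is not the official MultiWOZ evaluation script.

```python
from typing import Dict, List

# Assumed representation: "domain-slot" name -> value, e.g. {"hotel-area": "north"}.
DialogueState = Dict[str, str]


def _active(state: DialogueState) -> DialogueState:
    """Drop unfilled slots (assumption: empty string or "none" means unfilled)."""
    return {slot: value for slot, value in state.items() if value not in ("", "none")}


def joint_goal_accuracy(predicted: List[DialogueState], gold: List[DialogueState]) -> float:
    """Fraction of turns whose predicted state equals the gold state on every slot."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same turns")
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if _active(p) == _active(g))
    return correct / len(gold)


# Hypothetical model predictions for two turns.
turns_pred = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
]
# Hypothetical gold labels where the second turn carries a wrong annotation ("3").
turns_gold_noisy = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "3"},
]
# The same turns with the annotation corrected.
turns_gold_clean = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
]

print(joint_goal_accuracy(turns_pred, turns_gold_noisy))  # 0.5
print(joint_goal_accuracy(turns_pred, turns_gold_clean))  # 1.0
```

Here the model's second prediction is right. Even so, a single wrong gold label marks the whole turn as a failure and halves JGA. Refining the validation and test annotations is meant to remove this kind of evaluation error.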
Related papers
- Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for
Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems
Multi3WOZ is a novel multilingual, multi-domain, multi-parallel ToD dataset.
It is large-scale and offers culturally adapted dialogs in 4 languages.
We describe a complex bottom-up data collection process that yielded the final dataset.
(arXiv, 2023-07-26T08:29:42Z)
- Which One Are You Referring To? Multimodal Object Identification in
Situated Dialogue
We explore three methods to tackle the problem of interpreting multimodal inputs from conversational and situational contexts.
Our best method, scene-dialogue alignment, improves F1 score by 20% over the SIMMC 2.1 baselines.
(arXiv, 2023-02-28T15:45:20Z)
- XQA-DST: Multi-Domain and Multi-Lingual Dialogue State Tracking
We propose a multi-domain and multi-lingual dialogue state tracker based on a neural reading comprehension approach.
Our approach fills slot values by span prediction, extracting the values from the dialogue itself.
We show its competitive transferability through zero-shot domain adaptation experiments on MultiWOZ 2.1, with an average joint goal accuracy (JGA) of 31.6% across five domains.
(arXiv, 2022-04-12T15:45:32Z)
- ASSIST: Towards Label Noise-Robust Dialogue State Tracking
We propose ASSIST to train dialogue state tracking models robustly from noisy labels.
ASSIST improves the joint goal accuracy of DST by up to 28.16% on the original MultiWOZ 2.0 and 8.41% on the latest version, MultiWOZ 2.4.
(arXiv, 2022-02-26T00:33:32Z)
- Cross-lingual Intermediate Fine-tuning improves Dialogue State Tracking
We enhance the transfer learning process by intermediate fine-tuning of pretrained multilingual models.
We use parallel and conversational movie subtitles datasets to design cross-lingual intermediate tasks.
We report improvements of more than 20% in goal accuracy on the parallel MultiWoZ dataset and the Multilingual WoZ dataset.
(arXiv, 2021-09-28T11:22:38Z)
- MultiWOZ 2.3: A multi-domain task-oriented dialogue dataset enhanced
with annotation corrections and co-reference annotation
Dialogue state annotations are error-prone, leading to sub-optimal performance.
We introduce MultiWOZ 2.3, in which we differentiate incorrect annotations in dialogue acts from dialogue states.
We implement co-reference features and unify annotations of dialogue acts and dialogue states.
(arXiv, 2020-10-12T10:53:19Z)
- MultiWOZ 2.2: A Dialogue Dataset with Additional Annotation Corrections
and State Tracking Baselines
This work introduces MultiWOZ 2.2, yet another improved version of the MultiWOZ dataset.
First, we identify and fix dialogue state annotation errors across 17.3% of the utterances on top of MultiWOZ 2.1.
Second, we redefine the vocabularies of slots with a large number of possible values.
(arXiv, 2020-07-10T22:52:14Z)
- A Contextual Hierarchical Attention Network with Adaptive Objective for
Dialogue State Tracking
We propose to enhance dialogue state tracking (DST) by employing a contextual hierarchical attention network.
We also propose an adaptive objective that alleviates the slot imbalance problem by dynamically adjusting the weights of different slots during training.
Experimental results show that our approach reaches 52.68% and 58.55% joint accuracy on the MultiWOZ 2.0 and MultiWOZ 2.1 datasets, respectively.
(arXiv, 2020-06-02T12:25:44Z)
- CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue
Dataset
CrossWOZ is the first large-scale Chinese Cross-Domain Wizard-of-Oz task-oriented dataset.
It contains 6K dialogue sessions and 102K utterances for 5 domains, including hotel, restaurant, attraction, metro, and taxi.
(arXiv, 2020-02-27T03:06:35Z)