ViWOZ: A Multi-Domain Task-Oriented Dialogue Systems Dataset For
Low-resource Language
- URL: http://arxiv.org/abs/2203.07742v1
- Date: Tue, 15 Mar 2022 09:22:04 GMT
- Title: ViWOZ: A Multi-Domain Task-Oriented Dialogue Systems Dataset For
Low-resource Language
- Authors: Phi Nguyen Van, Tung Cao Hoang, Dung Nguyen Manh, Quan Nguyen Minh,
Long Tran Quoc
- Abstract summary: ViWOZ is the first multi-turn, multi-domain tasked oriented dataset in Vietnamese.
The dataset consists of a total of 5,000 dialogues, including 60,946 fully annotated utterances.
With those characteristics, the ViWOZ dataset enables future studies on creating a multilingual task-oriented dialogue system.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Most of the current task-oriented dialogue systems (ToD), despite having
interesting results, are designed for a handful of languages like Chinese and
English. Therefore, their performance in low-resource languages is still a
significant problem due to the absence of a standard dataset and evaluation
policy. To address this problem, we proposed ViWOZ, a fully-annotated
Vietnamese task-oriented dialogue dataset. ViWOZ is the first multi-turn,
multi-domain tasked oriented dataset in Vietnamese, a low-resource language.
The dataset consists of a total of 5,000 dialogues, including 60,946 fully
annotated utterances. Furthermore, we provide a comprehensive benchmark of both
modular and end-to-end models in low-resource language scenarios. With those
characteristics, the ViWOZ dataset enables future studies on creating a
multilingual task-oriented dialogue system.
Related papers
- JMultiWOZ: A Large-Scale Japanese Multi-Domain Task-Oriented Dialogue Dataset [3.1311340484197814]
JMultiWOZ is the first Japanese language large-scale multi-domain task-oriented dialogue dataset.
We evaluated the dialogue state tracking and response generation capabilities of the state-of-the-art methods.
arXiv Detail & Related papers (2024-03-26T02:01:18Z) - Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for
Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems [64.40789703661987]
Multi3WOZ is a novel multilingual, multi-domain, multi-parallel ToD dataset.
It is large-scale and offers culturally adapted dialogs in 4 languages.
We describe a complex bottom-up data collection process that yielded the final dataset.
arXiv Detail & Related papers (2023-07-26T08:29:42Z) - SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented
Dialogue Agents [72.42049370297849]
SpokenWOZ is a large-scale speech-text dataset for spoken TOD.
Cross-turn slot and reasoning slot detection are new challenges for SpokenWOZ.
arXiv Detail & Related papers (2023-05-22T13:47:51Z) - XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented
Languages [105.54207724678767]
Data scarcity is a crucial issue for the development of highly multilingual NLP systems.
We propose XTREME-UP, a benchmark defined by its focus on the scarce-data scenario rather than zero-shot.
XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies.
arXiv Detail & Related papers (2023-05-19T18:00:03Z) - MULTI3NLU++: A Multilingual, Multi-Intent, Multi-Domain Dataset for
Natural Language Understanding in Task-Oriented Dialogue [115.32009638844059]
We extend the English only NLU++ dataset to include manual translations into a range of high, medium, and low resource languages.
Because of its multi-intent property, MULTI3NLU++ represents complex and natural user goals.
We use MULTI3NLU++ to benchmark state-of-the-art multilingual models for the Natural Language Understanding tasks of intent detection and slot labelling.
arXiv Detail & Related papers (2022-12-20T17:34:25Z) - Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation [70.81596088969378]
Cross-lingual Outline-based Dialogue dataset (termed COD) enables natural language understanding.
COD enables dialogue state tracking, and end-to-end dialogue modelling and evaluation in 4 diverse languages.
arXiv Detail & Related papers (2022-01-31T18:11:21Z) - GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented
Dialogue Systems [66.92182084456809]
We introduce a novel data curation method that generates GlobalWoZ -- a large-scale multilingual ToD dataset from an English ToD dataset.
Our method is based on translating dialogue templates and filling them with local entities in the target-language countries.
We release our dataset as well as a set of strong baselines to encourage research on learning multilingual ToD systems for real use cases.
arXiv Detail & Related papers (2021-10-14T19:33:04Z) - RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich
Semantic Annotations for Task-Oriented Dialogue Modeling [35.75880078666584]
RiSAWOZ is a large-scale multi-domain Chinese Wizard-of-Oz dataset with Rich Semantic s.
It contains 11.2K human-to-human (H2H) multi-turn semantically annotated dialogues, with more than 150K utterances spanning over 12 domains.
arXiv Detail & Related papers (2020-10-17T08:18:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.