IndoToD: A Multi-Domain Indonesian Benchmark For End-to-End
Task-Oriented Dialogue Systems
- URL: http://arxiv.org/abs/2311.00958v1
- Date: Thu, 2 Nov 2023 03:01:53 GMT
- Title: IndoToD: A Multi-Domain Indonesian Benchmark For End-to-End
Task-Oriented Dialogue Systems
- Authors: Muhammad Dehan Al Kautsar, Rahmah Khoirussyifa' Nurdini, Samuel
Cahyawijaya, Genta Indra Winata, Ayu Purwarianti
- Abstract summary: This paper introduces IndoToD, an end-to-end multi-domain ToD benchmark in Indonesian.
We extend two English ToD datasets, comprising four different domains, to Indonesian, using delexicalization to efficiently reduce the size of annotations.
To ensure a high-quality data collection, we hire native speakers to manually translate the dialogues.
- Score: 26.094144160398447
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Task-oriented dialogue (ToD) systems have been mostly created for
high-resource languages, such as English and Chinese. However, there is a need
to develop ToD systems for other regional or local languages to broaden their
ability to comprehend the dialogue contexts in various languages. This paper
introduces IndoToD, an end-to-end multi-domain ToD benchmark in Indonesian. We
extend two English ToD datasets, comprising four different domains, to
Indonesian, using delexicalization to efficiently reduce the size of annotations. To
ensure a high-quality data collection, we hire native speakers to manually
translate the dialogues. Along with the original English datasets, these new
Indonesian datasets serve as an effective benchmark for evaluating Indonesian
and English ToD systems as well as exploring the potential benefits of
cross-lingual and bilingual transfer learning approaches.
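The abstract says only that delexicalization is used to reduce annotation size and does not describe the procedure. As a minimal sketch of the general idea, and not the authors' pipeline, the Python below replaces slot values with placeholders so that only the template needs translating, then fills the placeholders back in. The function names, slot names, example utterance, and Indonesian template are all hypothetical.

```python
import re


def delexicalize(utterance: str, slots: dict[str, str]) -> str:
    """Replace each slot value in the utterance with a [slot_name] placeholder."""
    # Replace longer values first so a short value never clobbers part of a longer one.
    for name, value in sorted(slots.items(), key=lambda kv: -len(kv[1])):
        utterance = re.sub(re.escape(value), f"[{name}]", utterance, flags=re.IGNORECASE)
    return utterance


def relexicalize(template: str, slots: dict[str, str]) -> str:
    """Fill [slot_name] placeholders back in with concrete values."""
    for name, value in slots.items():
        template = template.replace(f"[{name}]", value)
    return template


# Hypothetical English utterance and slot annotation.
english = "Book a table at Sate Khas Senayan for 19:00."
slots = {"restaurant_name": "Sate Khas Senayan", "time": "19:00"}

template_en = delexicalize(english, slots)

# Hypothetical manual translation of the delexicalized template.
template_id = "Pesan meja di [restaurant_name] untuk pukul [time]."

print(template_en)
print(relexicalize(template_id, slots))
```

Because many dialogues share the same delexicalized template, translating templates rather than every surface utterance is one plausible way the annotation load could shrink.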
Related papers
- A Systematic Study of Performance Disparities in Multilingual
Task-Oriented Dialogue Systems [68.76102493999134]
We take stock of and empirically analyse task performance disparities that exist between multilingual task-oriented dialogue systems.
We prove the existence of the adaptation and intrinsic biases in current ToD systems.
Our analyses offer practical tips on how to approach ToD data collection and system development for new languages.
arXiv Detail & Related papers (2023-10-19T16:41:44Z)
- Cross-lingual Data Augmentation for Document-grounded Dialog Systems in
Low Resource Languages [0.0]
We present a novel pipeline, CLEM (Cross-Lingual Enhanced Model), including adversarially trained retrieval (Retriever and Re-ranker) and a FiD (fusion-in-decoder) generator.
To further leverage high-resource language, we also propose an innovative architecture to conduct alignment across different languages with translated training.
arXiv Detail & Related papers (2023-05-24T09:40:52Z)
- Multi2WOZ: A Robust Multilingual Dataset and Conversational Pretraining
for Task-Oriented Dialog [67.20796950016735]
Multi2WOZ dataset spans four typologically diverse languages: Chinese, German, Arabic, and Russian.
We introduce a new framework for multilingual conversational specialization of pretrained language models (PrLMs) that aims to facilitate cross-lingual transfer for arbitrary downstream TOD tasks.
Our experiments show that, in most setups, the best performance entails the combination of (i) conversational specialization in the target language and (ii) few-shot transfer for the concrete TOD task.
arXiv Detail & Related papers (2022-05-20T18:35:38Z)
- Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation [70.81596088969378]
The Cross-lingual Outline-based Dialogue dataset (termed COD) enables natural language understanding, dialogue state tracking, and end-to-end dialogue modelling and evaluation in 4 diverse languages.
arXiv Detail & Related papers (2022-01-31T18:11:21Z)
- GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented
Dialogue Systems [66.92182084456809]
We introduce a novel data curation method that generates GlobalWoZ -- a large-scale multilingual ToD dataset from an English ToD dataset.
Our method is based on translating dialogue templates and filling them with local entities in the target-language countries.
We release our dataset as well as a set of strong baselines to encourage research on learning multilingual ToD systems for real use cases.
arXiv Detail & Related papers (2021-10-14T19:33:04Z)
- BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue
Modeling [52.99188200886738]
BiToD is the first bilingual multi-domain dataset for end-to-end task-oriented dialogue modeling.
BiToD contains over 7k multi-domain dialogues (144k utterances) with a large and realistic bilingual knowledge base.
arXiv Detail & Related papers (2021-06-05T03:38:42Z)
- Crossing the Conversational Chasm: A Primer on Multilingual
Task-Oriented Dialogue Systems [51.328224222640614]
Current state-of-the-art ToD models based on large pretrained neural language models are data hungry.
Data acquisition for ToD use cases is expensive and tedious.
arXiv Detail & Related papers (2021-04-17T15:19:56Z)