BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue
Modeling
- URL: http://arxiv.org/abs/2106.02787v1
- Date: Sat, 5 Jun 2021 03:38:42 GMT
- Title: BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue
Modeling
- Authors: Zhaojiang Lin, Andrea Madotto, Genta Indra Winata, Peng Xu, Feijun
Jiang, Yuxiang Hu, Chen Shi, Pascale Fung
- Abstract summary: BiToD is the first bilingual multi-domain dataset for end-to-end task-oriented dialogue modeling.
BiToD contains over 7k multi-domain dialogues (144k utterances) with a large and realistic bilingual knowledge base.
- Score: 52.99188200886738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Task-oriented dialogue (ToD) benchmarks provide an important avenue to
measure progress and develop better conversational agents. However, existing
datasets for end-to-end ToD modeling are limited to a single language,
hindering the development of robust end-to-end ToD systems for multilingual
countries and regions. Here we introduce BiToD, the first bilingual
multi-domain dataset for end-to-end task-oriented dialogue modeling. BiToD
contains over 7k multi-domain dialogues (144k utterances) with a large and
realistic bilingual knowledge base. It serves as an effective benchmark for
evaluating bilingual ToD systems and cross-lingual transfer learning
approaches. We provide state-of-the-art baselines under three evaluation
settings (monolingual, bilingual, and cross-lingual). The analysis of our
baselines in different settings highlights 1) the effectiveness of training a
bilingual ToD system compared to two independent monolingual ToD systems, and
2) the potential of leveraging a bilingual knowledge base and cross-lingual
transfer learning to improve system performance under low-resource
conditions.
Related papers
- IndoToD: A Multi-Domain Indonesian Benchmark For End-to-End
Task-Oriented Dialogue Systems [26.094144160398447]
This paper introduces IndoToD, an end-to-end multi-domain ToD benchmark in Indonesian.
We extend two English ToD datasets, covering four different domains, to Indonesian, using delexicalization to efficiently reduce the size of the annotations.
To ensure a high-quality data collection, we hire native speakers to manually translate the dialogues.
arXiv Detail & Related papers (2023-11-02T03:01:53Z)
- A Systematic Study of Performance Disparities in Multilingual
Task-Oriented Dialogue Systems [68.76102493999134]
We take stock of and empirically analyse task performance disparities that exist between multilingual task-oriented dialogue systems.
We prove the existence of the adaptation and intrinsic biases in current ToD systems.
Our analyses offer practical tips on how to approach ToD data collection and system development for new languages.
arXiv Detail & Related papers (2023-10-19T16:41:44Z)
- Multi2WOZ: A Robust Multilingual Dataset and Conversational Pretraining
for Task-Oriented Dialog [67.20796950016735]
The Multi2WOZ dataset spans four typologically diverse languages: Chinese, German, Arabic, and Russian.
We introduce a new framework for multilingual conversational specialization of pretrained language models (PrLMs) that aims to facilitate cross-lingual transfer for arbitrary downstream TOD tasks.
Our experiments show that, in most setups, the best performance entails the combination of (i) conversational specialization in the target language and (ii) few-shot transfer for the concrete TOD task.
arXiv Detail & Related papers (2022-05-20T18:35:38Z)
- Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation [70.81596088969378]
The Cross-lingual Outline-based Dialogue dataset (termed COD) enables natural language understanding, dialogue state tracking, and end-to-end dialogue modelling and evaluation in 4 diverse languages.
arXiv Detail & Related papers (2022-01-31T18:11:21Z)
- GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented
Dialogue Systems [66.92182084456809]
We introduce a novel data curation method that generates GlobalWoZ -- a large-scale multilingual ToD dataset from an English ToD dataset.
Our method is based on translating dialogue templates and filling them with local entities in the target-language countries.
We release our dataset as well as a set of strong baselines to encourage research on learning multilingual ToD systems for real use cases.
arXiv Detail & Related papers (2021-10-14T19:33:04Z)
- Crossing the Conversational Chasm: A Primer on Multilingual
Task-Oriented Dialogue Systems [51.328224222640614]
Current state-of-the-art ToD models based on large pretrained neural language models are data hungry.
Data acquisition for ToD use cases is expensive and tedious.
arXiv Detail & Related papers (2021-04-17T15:19:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.