A Systematic Study of Performance Disparities in Multilingual
Task-Oriented Dialogue Systems
- URL: http://arxiv.org/abs/2310.12892v1
- Date: Thu, 19 Oct 2023 16:41:44 GMT
- Title: A Systematic Study of Performance Disparities in Multilingual
Task-Oriented Dialogue Systems
- Authors: Songbo Hu, Han Zhou, Moy Yuan, Milan Gritta, Guchun Zhang, Ignacio
Iacobacci, Anna Korhonen, Ivan Vulić
- Abstract summary: We take stock of and empirically analyse task performance disparities that exist between multilingual task-oriented dialogue systems.
We empirically prove the existence of adaptation and intrinsic biases in current ToD systems.
Our analyses offer practical tips on how to approach ToD data collection and system development for new languages.
- Score: 68.76102493999134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Achieving robust language technologies that can perform well across the
world's many languages is a central goal of multilingual NLP. In this work, we
take stock of and empirically analyse task performance disparities that exist
between multilingual task-oriented dialogue (ToD) systems. We first define new
quantitative measures of absolute and relative equivalence in system
performance, capturing disparities across languages and within individual
languages. Through a series of controlled experiments, we demonstrate that
performance disparities depend on a number of factors: the nature of the ToD
task at hand, the underlying pretrained language model, the target language,
and the amount of ToD annotated data. We empirically prove the existence of the
adaptation and intrinsic biases in current ToD systems: e.g., ToD systems
trained for Arabic or Turkish using annotated ToD data fully parallel to
English ToD data still exhibit diminished ToD task performance. Beyond
providing a series of insights into the performance disparities of ToD systems
in different languages, our analyses offer practical tips on how to approach
ToD data collection and system development for new languages.
Related papers
- DIALIGHT: Lightweight Multilingual Development and Evaluation of
Task-Oriented Dialogue Systems with Large Language Models [76.79929883963275]
DIALIGHT is a toolkit for developing and evaluating multilingual Task-Oriented Dialogue (ToD) systems.
It features a secure, user-friendly web interface for fine-grained human evaluation at both local utterance level and global dialogue level.
Our evaluations reveal that while PLM fine-tuning leads to higher accuracy and coherence, LLM-based systems excel in producing diverse and likeable responses.
arXiv Detail & Related papers (2024-01-04T11:27:48Z)
- Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation [70.81596088969378]
The Cross-lingual Outline-based Dialogue dataset (COD) supports natural language understanding, dialogue state tracking, and end-to-end dialogue modelling and evaluation in 4 diverse languages.
arXiv Detail & Related papers (2022-01-31T18:11:21Z)
- GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented
Dialogue Systems [66.92182084456809]
We introduce a novel data curation method that generates GlobalWoZ -- a large-scale multilingual ToD dataset from an English ToD dataset.
Our method is based on translating dialogue templates and filling them with local entities in the target-language countries.
We release our dataset as well as a set of strong baselines to encourage research on learning multilingual ToD systems for real use cases.
arXiv Detail & Related papers (2021-10-14T19:33:04Z)
- BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue
Modeling [52.99188200886738]
BiToD is the first bilingual multi-domain dataset for end-to-end task-oriented dialogue modeling.
BiToD contains over 7k multi-domain dialogues (144k utterances) with a large and realistic bilingual knowledge base.
arXiv Detail & Related papers (2021-06-05T03:38:42Z)
- Crossing the Conversational Chasm: A Primer on Multilingual
Task-Oriented Dialogue Systems [51.328224222640614]
Current state-of-the-art ToD models based on large pretrained neural language models are data hungry.
Data acquisition for ToD use cases is expensive and tedious.
arXiv Detail & Related papers (2021-04-17T15:19:56Z)