Crossing the Conversational Chasm: A Primer on Multilingual
Task-Oriented Dialogue Systems
- URL: http://arxiv.org/abs/2104.08570v1
- Date: Sat, 17 Apr 2021 15:19:56 GMT
- Title: Crossing the Conversational Chasm: A Primer on Multilingual
Task-Oriented Dialogue Systems
- Authors: Evgeniia Razumovskaia, Goran Glavaš, Olga Majewska, Anna Korhonen,
Ivan Vulić
- Abstract summary: Current state-of-the-art ToD models based on large pretrained neural language models are data hungry.
Data acquisition for ToD use cases is expensive and tedious.
- Score: 51.328224222640614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the fact that natural language conversations with machines represent
one of the central objectives of AI, and despite the massive increase of
research and development efforts in conversational AI, task-oriented dialogue
(ToD) -- i.e., conversations with an artificial agent with the aim of
completing a concrete task -- is currently limited to a few narrow domains
(e.g., food ordering, ticket booking) and a handful of major languages (e.g.,
English, Chinese). In this work, we provide an extensive overview of existing
efforts in multilingual ToD and analyse the factors preventing the development
of truly multilingual ToD systems. We identify two main challenges that
combined hinder the faster progress in multilingual ToD: (1) current
state-of-the-art ToD models based on large pretrained neural language models
are data hungry; at the same time (2) data acquisition for ToD use cases is
expensive and tedious. Most existing approaches to multilingual ToD thus rely
on (zero- or few-shot) cross-lingual transfer from resource-rich languages (in
ToD, this is basically only English), either by means of (i) machine
translation or (ii) multilingual representation spaces. However, such
approaches are currently not a viable solution for a large number of
low-resource languages without parallel data and/or limited monolingual
corpora. Finally, we discuss critical challenges and potential solutions by
drawing parallels between ToD and other cross-lingual and multilingual NLP
research.
Related papers
- A Systematic Study of Performance Disparities in Multilingual
Task-Oriented Dialogue Systems [68.76102493999134]
We take stock of and empirically analyse task performance disparities that exist between multilingual task-oriented dialogue systems.
We prove the existence of the adaptation and intrinsic biases in current ToD systems.
Our analyses offer practical tips on how to approach ToD data collection and system development for new languages.
arXiv Detail & Related papers (2023-10-19T16:41:44Z)
- Multi2WOZ: A Robust Multilingual Dataset and Conversational Pretraining
for Task-Oriented Dialog [67.20796950016735]
The Multi2WOZ dataset spans four typologically diverse languages: Chinese, German, Arabic, and Russian.
We introduce a new framework for multilingual conversational specialization of pretrained language models (PrLMs) that aims to facilitate cross-lingual transfer for arbitrary downstream TOD tasks.
Our experiments show that, in most setups, the best performance entails the combination of (i) conversational specialization in the target language and (ii) few-shot transfer for the concrete TOD task.
arXiv Detail & Related papers (2022-05-20T18:35:38Z)
- Overcoming Language Disparity in Online Content Classification with
Multimodal Learning [22.73281502531998]
Large language models are now the standard to develop state-of-the-art solutions for text detection and classification tasks.
The development of advanced computational techniques and resources is disproportionately focused on the English language.
We explore the promise of incorporating the information contained in images via multimodal machine learning.
arXiv Detail & Related papers (2022-05-19T17:56:02Z)
- Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation [70.81596088969378]
The Cross-lingual Outline-based Dialogue dataset (termed COD) enables natural language understanding, dialogue state tracking, and end-to-end dialogue modelling and evaluation in 4 diverse languages.
arXiv Detail & Related papers (2022-01-31T18:11:21Z)
- GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented
Dialogue Systems [66.92182084456809]
We introduce a novel data curation method that generates GlobalWoZ -- a large-scale multilingual ToD dataset from an English ToD dataset.
Our method is based on translating dialogue templates and filling them with local entities in the target-language countries.
We release our dataset as well as a set of strong baselines to encourage research on learning multilingual ToD systems for real use cases.
arXiv Detail & Related papers (2021-10-14T19:33:04Z)
- BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue
Modeling [52.99188200886738]
BiToD is the first bilingual multi-domain dataset for end-to-end task-oriented dialogue modeling.
BiToD contains over 7k multi-domain dialogues (144k utterances) with a large and realistic bilingual knowledge base.
arXiv Detail & Related papers (2021-06-05T03:38:42Z)