Benchmarks Underestimate the Readiness of Multi-lingual Dialogue Agents
- URL: http://arxiv.org/abs/2405.17840v2
- Date: Sun, 16 Jun 2024 20:45:23 GMT
- Title: Benchmarks Underestimate the Readiness of Multi-lingual Dialogue Agents
- Authors: Andrew H. Lee, Sina J. Semnani, Galo Castillo-López, Gaël de Chalendar, Monojit Choudhury, Ashna Dua, Kapil Rajesh Kavitha, Sungkyun Kim, Prashant Kodali, Ponnurangam Kumaraguru, Alexis Lombard, Mehrad Moradshahi, Gihyun Park, Nasredine Semmar, Jiwon Seo, Tianhao Shen, Manish Shrivastava, Deyi Xiong, Monica S. Lam
- Abstract summary: We show, for the first time, that in-context learning is sufficient to tackle multilingual TOD.
We test our approach on the multilingual TOD dataset X-RiSAWOZ, which has 12 domains in Chinese, English, French, Korean, Hindi, and code-mixed Hindi-English.
- Score: 39.92509218078164
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Creating multilingual task-oriented dialogue (TOD) agents is challenging due to the high cost of training data acquisition. Following the research trend of improving training data efficiency, we show, for the first time, that in-context learning is sufficient to tackle multilingual TOD. To handle the challenging dialogue state tracking (DST) subtask, we break it down into simpler steps that are more compatible with in-context learning, using only a handful of few-shot examples. We test our approach on the multilingual TOD dataset X-RiSAWOZ, which has 12 domains in Chinese, English, French, Korean, Hindi, and code-mixed Hindi-English. Our turn-by-turn DST accuracy on the 6 languages ranges from 55.6% to 80.3%, seemingly worse than the SOTA results from fine-tuned models, which achieve 60.7% to 82.8%; our BLEU scores in the response generation (RG) subtask are also significantly lower than SOTA. However, after manual evaluation of the validation set, we find that by correcting gold label errors and improving the dataset annotation schema, GPT-4 with our prompts can achieve (1) 89.6%-96.8% accuracy in DST, and (2) more than 99% correct response generation across different languages. This leads us to conclude that current automatic metrics heavily underestimate the effectiveness of in-context learning.
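The abstract does not spell out the decomposition itself, but the idea can be illustrated: instead of asking the model to emit the full dialogue state in one call, each slot is handled with simpler in-context queries, each backed by a handful of few-shot examples. The sketch below is a minimal illustration under that assumption; the two-step split (slot relevance, then value extraction), the `complete` callable, and the prompt wording are hypothetical and are not the paper's actual prompts.

```python
# Minimal sketch of few-shot, decomposed dialogue state tracking (DST) via
# in-context learning. The two-step split (slot relevance -> value extraction),
# the `complete` callable, and the few-shot example strings are illustrative
# assumptions, not the paper's exact prompts.
from typing import Callable, Dict, List


def track_state(
    complete: Callable[[str], str],   # any LLM completion function, e.g. a GPT-4 wrapper
    dialogue_turn: str,               # latest user utterance plus agent reply
    prior_state: Dict[str, str],      # dialogue state after the previous turn
    slots: List[str],                 # slot names for the active domain
    relevance_examples: str,          # handful of few-shot examples for step 1
    extraction_examples: str,         # handful of few-shot examples for step 2
) -> Dict[str, str]:
    """Update the dialogue state one slot at a time with two simple ICL steps."""
    state = dict(prior_state)
    for slot in slots:
        # Step 1: a yes/no question is easier for in-context learning than
        # emitting the whole state at once.
        relevance_prompt = (
            f"{relevance_examples}\n"
            f"Turn: {dialogue_turn}\n"
            f"Does this turn mention a value for the slot '{slot}'? Answer yes or no:"
        )
        if not complete(relevance_prompt).strip().lower().startswith("yes"):
            continue
        # Step 2: extract only the value of the one slot found relevant.
        extraction_prompt = (
            f"{extraction_examples}\n"
            f"Turn: {dialogue_turn}\n"
            f"Value of slot '{slot}':"
        )
        state[slot] = complete(extraction_prompt).strip()
    return state
```

Any completion client can be passed in as `complete`; a stub that returns canned answers is enough to exercise the control flow before wiring up a real model.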
Related papers
- Grammatical Error Correction for Low-Resource Languages: The Case of Zarma [8.057796934109938]
Grammatical error correction (GEC) is important for improving written materials for low-resource languages like Zarma.
This study compares rule-based methods, machine translation (MT) models, and large language models (LLMs) for GEC in Zarma.
arXiv Detail & Related papers (2024-10-20T23:51:36Z) - Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification [19.893213508284813]
Self-supervised adaptive pre-training is proposed to adapt the pre-trained model to the target domain and languages of the downstream task.
We show that SAPT improves XLSR performance on the FLEURS benchmark with substantial gains up to 40.1% for under-represented languages.
arXiv Detail & Related papers (2023-12-12T14:58:08Z) - Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations [59.056367787688146]
This paper pioneers the exploration and training of powerful Multilingual Math Reasoning (xMR) LLMs.
By utilizing translation, we construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct, encompassing ten distinct languages.
arXiv Detail & Related papers (2023-10-31T08:09:20Z) - Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD, a parallel and large-scale multilingual conversation dataset, for cross-lingual alignment pretraining.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z) - Zero and Few-Shot Localization of Task-Oriented Dialogue Agents with a Distilled Representation [5.551814548069404]
We propose automatic methods that use ToD training data in a source language to build a high-quality functioning dialogue agent in another language.
We show that our approach closes the accuracy gap between few-shot and existing full-shot methods for ToD agents.
arXiv Detail & Related papers (2023-02-18T21:30:36Z) - Improving Neural Machine Translation by Denoising Training [95.96569884410137]
We present a simple and effective pretraining strategy, Denoising Training (DoT), for neural machine translation.
We update the model parameters with source- and target-side denoising tasks at the early stage and then tune the model normally.
Experiments show DoT consistently improves the neural machine translation performance across 12 bilingual and 16 multilingual directions.
arXiv Detail & Related papers (2022-01-19T00:11:38Z) - Investigating Effect of Dialogue History in Multilingual Task Oriented Dialogue Systems [2.695466667982714]
As of Dec 2021, Alexa, one of the most popular smart speakers in the world, supports 9 different languages.
Training a virtual assistant in other languages is often more difficult, especially for those low-resource languages.
We devise an efficient and effective training solution for multilingual task-oriented dialogue systems.
arXiv Detail & Related papers (2021-12-23T02:27:10Z) - Contextual Semantic Parsing for Multilingual Task-Oriented Dialogues [7.8378818005171125]
Given a large-scale dialogue data set in one language, we can automatically produce an effective semantic parser for other languages using machine translation.
We propose automatic translation of dialogue datasets with alignment to ensure faithful translation of slot values (a minimal sketch of this idea appears after this list).
We show that the succinct representation reduces the compounding effect of translation errors.
arXiv Detail & Related papers (2021-11-04T01:08:14Z) - Improving Limited Labeled Dialogue State Tracking with Self-Supervision [91.68515201803986]
Existing dialogue state tracking (DST) models require plenty of labeled data.
We present and investigate two self-supervised objectives: preserving latent consistency and modeling conversational behavior.
Our proposed self-supervised signals can improve joint goal accuracy by 8.95% when only 1% labeled data is used.
arXiv Detail & Related papers (2020-10-26T21:57:42Z) - Multilingual Speech Translation with Efficient Finetuning of Pretrained Models [82.22294901727933]
Minimalistic LNA (LayerNorm and Attention) finetuning can achieve zero-shot cross-lingual and cross-modality transfer ability.
Our approach demonstrates strong zero-shot performance in a many-to-many multilingual model.
arXiv Detail & Related papers (2020-10-24T08:15:08Z)
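As referenced in the "Contextual Semantic Parsing for Multilingual Task-Oriented Dialogues" entry above, one way to keep slot values faithful during dataset translation is to shield them from the MT system and restore them afterwards. The sketch below is a minimal illustration of that idea; the placeholder scheme, the `translate` callable, and the function name are hypothetical assumptions, not that paper's actual alignment method.

```python
# Minimal sketch of translating a dialogue utterance while keeping slot values
# aligned. The placeholder-substitution scheme and the user-supplied `translate`
# callable are illustrative assumptions, not the cited paper's actual method.
from typing import Callable, Dict


def translate_with_slots(
    translate: Callable[[str], str],   # any MT function, e.g. a hosted API wrapper
    utterance: str,                    # source-language utterance
    slot_values: Dict[str, str],       # slot name -> value appearing in the utterance
) -> str:
    """Protect slot values with placeholders, translate, then restore them."""
    protected = utterance
    placeholders: Dict[str, str] = {}
    for i, (slot, value) in enumerate(slot_values.items()):
        token = f"@SLOT{i}@"           # unlikely to be altered by the MT system
        protected = protected.replace(value, token)
        placeholders[token] = value
    translated = translate(protected)
    # Re-insert the original values so the annotation stays consistent with the text.
    for token, value in placeholders.items():
        translated = translated.replace(token, value)
    return translated
```

Restoring the original values keeps the slot annotation consistent with the translated text; a real pipeline would also need to handle values that should themselves be translated or transliterated.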
This list is automatically generated from the titles and abstracts of the papers in this site.