AllWOZ: Towards Multilingual Task-Oriented Dialog Systems for All
- URL: http://arxiv.org/abs/2112.08333v1
- Date: Wed, 15 Dec 2021 18:30:51 GMT
- Title: AllWOZ: Towards Multilingual Task-Oriented Dialog Systems for All
- Authors: Lei Zuo, Kun Qian, Bowen Yang, Zhou Yu
- Abstract summary: This paper presents AllWOZ, a multilingual task-oriented customer service dialog dataset covering eight languages.
We create a benchmark for our multilingual dataset by applying mT5 with meta-learning.
- Score: 41.10368284872525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A commonly observed problem of the state-of-the-art natural language
technologies, such as Amazon Alexa and Apple Siri, is that their services do
not extend to most developing countries' citizens due to language barriers.
Such populations suffer due to the lack of available resources in their
languages to build NLP products. This paper presents AllWOZ, a multilingual
multi-domain task-oriented customer service dialog dataset covering eight
languages: English, Mandarin, Korean, Vietnamese, Hindi, French, Portuguese,
and Thai. Furthermore, we create a benchmark for our multilingual dataset by
applying mT5 with meta-learning.
Related papers
- SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages [77.75535024869224]
We present SeaLLMs 3, the latest iteration of the SeaLLMs model family, tailored for Southeast Asian languages.
SeaLLMs 3 covers a comprehensive range of languages spoken in this region, including English, Chinese, Indonesian, Vietnamese, Thai, Tagalog, Malay, Burmese, Khmer, Lao, Tamil, and Javanese.
Our model excels in tasks such as world knowledge, mathematical reasoning, translation, and instruction following, achieving state-of-the-art performance among similarly sized models.
arXiv Detail & Related papers (2024-07-29T03:26:22Z)
- MULTI3NLU++: A Multilingual, Multi-Intent, Multi-Domain Dataset for Natural Language Understanding in Task-Oriented Dialogue [115.32009638844059]
We extend the English-only NLU++ dataset to include manual translations into a range of high-, medium-, and low-resource languages.
Because of its multi-intent property, MULTI3NLU++ represents complex and natural user goals.
We use MULTI3NLU++ to benchmark state-of-the-art multilingual models for the Natural Language Understanding tasks of intent detection and slot labelling.
arXiv Detail & Related papers (2022-12-20T17:34:25Z)
- Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages [62.730361829175415]
MIRACL is a multilingual dataset we have built for the WSDM 2023 Cup challenge.
It focuses on ad hoc retrieval across 18 different languages.
Our goal is to spur research that will improve retrieval across a continuum of languages.
arXiv Detail & Related papers (2022-10-18T16:47:18Z)
- NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages [100.59889279607432]
We focus on developing resources for languages in Indonesia.
Most languages in Indonesia are categorized as endangered and some are even extinct.
We develop the first-ever parallel resource for 10 low-resource languages in Indonesia.
arXiv Detail & Related papers (2022-05-31T17:03:50Z)
- ViWOZ: A Multi-Domain Task-Oriented Dialogue Systems Dataset For Low-resource Language [0.0]
ViWOZ is the first multi-turn, multi-domain task-oriented dataset in Vietnamese.
The dataset consists of a total of 5,000 dialogues, including 60,946 fully annotated utterances.
With those characteristics, the ViWOZ dataset enables future studies on creating a multilingual task-oriented dialogue system.
arXiv Detail & Related papers (2022-03-15T09:22:04Z)
- Towards Building ASR Systems for the Next Billion Users [15.867823754118422]
We make contributions towards building ASR systems for low-resource languages from the Indian subcontinent.
First, we curate 17,000 hours of raw speech data for 40 Indian languages.
Using this raw speech data, we pretrain several variants of wav2vec-style models for 40 Indian languages.
arXiv Detail & Related papers (2021-11-06T19:34:33Z)
- GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented Dialogue Systems [66.92182084456809]
We introduce a novel data curation method that generates GlobalWoZ -- a large-scale multilingual ToD dataset from an English ToD dataset.
Our method is based on translating dialogue templates and filling them with local entities in the target-language countries.
We release our dataset as well as a set of strong baselines to encourage research on learning multilingual ToD systems for real use cases.
arXiv Detail & Related papers (2021-10-14T19:33:04Z)