The Massively Multilingual Natural Language Understanding 2022
(MMNLU-22) Workshop and Competition
- URL: http://arxiv.org/abs/2212.06346v1
- Date: Tue, 13 Dec 2022 03:00:36 GMT
- Title: The Massively Multilingual Natural Language Understanding 2022
(MMNLU-22) Workshop and Competition
- Authors: Christopher Hench, Charith Peris, Jack FitzGerald, Kay Rottmann
- Abstract summary: It is common for Natural Language Understanding systems to be limited to a subset of languages due to a lack of available data.
We launch a three-phase approach to address the limitations in NLU and help propel NLU technology to new heights.
We release a 52-language dataset called the Multilingual Amazon SLU resource package (SLURP) for Slot-filling, Intent classification, and Virtual assistant Evaluation, or MASSIVE.
We organize the Massively Multilingual NLU 2022 Challenge to provide a competitive environment and push the state of the art in the transferability of models into other languages.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite recent progress in Natural Language Understanding (NLU), the creation
of multilingual NLU systems remains a challenge. It is common to have NLU
systems limited to a subset of languages due to a lack of available data. They
also often vary widely in performance. We launch a three-phase approach to
address the limitations in NLU and help propel NLU technology to new heights.
We release a 52-language dataset called the Multilingual Amazon SLU resource
package (SLURP) for Slot-filling, Intent classification, and Virtual assistant
Evaluation, or MASSIVE, in an effort to address parallel data availability for
voice assistants. We organize the Massively Multilingual NLU 2022 Challenge to
provide a competitive environment and push the state of the art in the
transferability of models into other languages. Finally, we host the first
Massively Multilingual NLU workshop which brings these components together. The
MMNLU workshop seeks to advance the science behind multilingual NLU by
providing a platform for the presentation of new research in the field and
connecting teams working on this research direction. This paper summarizes the dataset, the workshop, the competition, and the findings of each phase.
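As a concrete entry point to the dataset described above, here is a minimal sketch of loading MASSIVE with the Hugging Face `datasets` library. It assumes the corpus remains published on the Hub under the `AmazonScience/massive` identifier with per-locale configurations; the field names (`utt`, `annot_utt`, `intent`) follow the public release and should be checked against the current version.

```python
# Minimal sketch: inspect MASSIVE intent/slot annotations with the Hugging
# Face `datasets` library. Assumes the corpus is published on the Hub as
# "AmazonScience/massive" with per-locale configs such as "en-US" or "de-DE".
from datasets import load_dataset

ds = load_dataset("AmazonScience/massive", "de-DE", split="train")

example = ds[0]
print(example["utt"])        # raw utterance
print(example["annot_utt"])  # utterance with inline [slot : value] annotations
print(example["intent"])     # intent label id (a ClassLabel feature)
# Map the label id back to its human-readable intent name.
print(ds.features["intent"].int2str(example["intent"]))
```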
Related papers
- SCOPE: Sign Language Contextual Processing with Embedding from LLMs [49.5629738637893]
Sign languages, used by around 70 million Deaf individuals globally, are visual languages that convey visual and contextual information.
Current methods in vision-based sign language recognition (SLR) and translation (SLT) struggle with dialogue scenes due to limited dataset diversity and the neglect of contextually relevant information.
We introduce SCOPE, a novel context-aware vision-based SLR and SLT framework.
arXiv Detail & Related papers (2024-09-02T08:56:12Z)
- Constructing and Expanding Low-Resource and Underrepresented Parallel Datasets for Indonesian Local Languages [0.0]
We introduce Bhinneka Korpus, a multilingual parallel corpus featuring five Indonesian local languages.
Our goal is to enhance access and utilization of these resources, extending their reach within the country.
arXiv Detail & Related papers (2024-04-01T09:24:06Z)
- Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD, a parallel, large-scale multilingual conversation dataset for cross-lingual alignment pretraining.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts (a generic prompt-tuning sketch follows this entry).
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
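The entry above does not spell out the alignment method, so the following is only a generic soft prompt-tuning sketch under stated assumptions, not the authors' implementation: trainable prompt vectors are prepended to a frozen multilingual encoder (`xlm-roberta-base` is a stand-in) and trained with a simple cosine alignment loss on one parallel sentence pair.

```python
# Generic soft prompt-tuning sketch (not the paper's exact method): trainable
# prompt vectors are prepended to a frozen multilingual encoder and trained
# with a simple cosine alignment loss on a parallel sentence pair.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "xlm-roberta-base"  # stand-in multilingual encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoder = AutoModel.from_pretrained(model_name)
for p in encoder.parameters():
    p.requires_grad = False  # freeze the backbone; only prompts are trained

n_prompts, hidden = 20, encoder.config.hidden_size
prompts = torch.nn.Parameter(torch.randn(n_prompts, hidden) * 0.02)

def encode_with_prompts(texts):
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    tok_emb = encoder.embeddings.word_embeddings(batch["input_ids"])
    # Prepend the trainable prompt vectors to every sequence in the batch.
    prompt_emb = prompts.unsqueeze(0).expand(tok_emb.size(0), -1, -1)
    inputs_embeds = torch.cat([prompt_emb, tok_emb], dim=1)
    prompt_mask = torch.ones(
        tok_emb.size(0), n_prompts, dtype=batch["attention_mask"].dtype
    )
    mask = torch.cat([prompt_mask, batch["attention_mask"]], dim=1)
    out = encoder(inputs_embeds=inputs_embeds, attention_mask=mask)
    return out.last_hidden_state.mean(dim=1)  # mean-pooled sentence vector

# Toy alignment step: pull a parallel English/French pair together.
src = encode_with_prompts(["book a table for two"])
tgt = encode_with_prompts(["réserver une table pour deux"])
loss = 1 - torch.nn.functional.cosine_similarity(src, tgt).mean()
loss.backward()  # gradients flow only into `prompts`
```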
- MULTI3NLU++: A Multilingual, Multi-Intent, Multi-Domain Dataset for Natural Language Understanding in Task-Oriented Dialogue [115.32009638844059]
We extend the English-only NLU++ dataset to include manual translations into a range of high-, medium-, and low-resource languages.
Because of its multi-intent property, MULTI3NLU++ represents complex and natural user goals.
We use MULTI3NLU++ to benchmark state-of-the-art multilingual models on the Natural Language Understanding tasks of intent detection and slot labelling (a benchmarking skeleton is sketched below).
arXiv Detail & Related papers (2022-12-20T17:34:25Z)
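To make the benchmarking setup concrete, here is a hypothetical fine-tuning skeleton for intent detection; since MULTI3NLU++ is multi-intent, a multi-label classification head is assumed. The model choice (`xlm-roberta-base`), label count, and example intent ids are placeholders, not details from the paper.

```python
# Hypothetical benchmarking skeleton for multilingual intent detection.
# MULTI3NLU++ is multi-intent, so a multi-label head is assumed; the model
# choice, label count, and intent ids below are placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NUM_INTENTS = 62  # placeholder; use the dataset's actual label inventory
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base",
    num_labels=NUM_INTENTS,
    problem_type="multi_label_classification",  # sigmoid + BCE loss
)

texts = ["I want to cancel my card and order a new one"]  # two intents at once
labels = torch.zeros(1, NUM_INTENTS)
labels[0, 3] = labels[0, 17] = 1.0  # placeholder intent ids

batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
out = model(**batch, labels=labels)
out.loss.backward()  # gradients for one training step

probs = torch.sigmoid(out.logits)
predicted = (probs > 0.5).nonzero()  # indices of intents above threshold
```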
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model [264.96498474333697]
Large language models (LLMs) have been shown to perform new tasks based on a few demonstrations or natural language instructions.
We present BLOOM, a 176B-parameter open-access language model designed and built through a collaboration of hundreds of researchers.
BLOOM is a decoder-only Transformer language model trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (a minimal prompting sketch follows this entry).
arXiv Detail & Related papers (2022-11-09T18:48:09Z)
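For readers who want to try a BLOOM checkpoint directly, a minimal prompting sketch with the `transformers` pipeline follows. It uses the small `bigscience/bloom-560m` variant so it runs on modest hardware; the full 176B model requires multi-GPU serving.

```python
# Minimal prompting sketch with a small BLOOM checkpoint. The 560M variant
# runs on modest hardware; the 176B model needs multi-GPU serving.
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-560m")
prompt = "Translate to French: Where is the train station?\nFrench:"
result = generator(prompt, max_new_tokens=20, do_sample=False)
print(result[0]["generated_text"])
```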
- Multi-level Contrastive Learning for Cross-lingual Spoken Language Understanding [90.87454350016121]
We develop novel code-switching schemes to generate hard negative examples for contrastive learning at all levels (a toy code-switching scheme is sketched below).
We also develop a label-aware joint model to leverage label semantics for cross-lingual knowledge transfer.
arXiv Detail & Related papers (2022-05-07T13:44:28Z)
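The paper's code-switching schemes are not spelled out in the summary above, so this is only a toy illustration of the general idea: substituting words via a bilingual lexicon to create a hard negative for contrastive training. The lexicon and swap ratio are invented for the example.

```python
# Toy illustration of code-switching to build a hard negative for
# contrastive learning; the bilingual lexicon and swap ratio are invented,
# and the paper's schemes are more elaborate.
import random

lexicon = {"book": "réserver", "flight": "vol", "tomorrow": "demain"}  # toy EN->FR

def code_switch(tokens, ratio=0.5):
    """Swap a random subset of translatable tokens into the other language."""
    candidates = [i for i, t in enumerate(tokens) if t in lexicon]
    if not candidates:
        return list(tokens)
    out = list(tokens)
    k = max(1, int(len(candidates) * ratio))
    for i in random.sample(candidates, k):
        out[i] = lexicon[out[i]]
    return out

anchor = "book a flight for tomorrow".split()
hard_negative = code_switch(anchor)  # e.g. ['réserver', 'a', 'vol', 'for', 'demain']
# A contrastive loss would then push the encodings of `anchor` and
# `hard_negative` apart at the utterance, slot, and word levels.
```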
- Towards More Robust Natural Language Understanding [0.0]
Natural Language Understanding (NLU) is a branch of Natural Language Processing (NLP).
Recent years have witnessed notable progress across various NLU tasks with deep learning techniques.
Notably, the human ability to understand natural language is flexible and robust.
arXiv Detail & Related papers (2021-12-01T17:27:19Z)
- From Masked Language Modeling to Translation: Non-English Auxiliary Tasks Improve Zero-shot Spoken Language Understanding [24.149299722716155]
We introduce xSID, a new benchmark for cross-lingual Slot and Intent Detection in 13 languages from 6 language families, including a very low-resource dialect.
We propose a joint learning approach, with English SLU training data and non-English auxiliary tasks from raw text, syntax, and translation for transfer (the joint-loss idea is sketched below).
Our results show that jointly learning the main tasks with masked language modeling is effective for slots, while machine translation transfer works best for intent classification.
arXiv Detail & Related papers (2021-05-15T23:51:11Z)
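A minimal sketch of the joint-training idea described above: the supervised English SLU losses are combined with a weighted auxiliary loss, such as masked language modeling on raw target-language text. The weighting scheme and value are assumptions, not the paper's reported configuration.

```python
# Sketch of joint training: supervised SLU losses combined with a weighted
# non-English auxiliary objective (e.g. masked language modeling on raw
# target-language text). The weighting is an assumption.
import torch

def joint_loss(intent_loss, slot_loss, aux_loss, aux_weight=0.5):
    """Main-task losses plus a weighted auxiliary objective."""
    return intent_loss + slot_loss + aux_weight * aux_loss

# Dummy scalars standing in for losses from the respective model heads.
total = joint_loss(torch.tensor(0.9), torch.tensor(1.4), torch.tensor(2.1))
print(round(float(total), 2))  # 3.35
```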
- X-METRA-ADA: Cross-lingual Meta-Transfer Learning Adaptation to Natural Language Understanding and Question Answering [55.57776147848929]
We propose X-METRA-ADA, a cross-lingual MEta-TRAnsfer learning ADAptation approach for Natural Language Understanding (NLU).
Our approach adapts MAML, an optimization-based meta-learning method, to learn to adapt to new languages (a generic MAML skeleton is sketched below).
We show that our approach outperforms naive fine-tuning, reaching competitive performance on both tasks for most languages.
arXiv Detail & Related papers (2021-04-20T00:13:35Z)
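Since X-METRA-ADA builds on MAML, a generic MAML inner/outer-loop skeleton may help. This sketch uses a toy linear classifier and random data standing in for per-language tasks; it shows only the optimization pattern, not the paper's full cross-lingual setup.

```python
# Generic MAML inner/outer-loop skeleton (X-METRA-ADA adapts MAML to
# cross-lingual NLU; this is only the optimization pattern, with a toy
# linear classifier and random data standing in for real tasks).
import torch

model = torch.nn.Linear(8, 3)  # stand-in for an NLU classifier
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()
inner_lr = 0.1

def task_batch():
    """Placeholder support/query data for one 'language' task."""
    return torch.randn(4, 8), torch.randint(0, 3, (4,))

for step in range(3):
    support_x, support_y = task_batch()
    query_x, query_y = task_batch()

    # Inner loop: one gradient step on the support set, kept differentiable.
    inner_loss = loss_fn(model(support_x), support_y)
    grads = torch.autograd.grad(
        inner_loss, list(model.parameters()), create_graph=True
    )
    fast = [p - inner_lr * g for p, g in zip(model.parameters(), grads)]

    # Outer loop: evaluate the adapted ("fast") weights on the query set
    # and update the original parameters through the inner step.
    logits = torch.nn.functional.linear(query_x, fast[0], fast[1])
    outer_loss = loss_fn(logits, query_y)
    meta_opt.zero_grad()
    outer_loss.backward()
    meta_opt.step()
```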
- ParsiNLU: A Suite of Language Understanding Challenges for Persian [23.26176232463948]
This work focuses on Persian, one of the world's widely spoken languages.
There are few NLU datasets available for this rich language.
ParsiNLU is the first benchmark in Persian language that includes a range of high-level tasks.
arXiv Detail & Related papers (2020-12-11T06:31:42Z)