MINION: a Large-Scale and Diverse Dataset for Multilingual Event
Detection
- URL: http://arxiv.org/abs/2211.05958v1
- Date: Fri, 11 Nov 2022 02:09:51 GMT
- Title: MINION: a Large-Scale and Diverse Dataset for Multilingual Event
Detection
- Authors: Amir Pouran Ben Veyseh, Minh Van Nguyen, Franck Dernoncourt, and Thien
Huu Nguyen
- Abstract summary: Event Detection (ED) is the task of identifying and classifying trigger words of event mentions in text.
Main questions include how well existing ED models perform on different languages, how challenging ED is in other languages, and how well ED knowledge and annotation can be transferred across languages.
We introduce a new large-scale multilingual dataset for ED (called MINION) that consistently annotates events for 8 different languages.
- Score: 65.46122357928041
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Event Detection (ED) is the task of identifying and classifying trigger words
of event mentions in text. Despite considerable research efforts in recent
years for English text, the task of ED in other languages has been
significantly less explored. Switching to non-English languages, important
research questions for ED include how well existing ED models perform on
different languages, how challenging ED is in other languages, and how well ED
knowledge and annotation can be transferred across languages. To answer those
questions, it is crucial to obtain multilingual ED datasets that provide
consistent event annotation for multiple languages. There exist some
multilingual ED datasets; however, they tend to cover a handful of languages
and mainly focus on popular ones. Many languages are not covered in existing
multilingual ED datasets. In addition, the current datasets are often small and
not accessible to the public. To overcome those shortcomings, we introduce a
new large-scale multilingual dataset for ED (called MINION) that consistently
annotates events for 8 different languages; 5 of them have not been supported
by existing multilingual datasets. We also perform extensive experiments and
analysis to demonstrate the challenges and transferability of ED across
languages in MINION, which together call for more research effort in this area.
Related papers
- Cross-lingual Editing in Multilingual Language Models [1.3062731746155414]
This paper introduces the cross-lingual model editing (XME) paradigm, wherein a fact is edited in one language, and the subsequent update propagation is observed across other languages.
The results reveal notable performance limitations of state-of-the-art METs under the XME setting, mainly when the languages involved belong to two distinct script families.
arXiv Detail & Related papers (2024-01-19T06:54:39Z)
- Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for
Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems [64.40789703661987]
Multi3WOZ is a novel multilingual, multi-domain, multi-parallel ToD dataset.
It is large-scale and offers culturally adapted dialogs in 4 languages.
We describe a complex bottom-up data collection process that yielded the final dataset.
arXiv Detail & Related papers (2023-07-26T08:29:42Z)
- MEE: A Novel Multilingual Event Extraction Dataset [62.80569691825534]
Event Extraction aims to recognize event mentions and their arguments from text.
The lack of high-quality multilingual EE datasets for model training and evaluation has been the main hindrance.
We propose a novel Multilingual Event Extraction dataset (MEE) that provides annotation for more than 50K event mentions in 8 typologically different languages.
arXiv Detail & Related papers (2022-11-11T02:01:41Z)
- MACRONYM: A Large-Scale Dataset for Multilingual and Multi-Domain
Acronym Extraction [66.60031336330547]
Acronyms and their expanded forms are necessary for various NLP applications.
One limitation of existing AE research is that it is limited to the English language and certain domains.
The lack of annotated datasets in multiple languages and domains has been a major hindrance to research in this area.
arXiv Detail & Related papers (2022-02-19T23:08:38Z)
- GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented
Dialogue Systems [66.92182084456809]
We introduce a novel data curation method that generates GlobalWoZ -- a large-scale multilingual ToD dataset from an English ToD dataset.
Our method is based on translating dialogue templates and filling them with local entities in the target-language countries.
We release our dataset as well as a set of strong baselines to encourage research on learning multilingual ToD systems for real use cases.
arXiv Detail & Related papers (2021-10-14T19:33:04Z)
- A Study of Cross-Lingual Ability and Language-specific Information in
Multilingual BERT [60.9051207862378]
Multilingual BERT works remarkably well on cross-lingual transfer tasks.
Data size and context window size are crucial factors for transferability.
There is a computationally cheap but effective approach to improve the cross-lingual ability of multilingual BERT.
arXiv Detail & Related papers (2020-04-20T11:13:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.