MEE: A Novel Multilingual Event Extraction Dataset
- URL: http://arxiv.org/abs/2211.05955v1
- Date: Fri, 11 Nov 2022 02:01:41 GMT
- Title: MEE: A Novel Multilingual Event Extraction Dataset
- Authors: Amir Pouran Ben Veyseh, Javid Ebrahimi, Franck Dernoncourt, and Thien Huu Nguyen
- Abstract summary: Event Extraction aims to recognize event mentions and their arguments from text.
The lack of high-quality multilingual EE datasets for model training and evaluation has been the main hindrance to EE research on non-English languages.
We propose a novel Multilingual Event Extraction dataset (MEE) that provides annotation for more than 50K event mentions in 8 typologically different languages.
- Score: 62.80569691825534
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Event Extraction (EE) is one of the fundamental tasks in Information
Extraction (IE) that aims to recognize event mentions and their arguments
(i.e., participants) from text. Due to its importance, extensive methods and
resources have been developed for Event Extraction. However, one limitation of
current EE research is the under-exploration of non-English
languages, for which the lack of high-quality multilingual EE datasets for model
training and evaluation has been the main hindrance. To address this
limitation, we propose a novel Multilingual Event Extraction dataset (MEE) that
provides annotation for more than 50K event mentions in 8 typologically
different languages. MEE comprehensively annotates data for entity mentions,
event triggers and event arguments. We conduct extensive experiments on the
proposed dataset to reveal challenges and opportunities for multilingual EE.
Related papers
- Grounding Partially-Defined Events in Multimodal Data [61.0063273919745]
We introduce a multimodal formulation for partially-defined events and cast the extraction of these events as a three-stage span retrieval task.
We propose a benchmark for this task, MultiVENT-G, that consists of 14.5 hours of densely annotated current event videos and 1,168 text documents, containing 22.8K labeled event-centric entities.
Results illustrate the challenges that abstract event understanding poses and demonstrate promise in event-centric video-language systems.
arXiv Detail & Related papers (2024-10-07T17:59:48Z)
- Event Extraction in Basque: Typologically motivated Cross-Lingual Transfer-Learning Analysis [18.25948580496853]
Cross-lingual transfer-learning is widely used in Event Extraction for low-resource languages.
This paper studies whether the typological similarity between source and target languages impacts the performance of cross-lingual transfer.
arXiv Detail & Related papers (2024-04-09T15:35:41Z)
- MAVEN-Arg: Completing the Puzzle of All-in-One Event Understanding Dataset with Event Argument Annotation [104.6065882758648]
MAVEN-Arg is the first all-in-one dataset supporting event detection, event argument extraction, and event relation extraction.
As an EAE benchmark, MAVEN-Arg offers three main advantages: (1) a comprehensive schema covering 162 event types and 612 argument roles, all with expert-written definitions and examples; (2) a large data scale, containing 98,591 events and 290,613 arguments obtained with laborious human annotation; and (3) the exhaustive annotation supporting all task variants of EAE.
arXiv Detail & Related papers (2023-11-15T16:52:14Z)
- MINION: a Large-Scale and Diverse Dataset for Multilingual Event Detection [65.46122357928041]
Event Detection (ED) is the task of identifying and classifying trigger words of event mentions in text.
Main questions include how well existing ED models perform on different languages, how challenging ED is in other languages, and how well ED knowledge and annotation can be transferred across languages.
We introduce a new large-scale multilingual dataset for ED (called MINION) that consistently annotates events for 8 different languages.
arXiv Detail & Related papers (2022-11-11T02:09:51Z)
- Title2Event: Benchmarking Open Event Extraction with a Large-scale Chinese Title Dataset [19.634367718707857]
We present Title2Event, a large-scale sentence-level dataset benchmarking Open Event Extraction without restricting event types.
Title2Event contains more than 42,000 news titles in 34 topics collected from Chinese web pages.
To the best of our knowledge, it is currently the largest manually-annotated Chinese dataset for open event extraction.
arXiv Detail & Related papers (2022-11-02T04:39:36Z)
- PILED: An Identify-and-Localize Framework for Few-Shot Event Detection [79.66042333016478]
In our study, we employ cloze prompts to elicit event-related knowledge from pretrained language models.
We minimize the number of type-specific parameters, enabling our model to quickly adapt to event detection tasks for new types.
arXiv Detail & Related papers (2022-02-15T18:01:39Z)
- Event Argument Extraction using Causal Knowledge Structures [9.56216681584111]
Event Argument extraction refers to the task of extracting structured information from unstructured text for a particular event of interest.
Most of the existing works model this task at a sentence level, restricting the context to a local scope.
We propose an external knowledge aided approach to infuse document-level event information to aid the extraction of complex event arguments.
arXiv Detail & Related papers (2021-05-02T13:59:07Z)
- Detecting Ongoing Events Using Contextual Word and Sentence Embeddings [110.83289076967895]
This paper introduces the Ongoing Event Detection (OED) task.
The goal is to detect ongoing event mentions only, as opposed to historical, future, hypothetical, or other forms of events that are neither fresh nor current.
Any application that needs to extract structured information about ongoing events from unstructured texts can take advantage of an OED system.
arXiv Detail & Related papers (2020-07-02T20:44:05Z)