MAVEN: A Massive General Domain Event Detection Dataset
- URL: http://arxiv.org/abs/2004.13590v2
- Date: Thu, 8 Oct 2020 09:19:13 GMT
- Title: MAVEN: A Massive General Domain Event Detection Dataset
- Authors: Xiaozhi Wang, Ziqi Wang, Xu Han, Wangyi Jiang, Rong Han, Zhiyuan Liu,
Juanzi Li, Peng Li, Yankai Lin, Jie Zhou
- Abstract summary: Event detection (ED) is the first and most fundamental step for extracting event knowledge from plain text.
Existing datasets exhibit issues that limit further development of ED.
We present a MAssive eVENt detection dataset (MAVEN), which contains 4,480 Wikipedia documents, 118,732 event mention instances, and 168 event types.
- Score: 56.00401399384715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event detection (ED), which means identifying event trigger words and
classifying event types, is the first and most fundamental step for extracting
event knowledge from plain text. Most existing datasets exhibit the following
issues that limit further development of ED: (1) Data scarcity. Existing
small-scale datasets are not sufficient for training and stably benchmarking
increasingly sophisticated modern neural methods. (2) Low coverage. Limited
event types of existing datasets cannot well cover general-domain events, which
restricts the applications of ED models. To alleviate these problems, we
present a MAssive eVENt detection dataset (MAVEN), which contains 4,480
Wikipedia documents, 118,732 event mention instances, and 168 event types.
MAVEN alleviates the data scarcity problem and covers much more general event
types. We reproduce the recent state-of-the-art ED models and conduct a
thorough evaluation on MAVEN. The experimental results show that existing ED
methods cannot achieve results on MAVEN as promising as those on small datasets,
which suggests that ED in the real world remains a challenging task and
requires further research efforts. We also discuss further directions for
general domain ED with empirical analyses. The source code and dataset can be
obtained from https://github.com/THU-KEG/MAVEN-dataset.
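To make the task definition in the abstract concrete, the following is a minimal sketch of event detection as "find trigger words, then assign event types", followed by the scale ratios implied by the reported counts. The lexicon, the type names, and the `detect_events` helper are hypothetical illustrations; they are not MAVEN's schema, data format, or any model evaluated in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventMention:
    trigger: str      # the trigger word as it appears in the text
    start: int        # token index of the trigger
    event_type: str   # label assigned to the trigger


# Hypothetical toy lexicon mapping trigger words to event types.
# Real ED models learn this mapping from annotated data instead.
TOY_LEXICON = {"attacked": "Attack", "died": "Death", "married": "Marry"}


def detect_events(tokens):
    """Return one EventMention per token found in the toy lexicon."""
    return [
        EventMention(tok, i, TOY_LEXICON[tok.lower()])
        for i, tok in enumerate(tokens)
        if tok.lower() in TOY_LEXICON
    ]


tokens = "Rebels attacked the town and three soldiers died".split()
mentions = detect_events(tokens)
for m in mentions:
    print(m.start, m.trigger, m.event_type)

# Scale ratios derived from the counts stated in the abstract.
MAVEN_DOCS, MAVEN_MENTIONS, MAVEN_TYPES = 4480, 118732, 168
mentions_per_doc = MAVEN_MENTIONS / MAVEN_DOCS    # about 26.5
mentions_per_type = MAVEN_MENTIONS / MAVEN_TYPES  # about 706.7
print(round(mentions_per_doc, 1), round(mentions_per_type, 1))
```

A lexicon lookup like this ignores context, so it cannot resolve ambiguous triggers; it only shows the input and output shape of the task.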
Related papers
- MAVEN-Fact: A Large-scale Event Factuality Detection Dataset [55.01875707021496]
We introduce MAVEN-Fact, a large-scale and high-quality EFD dataset based on the MAVEN dataset.
MAVEN-Fact includes factuality annotations of 112,276 events, making it the largest EFD dataset.
Experiments demonstrate that MAVEN-Fact is challenging for both conventional fine-tuned models and large language models (LLMs).
arXiv Detail & Related papers (2024-07-22T03:43:46Z)
- MAVEN-Arg: Completing the Puzzle of All-in-One Event Understanding Dataset with Event Argument Annotation [104.6065882758648]
MAVEN-Arg is the first all-in-one dataset supporting event detection, event argument extraction, and event relation extraction.
As an EAE benchmark, MAVEN-Arg offers three main advantages: (1) a comprehensive schema covering 162 event types and 612 argument roles, all with expert-written definitions and examples; (2) a large data scale, containing 98,591 events and 290,613 arguments obtained with laborious human annotation; and (3) the exhaustive annotation supporting all task variants of EAE.
arXiv Detail & Related papers (2023-11-15T16:52:14Z)
- MsPrompt: Multi-step Prompt Learning for Debiasing Few-shot Event Detection [16.98619925632727]
Event detection (ED) aims to identify the key trigger words in unstructured text and predict the event types accordingly.
Traditional ED models are too data-hungry to accommodate real applications with scarce labeled data.
We propose a multi-step prompt learning model (MsPrompt) for debiasing few-shot event detection.
arXiv Detail & Related papers (2023-05-16T10:19:12Z)
- Abnormal Event Detection via Hypergraph Contrastive Learning [54.80429341415227]
Abnormal event detection plays an important role in many real applications.
In this paper, we study the unsupervised abnormal event detection problem in Attributed Heterogeneous Information Network.
A novel hypergraph contrastive learning method, named AEHCL, is proposed to fully capture abnormal event patterns.
arXiv Detail & Related papers (2023-04-02T08:23:20Z)
- MEE: A Novel Multilingual Event Extraction Dataset [62.80569691825534]
Event Extraction aims to recognize event mentions and their arguments from text.
The lack of high-quality multilingual EE datasets for model training and evaluation has been the main hindrance.
We propose a novel Multilingual Event Extraction dataset (MEE) that provides annotation for more than 50K event mentions in 8 typologically different languages.
arXiv Detail & Related papers (2022-11-11T02:01:41Z)
- Event Detection Explorer: An Interactive Tool for Event Detection Exploration [15.673794190575295]
Event Detection (ED) is an important task in natural language processing.
In this paper, we present an interactive and easy-to-use tool, namely ED Explorer, for ED dataset and model exploration.
arXiv Detail & Related papers (2022-04-26T17:22:37Z)
- Event Data Association via Robust Model Fitting for Event-based Object Tracking [66.05728523166755]
We propose a novel Event Data Association (called EDA) approach to explicitly address the event association and fusion problem.
The proposed EDA seeks event trajectories that best fit the event data, in order to perform unified data association and information fusion.
The experimental results show the effectiveness of EDA under challenging scenarios, such as high speed, motion blur, and high dynamic range conditions.
arXiv Detail & Related papers (2021-10-25T13:56:00Z)
- OntoED: Low-resource Event Detection with Ontology Embedding [19.126410765996077]
Event Detection (ED) aims to identify event trigger words in a given text and classify them into event types.
Most current ED methods rely heavily on training instances and almost ignore the correlation between event types.
arXiv Detail & Related papers (2021-05-23T12:00:22Z)
- Exathlon: A Benchmark for Explainable Anomaly Detection over Time Series [6.085662888748731]
We present Exathlon, the first benchmark for explainable anomaly detection over high-dimensional time series data.
Exathlon has been constructed based on real data traces from repeated executions of large-scale stream processing jobs on an Apache Spark cluster.
For each of the anomaly instances, ground truth labels for the root cause interval as well as those for the extended effect interval are provided.
arXiv Detail & Related papers (2020-10-10T19:31:22Z)