Related papers: CMNEE: A Large-Scale Document-Level Event Extraction Dataset based on Open-Source Chinese Military News

URL: http://arxiv.org/abs/2404.12242v1
Date: Thu, 18 Apr 2024 15:02:35 GMT
Title: CMNEE: A Large-Scale Document-Level Event Extraction Dataset based on Open-Source Chinese Military News
Authors: Mengna Zhu, Zijie Xu, Kaisheng Zeng, Kaiming Xiao, Mao Wang, Wenjun Ke, Hongbin Huang
Abstract summary: We propose CMNEE, a large-scale, document-level open-source Chinese Military News Event Extraction dataset. It contains 17,000 documents and 29,223 events, which are all manually annotated based on a pre-defined schema for the military domain. We reproduce several state-of-the-art event extraction models with a systematic evaluation.
Score: 4.8309547228489125
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Extracting structured event knowledge, including event triggers and corresponding arguments, from military texts is fundamental to many applications, such as intelligence analysis and decision assistance. However, event extraction in the military field faces the data scarcity problem, which impedes the research of event extraction models in this domain. To alleviate this problem, we propose CMNEE, a large-scale, document-level open-source Chinese Military News Event Extraction dataset. It contains 17,000 documents and 29,223 events, which are all manually annotated based on a pre-defined schema for the military domain including 8 event types and 11 argument role types. We designed a two-stage, multi-turn annotation strategy to ensure the quality of CMNEE and reproduced several state-of-the-art event extraction models with a systematic evaluation. The experimental results on CMNEE fall noticeably short of those on datasets from other domains, which demonstrates that event extraction for the military domain poses unique challenges and requires further research efforts. Our code and data can be obtained from https://github.com/Mzzzhu/CMNEE.

Related papers

EventSum: A Large-Scale Event-Centric Summarization Dataset for Chinese Multi-News Documents [32.61252012805789]
Event-Centric Multi-Document Summarization (ECS) task aims to generate concise and comprehensive summaries of a given event based on multiple related news documents. We constructed the EventSum dataset, containing 5,100 events and a total of 57,984 news documents, with an average of 11.4 input news documents and 13,471 characters per event. We designed specific metrics including Event Recall, Argument Recall, Causal Recall, and Temporal Recall along with corresponding calculation methods for evaluation.
arXiv Detail & Related papers (2024-12-16T14:29:49Z)
Grounding Partially-Defined Events in Multimodal Data [61.0063273919745]
We introduce a multimodal formulation for partially-defined events and cast the extraction of these events as a three-stage span retrieval task. We propose a benchmark for this task, MultiVENT-G, that consists of 14.5 hours of densely annotated current event videos and 1,168 text documents, containing 22.8K labeled event-centric entities. Results illustrate the challenges that abstract event understanding poses and demonstrate promise in event-centric video-language systems.
arXiv Detail & Related papers (2024-10-07T17:59:48Z)
MAVEN-Fact: A Large-scale Event Factuality Detection Dataset [55.01875707021496]
We introduce MAVEN-Fact, a large-scale and high-quality EFD dataset based on the MAVEN dataset. MAVEN-Fact includes factuality annotations of 112,276 events, making it the largest EFD dataset. Experiments demonstrate that MAVEN-Fact is challenging for both conventional fine-tuned models and large language models (LLMs).
arXiv Detail & Related papers (2024-07-22T03:43:46Z)
EXCEEDS: Extracting Complex Events as Connecting the Dots to Graphs in Scientific Domain [57.56639626657212]
We construct SciEvents, a large-scale multi-event document-level dataset with a schema tailored for the scientific domain. Then, we propose EXCEEDS, a novel end-to-end scientific event extraction framework by storing dense nuggets in a grid matrix. Experimental results demonstrate state-of-the-art performance of EXCEEDS on SciEvents.
arXiv Detail & Related papers (2024-06-20T07:50:37Z)
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning [98.26836657967162]
AgentOhana aggregates agent trajectories from distinct environments, spanning a wide array of scenarios. xLAM-v0.1, a large action model tailored for AI agents, demonstrates exceptional performance across various benchmarks.
arXiv Detail & Related papers (2024-02-23T18:56:26Z)
Event-driven Real-time Retrieval in Web Search [15.235255100530496]
This paper expands the query with event information that represents real-time search intent. We further enhance the model's capacity for event representation through multi-task training. Our proposed approach significantly outperforms existing state-of-the-art baseline methods.
arXiv Detail & Related papers (2023-12-01T06:30:31Z)
From Simple to Complex: A Progressive Framework for Document-level Informative Argument Extraction [34.37013964529546]
Event Argument Extraction (EAE) requires the model to extract arguments of multiple events from a single document. We propose a simple-to-complex progressive framework for document-level EAE. Our model outperforms SOTA by 1.4% in F1, indicating the proposed simple-to-complex framework is useful in the EAE task.
arXiv Detail & Related papers (2023-10-25T04:38:02Z)
Boosting Event Extraction with Denoised Structure-to-Text Augmentation [52.21703002404442]
Event extraction aims to recognize pre-defined event triggers and arguments from texts. Recent data augmentation methods often neglect the problem of grammatical incorrectness. We propose DAEE, a denoised structure-to-text augmentation framework for event extraction.
arXiv Detail & Related papers (2023-05-16T16:52:07Z)
MUSIED: A Benchmark for Event Detection from Multi-Source Heterogeneous Informal Texts [7.43647091073357]
Event detection (ED) identifies and classifies event triggers from unstructured texts. We propose a new large-scale Chinese event detection dataset based on user reviews, text conversations, and phone conversations.
arXiv Detail & Related papers (2022-11-25T05:05:29Z)
MEE: A Novel Multilingual Event Extraction Dataset [62.80569691825534]
Event Extraction (EE) aims to recognize event mentions and their arguments from text. The lack of high-quality multilingual EE datasets for model training and evaluation has been the main hindrance. We propose a novel Multilingual Event Extraction dataset (MEE) that provides annotation for more than 50K event mentions in 8 typologically different languages.
arXiv Detail & Related papers (2022-11-11T02:01:41Z)
COfEE: A Comprehensive Ontology for Event Extraction from text, with an online annotation tool [3.8995911009078816]
Event Extraction (EE) seeks to derive information about specific incidents and their actors from the text. EE is useful in many domains such as building a knowledge base, information retrieval, summarization and online monitoring systems. COfEE consists of two hierarchy levels (event types and event sub-types) that include new categories relating to environmental issues, cyberspace, criminal activity and natural disasters.
arXiv Detail & Related papers (2021-07-21T19:43:22Z)
MAVEN: A Massive General Domain Event Detection Dataset [56.00401399384715]
Event detection (ED) is the first and most fundamental step for extracting event knowledge from plain text. Existing datasets exhibit issues that limit further development of ED. We present a MAssive eVENt detection dataset (MAVEN), which contains 4,480 Wikipedia documents, 118,732 event mention instances, and 168 event types.
arXiv Detail & Related papers (2020-04-28T15:25:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.