CMNEE: A Large-Scale Document-Level Event Extraction Dataset based on Open-Source Chinese Military News
- URL: http://arxiv.org/abs/2404.12242v1
- Date: Thu, 18 Apr 2024 15:02:35 GMT
- Title: CMNEE: A Large-Scale Document-Level Event Extraction Dataset based on Open-Source Chinese Military News
- Authors: Mengna Zhu, Zijie Xu, Kaisheng Zeng, Kaiming Xiao, Mao Wang, Wenjun Ke, Hongbin Huang
- Abstract summary: We propose CMNEE, a large-scale, document-level open-source Chinese Military News Event Extraction dataset.
It contains 17,000 documents and 29,223 events, which are all manually annotated based on a pre-defined schema for the military domain.
We reproduce several state-of-the-art event extraction models with a systematic evaluation.
- Score: 4.8309547228489125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extracting structured event knowledge, including event triggers and corresponding arguments, from military texts is fundamental to many applications, such as intelligence analysis and decision assistance. However, event extraction in the military field faces a data scarcity problem, which impedes research on event extraction models in this domain. To alleviate this problem, we propose CMNEE, a large-scale, document-level open-source Chinese Military News Event Extraction dataset. It contains 17,000 documents and 29,223 events, all manually annotated based on a pre-defined schema for the military domain that includes 8 event types and 11 argument role types. We designed a two-stage, multi-turn annotation strategy to ensure the quality of CMNEE and reproduced several state-of-the-art event extraction models with a systematic evaluation. The experimental results on CMNEE fall clearly short of those on datasets from other domains, which demonstrates that event extraction for the military domain poses unique challenges and requires further research efforts. Our code and data can be obtained from https://github.com/Mzzzhu/CMNEE.
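To make the terms "event trigger" and "argument" in the abstract concrete, here is a minimal Python sketch of what a document-level event record could look like. It is an illustration only: the class names, field names, the `Exercise`, `Subject`, and `Location` labels, and the English example sentence are all hypothetical and are not taken from the CMNEE schema or its released data format. The only figures taken from the abstract are the 17,000 documents and 29,223 events used in the average at the end.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Argument:
    role: str    # hypothetical role label, not from the CMNEE schema
    text: str    # argument span as it appears in the document
    start: int   # character offset of the span in the document text


@dataclass
class Event:
    event_type: str     # hypothetical event type label
    trigger: str        # word or phrase that signals the event
    trigger_start: int  # character offset of the trigger
    arguments: List[Argument] = field(default_factory=list)


@dataclass
class Document:
    doc_id: str
    text: str
    # Document-level extraction: one document may hold several events.
    events: List[Event] = field(default_factory=list)


def events_per_document(num_events: int, num_documents: int) -> float:
    """Average number of annotated events per document."""
    return num_events / num_documents


# Invented English sentence, used only for readability (CMNEE is Chinese).
text = "The fleet conducted a live-fire exercise near the coast on Monday."

doc = Document(
    doc_id="example-0",
    text=text,
    events=[
        Event(
            event_type="Exercise",
            trigger="exercise",
            trigger_start=text.index("exercise"),
            arguments=[
                Argument("Subject", "The fleet", text.index("The fleet")),
                Argument("Location", "the coast", text.index("the coast")),
            ],
        )
    ],
)

# Counts reported in the abstract: 29,223 events over 17,000 documents.
average = events_per_document(29223, 17000)
print(f"{average:.3f} events per document")
```

Storing character offsets alongside each span keeps the record checkable against the source text, which matters when the same word occurs more than once in a document.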
Related papers
- Grounding Partially-Defined Events in Multimodal Data [61.0063273919745]
We introduce a multimodal formulation for partially-defined events and cast the extraction of these events as a three-stage span retrieval task.
We propose a benchmark for this task, MultiVENT-G, that consists of 14.5 hours of densely annotated current event videos and 1,168 text documents, containing 22.8K labeled event-centric entities.
Results illustrate the challenges that abstract event understanding poses and demonstrate promise in event-centric video-language systems.
arXiv Detail & Related papers (2024-10-07T17:59:48Z)
- MAVEN-Fact: A Large-scale Event Factuality Detection Dataset [55.01875707021496]
We introduce MAVEN-Fact, a large-scale and high-quality EFD dataset based on the MAVEN dataset.
MAVEN-Fact includes factuality annotations of 112,276 events, making it the largest EFD dataset.
Experiments demonstrate that MAVEN-Fact is challenging for both conventional fine-tuned models and large language models (LLMs).
arXiv Detail & Related papers (2024-07-22T03:43:46Z)
- EXCEEDS: Extracting Complex Events as Connecting the Dots to Graphs in Scientific Domain [57.56639626657212]
We construct SciEvents, a large-scale multi-event document-level dataset with a schema tailored for the scientific domain.
Then, we propose EXCEEDS, a novel end-to-end scientific event extraction framework by storing dense nuggets in a grid matrix.
Experimental results demonstrate state-of-the-art performance of EXCEEDS on SciEvents.
arXiv Detail & Related papers (2024-06-20T07:50:37Z)
- AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning [98.26836657967162]
AgentOhana aggregates agent trajectories from distinct environments, spanning a wide array of scenarios.
xLAM-v0.1, a large action model tailored for AI agents, demonstrates exceptional performance across various benchmarks.
arXiv Detail & Related papers (2024-02-23T18:56:26Z)
- Event-driven Real-time Retrieval in Web Search [15.235255100530496]
This paper expands the query with event information that represents real-time search intent.
We further enhance the model's capacity for event representation through multi-task training.
Our proposed approach significantly outperforms existing state-of-the-art baseline methods.
arXiv Detail & Related papers (2023-12-01T06:30:31Z)
- From Simple to Complex: A Progressive Framework for Document-level Informative Argument Extraction [34.37013964529546]
Event Argument Extraction (EAE) requires the model to extract arguments of multiple events from a single document.
We propose a simple-to-complex progressive framework for document-level EAE.
Our model outperforms SOTA by 1.4% in F1, indicating the proposed simple-to-complex framework is useful in the EAE task.
arXiv Detail & Related papers (2023-10-25T04:38:02Z)
- MUSIED: A Benchmark for Event Detection from Multi-Source Heterogeneous Informal Texts [7.43647091073357]
Event detection (ED) identifies and classifies event triggers from unstructured texts.
We propose a new large-scale Chinese event detection dataset based on user reviews, text conversations, and phone conversations.
arXiv Detail & Related papers (2022-11-25T05:05:29Z)
- MEE: A Novel Multilingual Event Extraction Dataset [62.80569691825534]
Event Extraction aims to recognize event mentions and their arguments from text.
The lack of high-quality multilingual EE datasets for model training and evaluation has been the main hindrance.
We propose a novel Multilingual Event Extraction dataset (MEE) that provides annotation for more than 50K event mentions in 8 typologically different languages.
arXiv Detail & Related papers (2022-11-11T02:01:41Z)
- COfEE: A Comprehensive Ontology for Event Extraction from text, with an online annotation tool [3.8995911009078816]
Event Extraction (EE) seeks to derive information about specific incidents and their actors from the text.
EE is useful in many domains such as building a knowledge base, information retrieval, summarization and online monitoring systems.
COfEE consists of two hierarchy levels (event types and event sub-types) that include new categories relating to environmental issues, cyberspace, criminal activity and natural disasters.
arXiv Detail & Related papers (2021-07-21T19:43:22Z)
- MAVEN: A Massive General Domain Event Detection Dataset [56.00401399384715]
Event detection (ED) is the first and most fundamental step for extracting event knowledge from plain text.
Existing datasets exhibit issues that limit further development of ED.
We present a MAssive eVENt detection dataset (MAVEN), which contains 4,480 Wikipedia documents, 118,732 event mention instances, and 168 event types.
arXiv Detail & Related papers (2020-04-28T15:25:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.