The Devil is in the Details: On the Pitfalls of Event Extraction
Evaluation
- URL: http://arxiv.org/abs/2306.06918v2
- Date: Thu, 15 Jun 2023 07:23:57 GMT
- Title: The Devil is in the Details: On the Pitfalls of Event Extraction
Evaluation
- Authors: Hao Peng, Xiaozhi Wang, Feng Yao, Kaisheng Zeng, Lei Hou, Juanzi Li,
Zhiyuan Liu, Weixing Shen
- Abstract summary: Event extraction (EE) is a crucial task aiming at extracting events from texts.
In this paper, we check the reliability of EE evaluations and identify three major pitfalls.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Event extraction (EE) is a crucial task aiming at extracting events from
texts, which includes two subtasks: event detection (ED) and event argument
extraction (EAE). In this paper, we check the reliability of EE evaluations and
identify three major pitfalls: (1) Data preprocessing discrepancies make
evaluation results on the same dataset not directly comparable, yet the
preprocessing details are rarely noted or specified in papers. (2) The
output space discrepancy of different model paradigms makes different-paradigm
EE models lack grounds for comparison and also leads to unclear mapping issues
between predictions and annotations. (3) The absence of pipeline evaluation in
many EAE-only works makes them hard to compare directly with EE works and may
not reflect model performance in real-world pipeline scenarios. We
demonstrate the significant influence of these pitfalls through comprehensive
meta-analyses of recent papers and empirical experiments. To avoid these
pitfalls, we suggest a series of remedies, including specifying data
preprocessing, standardizing outputs, and providing pipeline evaluation
results. To help implement these remedies, we develop a consistent evaluation
framework, OMNIEVENT, which is available at
https://github.com/THU-KEG/OmniEvent.
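The output-space pitfall above hinges on how predictions are matched against annotations. A minimal, hypothetical sketch (not OmniEvent's actual API; function and field names are illustrative) shows how merely changing the matching criterion for event triggers shifts the reported score:

```python
# Hypothetical trigger-level EE scoring sketch; not the OmniEvent API.
# Under the strict criterion, a prediction counts only if both the span
# AND the event type agree; relaxing to span-only matching inflates scores,
# which is one way papers become incomparable.

def trigger_f1(golds, preds, require_type=True):
    """golds/preds: lists of (start, end, event_type) triples for one document."""
    def key(t):
        return t if require_type else t[:2]  # drop the type for span-only match
    gold_set = {key(t) for t in golds}
    pred_set = {key(t) for t in preds}
    tp = len(gold_set & pred_set)
    p = tp / len(pred_set) if pred_set else 0.0
    r = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = [(3, 4, "Attack"), (10, 11, "Transport")]
pred = [(3, 4, "Attack"), (10, 11, "Movement")]  # second event type is wrong
print(trigger_f1(gold, pred, require_type=True))   # strict: (0.5, 0.5, 0.5)
print(trigger_f1(gold, pred, require_type=False))  # span-only: (1.0, 1.0, 1.0)
```

The same mismatch arises for argument scoring, where head-word versus full-span matching is another common, often unstated, choice.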
Related papers
- Beyond Exact Match: Semantically Reassessing Event Extraction by Large Language Models [69.38024658668887]
Current evaluation methods for event extraction rely on token-level exact match.
We propose RAEE, an automatic evaluation framework that accurately assesses event extraction results at the semantic level instead of the token level.
arXiv Detail & Related papers (2024-10-12T07:54:01Z) - DAGnosis: Localized Identification of Data Inconsistencies using
Structures [73.39285449012255]
Identification and appropriate handling of inconsistencies in data at deployment time is crucial to reliably use machine learning models.
We use directed acyclic graphs (DAGs) to encode the training set's feature probability distributions and independencies as a structure.
Our method, called DAGnosis, leverages these structural interactions to draw valuable, data-centric conclusions.
arXiv Detail & Related papers (2024-02-26T11:29:16Z) - Unveiling the Deficiencies of Pre-trained Text-and-Layout Models in Real-world Visually-rich Document Information Extraction [19.083538884467917]
We introduce EC-FUNSD, an entity-centric dataset crafted for benchmarking information extraction from visually-rich documents.
We evaluate the real-world information extraction capabilities of PTLMs from multiple aspects, including their absolute performance, as well as generalization, robustness and fairness.
arXiv Detail & Related papers (2024-02-04T07:33:45Z) - Extracting or Guessing? Improving Faithfulness of Event Temporal
Relation Extraction [87.04153383938969]
We improve the faithfulness of TempRel extraction models from two perspectives.
The first perspective is to extract genuinely based on contextual description.
The second perspective is to provide proper uncertainty estimation.
arXiv Detail & Related papers (2022-10-10T19:53:13Z) - Improve Event Extraction via Self-Training with Gradient Guidance [10.618929821822892]
We propose a Self-Training with Feedback (STF) framework to overcome the main factor that hinders the progress of event extraction.
STF consists of (1) a base event extraction model trained on existing event annotations and then applied to large-scale unlabeled corpora to predict new event mentions as pseudo training samples, and (2) a novel scoring model that takes in each new predicted event trigger, an argument, its argument role, as well as their paths in the AMR graph to estimate a compatibility score.
Experiments are conducted on three benchmark datasets: ACE05-E, ACE05-E+, and ERE.
arXiv Detail & Related papers (2022-05-25T04:40:17Z) - WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection [75.80075054706079]
We propose a weakly- and semi-supervised object detection framework (WSSOD)
An agent detector is first trained on a joint dataset and then used to predict pseudo bounding boxes on weakly-annotated images.
The proposed framework demonstrates remarkable performance on the PASCAL-VOC and MSCOCO benchmarks, comparable to that obtained in fully-supervised settings.
arXiv Detail & Related papers (2021-05-21T11:58:50Z) - Few-Shot Event Detection with Prototypical Amortized Conditional Random
Field [8.782210889586837]
Event Detection tends to struggle when it needs to recognize novel event types from only a few samples.
We present a novel unified joint model which converts the task to a few-shot tagging problem with a double-part tagging scheme.
We conduct experiments on the benchmark dataset FewEvent and the experimental results show that the tagging based methods are better than existing pipeline and joint learning methods.
arXiv Detail & Related papers (2020-12-04T01:11:13Z) - Let's Stop Incorrect Comparisons in End-to-end Relation Extraction! [13.207968737733196]
We first identify several patterns of invalid comparisons in published papers and describe them to avoid their propagation.
We then propose a small empirical study to quantify the impact of the most common mistake and show that it leads to overestimating the final RE performance by around 5% on ACE05.
arXiv Detail & Related papers (2020-09-22T16:59:15Z) - Detecting Ongoing Events Using Contextual Word and Sentence Embeddings [110.83289076967895]
This paper introduces the Ongoing Event Detection (OED) task.
The goal is to detect ongoing event mentions only, as opposed to historical, future, hypothetical, or other forms of events that are neither fresh nor current.
Any application that needs to extract structured information about ongoing events from unstructured texts can take advantage of an OED system.
arXiv Detail & Related papers (2020-07-02T20:44:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.