The Devil is in the Details: On the Pitfalls of Event Extraction
Evaluation
- URL: http://arxiv.org/abs/2306.06918v2
- Date: Thu, 15 Jun 2023 07:23:57 GMT
- Title: The Devil is in the Details: On the Pitfalls of Event Extraction
Evaluation
- Authors: Hao Peng, Xiaozhi Wang, Feng Yao, Kaisheng Zeng, Lei Hou, Juanzi Li,
Zhiyuan Liu, Weixing Shen
- Abstract summary: Event extraction (EE) is a crucial task aiming at extracting events from texts.
In this paper, we check the reliability of EE evaluations and identify three major pitfalls.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Event extraction (EE) is a crucial task aiming at extracting events from
texts, which includes two subtasks: event detection (ED) and event argument
extraction (EAE). In this paper, we check the reliability of EE evaluations and
identify three major pitfalls: (1) Data preprocessing discrepancies make
evaluation results on the same dataset not directly comparable, yet the
preprocessing details are rarely noted or specified in papers. (2) The
output space discrepancy of different model paradigms makes different-paradigm
EE models lack grounds for comparison and also leads to unclear mapping issues
between predictions and annotations. (3) The absence of pipeline evaluation in
many EAE-only works makes them hard to compare directly with EE works and may
not reflect model performance in real-world pipeline scenarios. We
demonstrate the significant influence of these pitfalls through comprehensive
meta-analyses of recent papers and empirical experiments. To avoid these
pitfalls, we suggest a series of remedies, including specifying data
preprocessing, standardizing outputs, and providing pipeline evaluation
results. To help implement these remedies, we develop a consistent evaluation
framework, OMNIEVENT, which is available at
https://github.com/THU-KEG/OmniEvent.
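The output-space pitfall above hinges on how predictions are matched against annotations. A minimal, hypothetical sketch (not OmniEvent's actual API; function and field names are illustrative) shows how merely changing the matching criterion for event triggers shifts the reported score:

```python
# Hypothetical trigger-level EE scoring sketch; not the OmniEvent API.
# Under the strict criterion, a prediction counts only if both the span
# AND the event type agree; relaxing to span-only matching inflates scores,
# which is one way papers become incomparable.

def trigger_f1(golds, preds, require_type=True):
    """golds/preds: lists of (start, end, event_type) triples for one document."""
    def key(t):
        return t if require_type else t[:2]  # drop the type for span-only match
    gold_set = {key(t) for t in golds}
    pred_set = {key(t) for t in preds}
    tp = len(gold_set & pred_set)
    p = tp / len(pred_set) if pred_set else 0.0
    r = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = [(3, 4, "Attack"), (10, 11, "Transport")]
pred = [(3, 4, "Attack"), (10, 11, "Movement")]  # second event type is wrong
print(trigger_f1(gold, pred, require_type=True))   # strict: (0.5, 0.5, 0.5)
print(trigger_f1(gold, pred, require_type=False))  # span-only: (1.0, 1.0, 1.0)
```

The same mismatch arises for argument scoring, where head-word versus full-span matching is another common, often unstated, choice.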
Related papers
- Beyond Exact Match: Semantically Reassessing Event Extraction by Large Language Models [69.38024658668887]
Current evaluation methods for event extraction rely on token-level exact match.
We propose RAEE, an automatic evaluation framework that accurately assesses event extraction results at the semantic level instead of the token level.
arXiv Detail & Related papers (2024-10-12T07:54:01Z) - DAGnosis: Localized Identification of Data Inconsistencies using
Structures [73.39285449012255]
Identification and appropriate handling of inconsistencies in data at deployment time is crucial to reliably use machine learning models.
We use directed acyclic graphs (DAGs) to encode the training set's feature probability distributions and independencies as a structure.
Our method, called DAGnosis, leverages these structural interactions to draw valuable, data-centric conclusions.
arXiv Detail & Related papers (2024-02-26T11:29:16Z) - Unveiling the Deficiencies of Pre-trained Text-and-Layout Models in Real-world Visually-rich Document Information Extraction [19.083538884467917]
We introduce EC-FUNSD, an entity-centric dataset crafted for benchmarking information extraction from visually-rich documents.
We evaluate the real-world information extraction capabilities of PTLMs from multiple aspects, including their absolute performance, as well as generalization, robustness and fairness.
arXiv Detail & Related papers (2024-02-04T07:33:45Z) - Extracting or Guessing? Improving Faithfulness of Event Temporal
Relation Extraction [87.04153383938969]
We improve the faithfulness of TempRel extraction models from two perspectives.
The first perspective is to extract genuinely based on contextual description.
The second perspective is to provide proper uncertainty estimation.
arXiv Detail & Related papers (2022-10-10T19:53:13Z) - Improve Event Extraction via Self-Training with Gradient Guidance [10.618929821822892]
We propose a Self-Training with Feedback (STF) framework to overcome the main factor that hinders the progress of event extraction.
STF consists of (1) a base event extraction model trained on existing event annotations and then applied to large-scale unlabeled corpora to predict new event mentions as pseudo training samples, and (2) a novel scoring model that takes in each new predicted event trigger, an argument, its argument role, as well as their paths in the AMR graph to estimate a compatibility score.
Experiments are conducted on three benchmark datasets: ACE05-E, ACE05-E+, and ERE.
arXiv Detail & Related papers (2022-05-25T04:40:17Z) - WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection [75.80075054706079]
We propose a weakly- and semi-supervised object detection framework (WSSOD)
An agent detector is first trained on a joint dataset and then used to predict pseudo bounding boxes on weakly-annotated images.
The proposed framework demonstrates remarkable performance on the PASCAL-VOC and MSCOCO benchmarks, comparable to that obtained in fully-supervised settings.
arXiv Detail & Related papers (2021-05-21T11:58:50Z) - Few-Shot Event Detection with Prototypical Amortized Conditional Random
Field [8.782210889586837]
Event Detection tends to struggle when it needs to recognize novel event types from only a few samples.
We present a novel unified joint model which converts the task to a few-shot tagging problem with a double-part tagging scheme.
We conduct experiments on the benchmark dataset FewEvent and the experimental results show that the tagging based methods are better than existing pipeline and joint learning methods.
arXiv Detail & Related papers (2020-12-04T01:11:13Z) - Let's Stop Incorrect Comparisons in End-to-end Relation Extraction! [13.207968737733196]
We first identify several patterns of invalid comparisons in published papers and describe them to avoid their propagation.
We then propose a small empirical study to quantify the impact of the most common mistake and show that it leads to overestimating the final RE performance by around 5% on ACE05.
arXiv Detail & Related papers (2020-09-22T16:59:15Z) - Detecting Ongoing Events Using Contextual Word and Sentence Embeddings [110.83289076967895]
This paper introduces the Ongoing Event Detection (OED) task.
The goal is to detect ongoing event mentions only, as opposed to historical, future, hypothetical, or other forms of events that are neither fresh nor current.
Any application that needs to extract structured information about ongoing events from unstructured texts can take advantage of an OED system.
arXiv Detail & Related papers (2020-07-02T20:44:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.