GENEVA: Benchmarking Generalizability for Event Argument Extraction with
Hundreds of Event Types and Argument Roles
- URL: http://arxiv.org/abs/2205.12505v5
- Date: Thu, 1 Jun 2023 06:14:24 GMT
- Title: GENEVA: Benchmarking Generalizability for Event Argument Extraction with
Hundreds of Event Types and Argument Roles
- Authors: Tanmay Parekh, I-Hung Hsu, Kuan-Hao Huang, Kai-Wei Chang, Nanyun Peng
- Abstract summary: Recent work in Event Argument Extraction (EAE) has focused on improving model generalizability to new events and domains.
Standard benchmarking datasets like ACE and ERE cover fewer than 40 event types and 25 entity-centric argument roles.
- Score: 77.05288144035056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works in Event Argument Extraction (EAE) have focused on improving
model generalizability to cater to new events and domains. However, standard
benchmarking datasets like ACE and ERE cover fewer than 40 event types and 25
entity-centric argument roles. Limited diversity and coverage hinder these
datasets from adequately evaluating the generalizability of EAE models. In this
paper, we first contribute by creating a large and diverse EAE ontology. This
ontology is created by transforming FrameNet, a comprehensive semantic role
labeling (SRL) dataset, for EAE by exploiting the similarity between the two
tasks. Exhaustive human expert annotations are then collected to build the
ontology, which ends up with 115 events and 220 argument roles, a significant
portion of which are not entities. We utilize this ontology to further
introduce GENEVA, a diverse generalizability benchmarking dataset comprising
four test suites, aimed at evaluating models' ability to handle limited data
and unseen event type generalization. We benchmark six EAE models from various
families. The results show that, owing to non-entity argument roles, even the
best-performing model achieves an F1 score of only 39%, indicating that GENEVA
provides new challenges for generalization in EAE. Overall, our large and
diverse EAE ontology can aid in creating more comprehensive future resources,
while GENEVA is a challenging benchmarking dataset encouraging further research
for improving generalizability in EAE. The code and data can be found at
https://github.com/PlusLabNLP/GENEVA.
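The F1 score quoted in the abstract can be illustrated with a minimal sketch of micro-averaged F1 over extracted arguments. The abstract does not state the matching criterion GENEVA uses, so exact match on event, role, and argument text is an assumption here; the `Argument` encoding, the `micro_f1` helper, and the sample data are hypothetical and are not taken from the GENEVA code base.

```python
from typing import Iterable, Set, Tuple

# Hypothetical flat encoding of one argument: (event_id, role, argument_text).
Argument = Tuple[str, str, str]


def micro_f1(gold: Iterable[Argument], pred: Iterable[Argument]) -> float:
    """Micro F1 over argument tuples, assuming exact-match scoring."""
    gold_set: Set[Argument] = set(gold)
    pred_set: Set[Argument] = set(pred)
    correct = len(gold_set & pred_set)
    # No overlap (or an empty side) gives an F1 of zero by convention.
    if correct == 0:
        return 0.0
    precision = correct / len(pred_set)
    recall = correct / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Made-up example with one entity role (Agent) and two non-entity roles.
gold = [
    ("e1", "Agent", "the committee"),
    ("e1", "Topic", "the budget"),
    ("e1", "Time", "on Monday"),
]
pred = [
    ("e1", "Agent", "the committee"),
    ("e1", "Topic", "budget"),  # span mismatch, counted as wrong
]
score = micro_f1(gold, pred)
```

In this made-up example one of two predictions is correct and one of three gold arguments is recovered, so precision is 0.5, recall is 1/3, and F1 is 0.4. Under exact matching, non-entity arguments with loosely bounded spans are easy to miss, which is consistent with the difficulty the abstract attributes to non-entity roles.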
Related papers
- A Structure-aware Generative Model for Biomedical Event Extraction [6.282854894433099]
Event structure-aware generative model named GenBEE can capture complex event structures in biomedical text.
We have evaluated the proposed GenBEE model on three widely used biomedical event extraction benchmark datasets.
arXiv Detail & Related papers (2024-08-13T02:43:19Z)
- UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models [88.16197692794707]
UniGen is a comprehensive framework designed to produce diverse, accurate, and highly controllable datasets.
To augment data diversity, UniGen incorporates an attribute-guided generation module and a group checking feature.
Extensive experiments demonstrate the superior quality of data generated by UniGen.
arXiv Detail & Related papers (2024-06-27T07:56:44Z)
- GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z)
- MAVEN-Arg: Completing the Puzzle of All-in-One Event Understanding Dataset with Event Argument Annotation [104.6065882758648]
MAVEN-Arg is the first all-in-one dataset supporting event detection, event argument extraction, and event relation extraction.
As an EAE benchmark, MAVEN-Arg offers three main advantages: (1) a comprehensive schema covering 162 event types and 612 argument roles, all with expert-written definitions and examples; (2) a large data scale, containing 98,591 events and 290,613 arguments obtained with laborious human annotation; and (3) the exhaustive annotation supporting all task variants of EAE.
arXiv Detail & Related papers (2023-11-15T16:52:14Z)
- AMPERE: AMR-Aware Prefix for Generation-Based Event Argument Extraction Model [38.390078345679214]
Event argument extraction (EAE) identifies event arguments and their specific roles for a given event.
Recent advancement in generation-based EAE models has shown great performance and generalizability over classification-based models.
We propose AMPERE, which generates AMR-aware prefixes for every layer of the generation model.
arXiv Detail & Related papers (2023-05-26T08:38:25Z)
- Novel Human-Object Interaction Detection via Adversarial Domain Generalization [103.55143362926388]
We study the problem of novel human-object interaction (HOI) detection, aiming at improving the generalization ability of the model to unseen scenarios.
The challenge mainly stems from the large compositional space of objects and predicates, which leads to the lack of sufficient training data for all the object-predicate combinations.
We propose a unified framework of adversarial domain generalization to learn object-invariant features for predicate prediction.
arXiv Detail & Related papers (2020-05-22T22:02:56Z)
- Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z)