Revisiting DocRED -- Addressing the False Negative Problem in Relation
Extraction
- URL: http://arxiv.org/abs/2205.12696v3
- Date: Fri, 16 Jun 2023 05:44:07 GMT
- Title: Revisiting DocRED -- Addressing the False Negative Problem in Relation
Extraction
- Authors: Qingyu Tan, Lu Xu, Lidong Bing, Hwee Tou Ng, Sharifah Mahani Aljunied
- Abstract summary: We re-annotate 4,053 documents in the DocRED dataset by adding the missed relation triples back to the original DocRED.
We conduct extensive experiments with state-of-the-art neural models on both datasets, and the experimental results show that the models trained and evaluated on our Re-DocRED achieve performance improvements of around 13 F1 points.
- Score: 39.78594332093083
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The DocRED dataset is one of the most popular and widely used benchmarks for
document-level relation extraction (RE). It adopts a recommend-revise
annotation scheme so as to have a large-scale annotated dataset. However, we
find that the annotation of DocRED is incomplete, i.e., false negative samples
are prevalent. We analyze the causes and effects of the overwhelming false
negative problem in the DocRED dataset. To address the shortcoming, we
re-annotate 4,053 documents in the DocRED dataset by adding the missed relation
triples back to the original DocRED. We name our revised DocRED dataset
Re-DocRED. We conduct extensive experiments with state-of-the-art neural models
on both datasets, and the experimental results show that the models trained and
evaluated on our Re-DocRED achieve performance improvements of around 13 F1
points. Moreover, we conduct a comprehensive analysis to identify the potential
areas for further improvement. Our dataset is publicly available at
https://github.com/tonytan48/Re-DocRED.
Related papers
- GEGA: Graph Convolutional Networks and Evidence Retrieval Guided Attention for Enhanced Document-level Relation Extraction [15.246183329778656]
Document-level relation extraction (DocRE) aims to extract relations between entities from unstructured document text.
To overcome these challenges, we propose GEGA, a novel model for DocRE.
We evaluate the GEGA model on three widely used benchmark datasets: DocRED, Re-DocRED, and Revisit-DocRED.
arXiv Detail & Related papers (2024-07-31T07:15:33Z) - Consistent Document-Level Relation Extraction via Counterfactuals [47.75615221596254]
It has been shown that document-level relation extraction models trained on real-world data suffer from factual biases.
We present CovEReD, a dataset of document-level counterfactual data for document extraction.
We show that by generating document-level counterfactual data with CovEReD models on them, consistency is maintained.
arXiv Detail & Related papers (2024-07-09T09:21:55Z) - Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis [50.972595036856035]
We present a code that successfully replicates results from six popular and recent graph recommendation models.
We compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations.
By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure.
arXiv Detail & Related papers (2023-08-01T09:31:44Z) - Rethinking Document-Level Relation Extraction: A Reality Check [14.59603835395313]
We take a closer look at the field to see if these performance gains are actually true.
We construct four types of entity mention attacks to examine the robustness of typical DocRE models.
Our findings reveal that most of current DocRE models are vulnerable to entity mention attacks and difficult to be deployed in real-world end-user NLP applications.
arXiv Detail & Related papers (2023-06-15T08:47:42Z) - Towards Integration of Discriminability and Robustness for
Document-Level Relation Extraction [41.51148745387936]
Document-level relation extraction (DocRE) predicts relations for entity pairs that rely on long-range context-dependent reasoning in a document.
In this work, we aim to achieve better integration of both the discriminability and robustness for the DocRE problem.
We innovatively customize entropy minimization and supervised contrastive learning for the challenging multi-label and long-tailed learning problems.
arXiv Detail & Related papers (2023-04-03T09:11:18Z) - Incorporating Relevance Feedback for Information-Seeking Retrieval using
Few-Shot Document Re-Ranking [56.80065604034095]
We introduce a kNN approach that re-ranks documents based on their similarity with the query and the documents the user considers relevant.
To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario.
arXiv Detail & Related papers (2022-10-19T16:19:37Z) - Document-Level Relation Extraction with Sentences Importance Estimation
and Focusing [52.069206266557266]
Document-level relation extraction (DocRE) aims to determine the relation between two entities from a document of multiple sentences.
We propose a Sentence Estimation and Focusing (SIEF) framework for DocRE, where we design a sentence importance score and a sentence focusing loss.
Experimental results on two domains show that our SIEF not only improves overall performance, but also makes DocRE models more robust.
arXiv Detail & Related papers (2022-04-27T03:20:07Z) - Does Recommend-Revise Produce Reliable Annotations? An Analysis on
Missing Instances in DocRED [60.39125850987604]
We show that a textit-revise scheme results in false negative samples and an obvious bias towards popular entities and relations.
The relabeled dataset is released to serve as a more reliable test set of document RE models.
arXiv Detail & Related papers (2022-04-17T11:29:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.