Rethinking Document-Level Relation Extraction: A Reality Check
- URL: http://arxiv.org/abs/2306.08953v1
- Date: Thu, 15 Jun 2023 08:47:42 GMT
- Title: Rethinking Document-Level Relation Extraction: A Reality Check
- Authors: Jing Li, Yequan Wang, Shuai Zhang, Min Zhang
- Abstract summary: We take a closer look at the field to see whether these performance gains are genuine.
We construct four types of entity mention attacks to examine the robustness of typical DocRE models.
Our findings reveal that most current DocRE models are vulnerable to entity mention attacks and are difficult to deploy in real-world end-user NLP applications.
- Score: 14.59603835395313
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, numerous efforts have continued to push the performance
boundaries of document-level relation extraction (DocRE) and have claimed
significant progress. In this paper, we do not aim to propose a novel model for
DocRE. Instead, we take a closer look at the field to see whether these
performance gains are genuine. Through a comprehensive literature review and a
thorough examination of popular DocRE datasets, we find that these performance
gains rest on a strong, even untenable, shared assumption: all named entities
are perfectly localized, normalized, and typed in advance. Next, we construct
four types of entity mention attacks to examine the robustness of typical DocRE
models via behavioral probing. We also closely examine model usability in a
more realistic setting. Our findings reveal that most current DocRE models are
vulnerable to entity mention attacks and are difficult to deploy in real-world
end-user NLP applications. Our study calls for future research to stop
simplifying problem setups, and to model DocRE in the wild rather than in an
unrealistic Utopian world.
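The abstract does not spell out the four entity mention attack types, so the following is only a hypothetical illustration of one plausible perturbation, alias substitution, which probes whether a model depends on mentions being perfectly normalized in advance; the function name and example text are assumptions, not the paper's method.

```python
# Hypothetical entity-mention perturbation for robustness probing.
# Alias substitution is an assumed example; the paper's actual four
# attack types are not described in the abstract.

def substitute_alias(text: str, mention: str, alias: str) -> str:
    """Replace every occurrence of a gold entity mention with an alias.

    A DocRE model that assumes perfectly normalized mentions may fail
    to link the alias back to the same underlying entity.
    """
    return text.replace(mention, alias)

doc = "Barack Obama was born in Honolulu. Obama served two terms."
attacked = substitute_alias(doc, "Honolulu", "the Hawaiian capital")
print(attacked)
```

Under such a perturbation, the gold relation (born-in between the person and the city) is still expressed by the text, but a model keyed to the exact surface form "Honolulu" may no longer recover it.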
Related papers
- GEGA: Graph Convolutional Networks and Evidence Retrieval Guided Attention for Enhanced Document-level Relation Extraction [15.246183329778656]
Document-level relation extraction (DocRE) aims to extract relations between entities from unstructured document text.
To overcome these challenges, we propose GEGA, a novel model for DocRE.
We evaluate the GEGA model on three widely used benchmark datasets: DocRED, Re-DocRED, and Revisit-DocRED.
arXiv Detail & Related papers (2024-07-31T07:15:33Z) - Efficient Document Ranking with Learnable Late Interactions [73.41976017860006]
Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval.
To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings.
Recently, late-interaction models have been proposed to realize more favorable latency-quality tradeoffs, by using a DE structure followed by a lightweight scorer.
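The DE-structure-plus-lightweight-scorer design described above can be sketched with a ColBERT-style MaxSim scorer, one common instantiation of late interaction; the exact learnable scorer in the cited paper may differ, so treat this as an illustrative sketch only.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """ColBERT-style MaxSim late-interaction relevance score.

    query_emb: (q_tokens, dim), doc_emb: (d_tokens, dim), rows L2-normalized.
    Each query token takes the cosine similarity of its best-matching
    document token; the relevance score sums these over query tokens.
    """
    sim = query_emb @ doc_emb.T            # (q_tokens, d_tokens) similarities
    return float(sim.max(axis=1).sum())    # best match per query token

# Toy usage with random unit vectors standing in for token embeddings.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8)); q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(10, 8)); d /= np.linalg.norm(d, axis=1, keepdims=True)
print(maxsim_score(q, d))
```

Because query and document embeddings are computed independently (as in a Dual-Encoder) and only combined by this cheap max-and-sum step, documents can be pre-encoded offline, which is what yields the favorable latency-quality tradeoff.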
arXiv Detail & Related papers (2024-06-25T22:50:48Z) - AutoRE: Document-Level Relation Extraction with Large Language Models [27.426703757501507]
We introduce AutoRE, an end-to-end DocRE model that adopts a novel RE extraction paradigm named RHF (Relation-Head-Facts).
Unlike existing approaches, AutoRE does not rely on the assumption of known relation options, making it more reflective of real-world scenarios.
Our experiments on the RE-DocRED dataset showcase AutoRE's best performance, achieving state-of-the-art results.
arXiv Detail & Related papers (2024-03-21T23:48:21Z) - Did the Models Understand Documents? Benchmarking Models for Language
Understanding in Document-Level Relation Extraction [2.4665182280122577]
Document-level relation extraction (DocRE) has attracted increasing research interest recently.
While models achieve consistent performance gains in DocRE, their underlying decision rules remain understudied.
In this paper, we take the first step toward answering this question and introduce a new perspective on comprehensively evaluating a model.
arXiv Detail & Related papers (2023-06-20T08:52:05Z) - Towards Integration of Discriminability and Robustness for
Document-Level Relation Extraction [41.51148745387936]
Document-level relation extraction (DocRE) predicts relations for entity pairs that rely on long-range context-dependent reasoning in a document.
In this work, we aim to achieve better integration of both the discriminability and robustness for the DocRE problem.
We innovatively customize entropy minimization and supervised contrastive learning for the challenging multi-label and long-tailed learning problems.
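As a point of reference for the contrastive component mentioned above, here is a minimal sketch of the standard supervised contrastive (SupCon) loss over normalized embeddings; the paper customizes this for multi-label, long-tailed DocRE, and that variant is not reproduced here, so everything below is an assumed baseline formulation.

```python
import numpy as np

def supcon_loss(z: np.ndarray, labels, tau: float = 0.1) -> float:
    """Standard supervised contrastive loss (single-label baseline).

    z: (n, dim) embeddings; labels: length-n class ids.
    For each anchor, same-label examples are positives and all other
    examples form the denominator of a softmax over similarities.
    """
    labels = np.asarray(labels)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau                       # temperature-scaled cosine sims
    n = len(labels)
    total, count = 0.0, 0
    for i in range(n):
        pos = np.where((labels == labels[i]) & (np.arange(n) != i))[0]
        if len(pos) == 0:
            continue                          # anchor has no positives
        others = [a for a in range(n) if a != i]
        log_denom = np.log(np.exp(sim[i, others]).sum())
        total += float(np.mean(log_denom - sim[i, pos]))
        count += 1
    return total / count

# Toy usage: two tight clusters with matching labels give a low loss.
z = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(supcon_loss(z, [0, 0, 1, 1]))
```

The loss is smaller when same-label embeddings sit close together, which is the discriminability pressure the paper combines with entropy minimization.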
arXiv Detail & Related papers (2023-04-03T09:11:18Z) - WiCE: Real-World Entailment for Claims in Wikipedia [63.234352061821625]
We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia.
In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim.
We show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.
arXiv Detail & Related papers (2023-03-02T17:45:32Z) - Entity Disambiguation with Entity Definitions [50.01142092276296]
Local models have recently attained astounding performance in Entity Disambiguation (ED).
Previous works limited their studies to using only the Wikipedia title as the textual representation of each candidate.
In this paper, we address this limitation and investigate to what extent more expressive textual representations can mitigate it.
We report a new state of the art on 2 out of 6 benchmarks we consider and substantially improve generalization to unseen patterns.
arXiv Detail & Related papers (2022-10-11T17:46:28Z) - Revisiting DocRED -- Addressing the False Negative Problem in Relation
Extraction [39.78594332093083]
We re-annotate 4,053 documents in the DocRED dataset by adding the missed relation triples back to the original DocRED.
We conduct extensive experiments with state-of-the-art neural models on both datasets, and the experimental results show that the models trained and evaluated on our Re-DocRED achieve performance improvements of around 13 F1 points.
arXiv Detail & Related papers (2022-05-25T11:54:48Z) - Document-Level Relation Extraction with Sentences Importance Estimation
and Focusing [52.069206266557266]
Document-level relation extraction (DocRE) aims to determine the relation between two entities from a document of multiple sentences.
We propose a Sentence Estimation and Focusing (SIEF) framework for DocRE, where we design a sentence importance score and a sentence focusing loss.
Experimental results on two domains show that our SIEF not only improves overall performance, but also makes DocRE models more robust.
arXiv Detail & Related papers (2022-04-27T03:20:07Z) - Does Recommend-Revise Produce Reliable Annotations? An Analysis on
Missing Instances in DocRED [60.39125850987604]
We show that the recommend-revise scheme results in false negative samples and an obvious bias towards popular entities and relations.
The relabeled dataset is released to serve as a more reliable test set of document RE models.
arXiv Detail & Related papers (2022-04-17T11:29:01Z) - Rethinking Generalization of Neural Models: A Named Entity Recognition
Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.