Does Recommend-Revise Produce Reliable Annotations? An Analysis on
Missing Instances in DocRED
- URL: http://arxiv.org/abs/2204.07980v1
- Date: Sun, 17 Apr 2022 11:29:01 GMT
- Title: Does Recommend-Revise Produce Reliable Annotations? An Analysis on
Missing Instances in DocRED
- Authors: Quzhe Huang, Shibo Hao, Yuan Ye, Shengqi Zhu, Yansong Feng, Dongyan
Zhao
- Abstract summary: We show that the recommend-revise scheme results in false negative samples and an obvious bias towards popular entities and relations.
The relabeled dataset is released to serve as a more reliable test set of document RE models.
- Score: 60.39125850987604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: DocRED is a widely used dataset for document-level relation extraction. In
the large-scale annotation, a \textit{recommend-revise} scheme is adopted to
reduce the workload. Within this scheme, annotators are provided with candidate
relation instances from distant supervision, and they then manually supplement
and remove relational facts based on the recommendations. However, when
comparing DocRED with a subset relabeled from scratch, we find that this scheme
results in a considerable amount of false negative samples and an obvious bias
towards popular entities and relations. Furthermore, we observe that the models
trained on DocRED have low recall on our relabeled dataset and inherit the same
bias in the training data. Through the analysis of annotators' behaviors, we
figure out the underlying reason for the problems above: the scheme actually
discourages annotators from supplementing adequate instances in the revision
phase. We appeal to future research to take into consideration the issues with
the recommend-revise scheme when designing new models and annotation schemes.
The relabeled dataset is released at
\url{https://github.com/AndrewZhe/Revisit-DocRED}, to serve as a more reliable
test set of document RE models.
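A minimal sketch of how the missing-instance problem above can be quantified: compare the original annotations with the subset relabeled from scratch and count the triples that the recommend-revise labels miss. The file names and the DocRED-style JSON schema used here (documents keyed by title, with a "labels" list carrying "h", "r", and "t" fields) are illustrative assumptions rather than the exact released format.

```python
import json

# A minimal sketch, assuming DocRED-style JSON: each document has a "title"
# and a "labels" list whose entries carry head/tail entity indices ("h", "t")
# and a relation id ("r"). File names are placeholders.
with open("docred_dev.json") as f:        # original recommend-revise annotations
    original = {doc["title"]: doc for doc in json.load(f)}
with open("relabeled_dev.json") as f:     # subset relabeled from scratch
    relabeled = json.load(f)


def triples(doc):
    """Return the set of (head, relation, tail) triples labeled in one document."""
    return {(lab["h"], lab["r"], lab["t"]) for lab in doc.get("labels", [])}


covered = missing = 0
for doc in relabeled:
    gold = triples(doc)                            # relabeled-from-scratch triples
    old = triples(original.get(doc["title"], {}))  # triples kept by recommend-revise
    covered += len(gold & old)
    missing += len(gold - old)                     # false negatives in the original

total = covered + missing
print(f"original labels cover {covered}/{total} relabeled triples "
      f"(recall {covered / total:.2%}); {missing} instances are missing")
```

The same comparison, applied to a model's predictions instead of the original labels, yields the kind of recall measurement on the relabeled set that the abstract reports.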
Related papers
- Beyond the Numbers: Transparency in Relation Extraction Benchmark Creation and Leaderboards [5.632231145349045]
This paper investigates the transparency in the creation of benchmarks and the use of leaderboards for measuring progress in NLP.
Existing relation extraction benchmarks often suffer from insufficient documentation and lack crucial details.
While our discussion centers on the transparency of RE benchmarks and leaderboards, the observations we discuss are broadly applicable to other NLP tasks as well.
arXiv Detail & Related papers (2024-11-07T22:36:19Z) - Consistent Document-Level Relation Extraction via Counterfactuals [47.75615221596254]
It has been shown that document-level relation extraction models trained on real-world data suffer from factual biases.
We present CovEReD, a counterfactual data generation approach for document-level relation extraction.
We show that by generating document-level counterfactual data with CovEReD and training models on it, consistency is maintained.
arXiv Detail & Related papers (2024-07-09T09:21:55Z) - RaFe: Ranking Feedback Improves Query Rewriting for RAG [83.24385658573198]
We propose a framework for training query rewriting models free of annotations.
By leveraging a publicly available reranker, our framework provides feedback well aligned with the rewriting objectives.
arXiv Detail & Related papers (2024-05-23T11:00:19Z) - Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z) - Class-Adaptive Self-Training for Relation Extraction with Incompletely
Annotated Training Data [43.46328487543664]
Relation extraction (RE) aims to extract relations from sentences and documents.
Recent studies showed that many RE datasets are incompletely annotated.
This is known as the false negative problem, in which valid relations are falsely annotated as 'no_relation'.
arXiv Detail & Related papers (2023-06-16T09:01:45Z) - Revisiting DocRED -- Addressing the False Negative Problem in Relation
Extraction [39.78594332093083]
We re-annotate 4,053 documents in the DocRED dataset by adding the missed relation triples back to the original DocRED.
We conduct extensive experiments with state-of-the-art neural models on both datasets, and the experimental results show that the models trained and evaluated on our Re-DocRED achieve performance improvements of around 13 F1 points.
arXiv Detail & Related papers (2022-05-25T11:54:48Z) - Efficient Few-Shot Fine-Tuning for Opinion Summarization [83.76460801568092]
Abstractive summarization models are typically pre-trained on large amounts of generic texts, then fine-tuned on tens or hundreds of thousands of annotated samples.
We show that a few-shot method based on adapters can easily store in-domain knowledge.
We show that this self-supervised adapter pre-training improves summary quality over standard fine-tuning by 2.0 and 1.3 ROUGE-L points on the Amazon and Yelp datasets.
arXiv Detail & Related papers (2022-05-04T16:38:37Z) - Document-Level Relation Extraction with Reconstruction [28.593318203728963]
We propose a novel encoder-classifier-reconstructor model for document-level relation extraction (DocRE).
The reconstructor reconstructs the ground-truth path dependencies from the graph representation, ensuring that the proposed DocRE model pays more attention to encoding entity pairs with relationships during training.
Experimental results on a large-scale DocRE dataset show that the proposed model can significantly improve the accuracy of relation extraction on a strong heterogeneous graph-based baseline.
arXiv Detail & Related papers (2020-12-21T14:29:31Z) - Evaluating Models' Local Decision Boundaries via Contrast Sets [119.38387782979474]
We propose a new annotation paradigm for NLP that helps to close systematic gaps in the test data.
We demonstrate the efficacy of contrast sets by creating them for 10 diverse NLP datasets.
Although our contrast sets are not explicitly adversarial, model performance is significantly lower on them than on the original test sets.
arXiv Detail & Related papers (2020-04-06T14:47:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.