Rethinking the Evaluation of Pre-trained Text-and-Layout Models from an
Entity-Centric Perspective
- URL: http://arxiv.org/abs/2402.02379v1
- Date: Sun, 4 Feb 2024 07:33:45 GMT
- Title: Rethinking the Evaluation of Pre-trained Text-and-Layout Models from an
Entity-Centric Perspective
- Authors: Chong Zhang, Yixi Zhao, Chenshu Yuan, Yi Tu, Ya Guo, Qi Zhang
- Abstract summary: EC-FUNSD is an entity-centric benchmark designed for the evaluation of semantic entity recognition and entity linking on visually-rich documents.
This dataset contains diverse formats of document layouts and annotations of semantically driven entities and their relations.
Experimental results demonstrate that state-of-the-art PTLMs exhibit overfitting tendencies on the prevailing benchmarks, as their performance sharply decreases when the dataset bias is removed.
- Score: 15.222536348615087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently developed pre-trained text-and-layout models (PTLMs) have shown
remarkable success in multiple information extraction tasks on visually-rich
documents. However, the prevailing evaluation pipeline may not be sufficiently
robust for assessing the information extraction ability of PTLMs, due to
inadequate annotations within the benchmarks. Therefore, we set out the
standards an ideal benchmark should meet for evaluating the information
extraction ability of PTLMs. We then introduce EC-FUNSD, an entity-centric
benchmark designed for the evaluation of semantic entity recognition and
entity linking on visually-rich documents. This dataset contains diverse
formats of document layouts and annotations of semantically driven entities
and their relations.
Moreover, this dataset disentangles the falsely coupled annotation of segment
and entity that arises from the block-level annotation of FUNSD. Experimental
results demonstrate that state-of-the-art PTLMs exhibit overfitting tendencies
on the prevailing benchmarks, as their performance sharply decreases when the
dataset bias is removed.
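To make the segment/entity distinction concrete: under FUNSD's block-level annotation, one layout segment is implicitly one entity, whereas EC-FUNSD lets a semantic entity span several segments. The sketch below is illustrative only, not the authors' code; the field names and the exact-match rule are assumptions. It shows an entity-centric annotation structure and the entity-level F1 commonly used for semantic entity recognition.

```python
# Minimal sketch of an entity-centric annotation scheme: a layout *segment*
# is a visual block, while a semantic *entity* may cover several segments.
# Field names here are illustrative assumptions, not EC-FUNSD's schema.
from dataclasses import dataclass, field

@dataclass
class Segment:
    segment_id: int
    text: str
    bbox: tuple  # (x0, y0, x1, y1) in page coordinates

@dataclass
class Entity:
    entity_id: int
    label: str                                       # e.g. "question", "answer"
    segment_ids: list = field(default_factory=list)  # may cover >1 segment

def entity_f1(gold: set, pred: set) -> float:
    """Entity-level F1: a prediction counts only if its full
    (label, segment span) tuple matches a gold entity exactly."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy usage: one predicted entity matches, one is spurious.
gold = {("question", (1, 2)), ("answer", (3,))}
pred = {("question", (1, 2)), ("header", (4,))}
print(entity_f1(gold, pred))  # 0.5
```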
Related papers
- Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
- FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction [85.26780391682894]
We propose Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction (FENICE)
FENICE leverages an NLI-based alignment between information in the source document and a set of atomic facts, referred to as claims, extracted from the summary.
Our metric sets a new state of the art on AGGREFACT, the de-facto benchmark for factuality evaluation.
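As a rough illustration of this pipeline (not the released FENICE implementation; the claim splitter and the NLI model below are stand-in assumptions), one can split the summary into claims and average their entailment scores against the source:

```python
# Hedged sketch of the claim-extraction + NLI idea. FENICE uses a dedicated
# claim-extraction model and passage-level alignment; a sentence splitter and
# a stock MNLI model are crude stand-ins here. Long sources exceeding the
# model's input limit are left aside in this sketch.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def naive_claims(summary: str) -> list:
    return [s.strip() for s in summary.split(".") if s.strip()]

def factuality_score(source: str, summary: str) -> float:
    scores = []
    for claim in naive_claims(summary):
        # MNLI-style input: premise (source) vs hypothesis (claim).
        result = nli({"text": source, "text_pair": claim}, top_k=None)
        entail = next(r["score"] for r in result if r["label"] == "ENTAILMENT")
        scores.append(entail)
    return sum(scores) / len(scores) if scores else 0.0
```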
arXiv Detail & Related papers (2024-03-04T17:57:18Z)
- Analysis of Multidomain Abstractive Summarization Using Salience Allocation [2.6880540371111445]
Season is a model designed to enhance summarization by leveraging salience allocation techniques.
This paper employs various evaluation metrics such as ROUGE, METEOR, BERTScore, and MoverScore to evaluate the performance of these models fine-tuned for generating abstractive summaries.
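For reference, here is how two of these metrics are typically computed with their standard Python packages (rouge-score and bert-score); this mirrors common usage, not necessarily the paper's exact setup, and METEOR and MoverScore have separate packages omitted here:

```python
# Standard usage of the rouge-score and bert-score packages.
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "The committee approved the budget after a short debate."
candidate = "The budget was approved by the committee."

# ROUGE-1 / ROUGE-L with stemming, as is conventional.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)
print(rouge["rouge1"].fmeasure, rouge["rougeL"].fmeasure)

# BERTScore returns per-pair precision/recall/F1 tensors.
P, R, F1 = bert_score([candidate], [reference], lang="en")
print(F1.mean().item())
```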
arXiv Detail & Related papers (2024-02-19T08:52:12Z)
- Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
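The exact definition of the Approximate Entity Set OverlaP metric is given in the paper; as a hedged illustration of the general idea of rewarding partial overlap between entity sets, one could pair predicted and gold entities by optimal assignment over a property-level similarity:

```python
# Illustrative stand-in only, not the paper's AESOP formula: pair gold and
# predicted entities by optimal assignment over a property-level Jaccard
# similarity, then normalize so spurious/missing entities are penalized.
import numpy as np
from scipy.optimize import linear_sum_assignment

def property_sim(gold: dict, pred: dict) -> float:
    """Jaccard overlap between the (key, value) pairs of two entities."""
    g, p = set(gold.items()), set(pred.items())
    return len(g & p) / len(g | p) if g | p else 0.0

def approx_set_overlap(gold_entities: list, pred_entities: list) -> float:
    if not gold_entities or not pred_entities:
        return 0.0
    sim = np.array([[property_sim(g, p) for p in pred_entities]
                    for g in gold_entities])
    rows, cols = linear_sum_assignment(-sim)  # maximize total similarity
    return sim[rows, cols].sum() / max(len(gold_entities), len(pred_entities))

gold = [{"type": "person", "name": "Ada Lovelace"}]
pred = [{"type": "person", "name": "Ada Lovelace"},
        {"type": "person", "name": "Alan Turing"}]
print(approx_set_overlap(gold, pred))  # 0.5: the extra prediction is penalized
```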
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
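A toy rendering of that representation (purely illustrative: the actual model predicts such points from the document image, and the field names here are assumptions):

```python
# Each entity is reduced to a semantic point: its center coordinates plus
# attribute and relation information. Field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class SemanticPoint:
    x: float                  # center of the entity region on the page
    y: float
    category: str             # e.g. "key" or "value"
    text: str
    linked_to: list = field(default_factory=list)  # indices of related points

# A key-value pair from a form, modeled as two mutually linked points.
points = [
    SemanticPoint(x=120.0, y=88.0, category="key", text="Date:", linked_to=[1]),
    SemanticPoint(x=210.0, y=88.0, category="value", text="2023-03-23", linked_to=[0]),
]
```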
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- Document-Level Relation Extraction with Sentences Importance Estimation and Focusing [52.069206266557266]
Document-level relation extraction (DocRE) aims to determine the relation between two entities from a document of multiple sentences.
We propose a Sentence Importance Estimation and Focusing (SIEF) framework for DocRE, where we design a sentence importance score and a sentence focusing loss.
Experimental results on two domains show that our SIEF not only improves overall performance, but also makes DocRE models more robust.
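A rough sketch of these two components as the abstract describes them (the `model` interface, the distance measure, and the KL form below are assumptions, not the paper's exact formulation):

```python
# Hedged sketch: score each sentence by how much removing it shifts the
# model's relation prediction, and add a consistency ("focusing") loss on
# documents with unimportant sentences removed. `model` is a hypothetical
# DocRE model mapping (sentences, entity pair) -> relation logits.
import torch
import torch.nn.functional as F

def sentence_importance(model, sentences, pair):
    """Importance of sentence i = shift in the predicted relation
    distribution when sentence i is deleted from the document."""
    full = model(sentences, pair)  # logits over relation types
    scores = []
    for i in range(len(sentences)):
        reduced = sentences[:i] + sentences[i + 1:]
        red = model(reduced, pair)
        # L1 distance between the two relation distributions.
        scores.append((full.softmax(-1) - red.softmax(-1)).abs().sum().item())
    return scores

def focusing_loss(model, sentences, pair, unimportant):
    """Predictions should not change when unimportant sentences are removed."""
    full = model(sentences, pair)
    reduced = [s for i, s in enumerate(sentences) if i not in unimportant]
    red = model(reduced, pair)
    return F.kl_div(red.log_softmax(-1), full.softmax(-1).detach(),
                    reduction="batchmean")
```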
arXiv Detail & Related papers (2022-04-27T03:20:07Z)
- Does Recommend-Revise Produce Reliable Annotations? An Analysis on Missing Instances in DocRED [60.39125850987604]
We show that the recommend-revise scheme results in false negative samples and an obvious bias towards popular entities and relations.
The relabeled dataset is released to serve as a more reliable test set for document-level RE models.
arXiv Detail & Related papers (2022-04-17T11:29:01Z)
- Entity-level Factual Consistency of Abstractive Text Summarization [26.19686599842915]
A key challenge for abstractive summarization is ensuring the factual consistency of the generated summary with respect to the original document.
We propose a set of new metrics to quantify the entity-level factual consistency of generated summaries.
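The abstract does not spell the metrics out; one widely used entity-level consistency measure, and a reasonable stand-in here, is the precision of the summary's named entities against the source document, e.g. with spaCy:

```python
# Stand-in entity-level consistency check: the share of named entities in
# the summary that also appear in the source. Assumes the spaCy model
# en_core_web_sm is installed (python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")

def entity_precision(source: str, summary: str) -> float:
    source_ents = {ent.text.lower() for ent in nlp(source).ents}
    summary_ents = [ent.text.lower() for ent in nlp(summary).ents]
    if not summary_ents:
        return 1.0  # no entities, so nothing to hallucinate
    hits = sum(e in source_ents for e in summary_ents)
    return hits / len(summary_ents)

source = "Apple said Tim Cook will visit Berlin in May."
summary = "Tim Cook will visit Paris."
print(entity_precision(source, summary))  # 0.5: "Paris" is unsupported
```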
arXiv Detail & Related papers (2021-02-18T03:07:28Z)