Related papers: Document-level Entity-based Extraction as Template Generation

URL: http://arxiv.org/abs/2109.04901v1
Date: Fri, 10 Sep 2021 14:18:22 GMT
Title: Document-level Entity-based Extraction as Template Generation
Authors: Kung-Hsiang Huang, Sam Tang and Nanyun Peng
Abstract summary: We propose a generative framework for two document-level EE tasks: role-filler entity extraction (REE) and relation extraction (RE). We first formulate them as a template generation problem, allowing models to efficiently capture cross-entity dependencies. A novel cross-attention guided copy mechanism, TopK Copy, is incorporated into a pre-trained sequence-to-sequence model to enhance the capabilities of identifying key information.
Score: 13.110360825201044
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Document-level entity-based extraction (EE), aiming at extracting entity-centric information such as entity roles and entity relations, is key to automatic knowledge acquisition from text corpora for various domains. Most document-level EE systems build extractive models, which struggle to model long-term dependencies among entities at the document level. To address this issue, we propose a generative framework for two document-level EE tasks: role-filler entity extraction (REE) and relation extraction (RE). We first formulate them as a template generation problem, allowing models to efficiently capture cross-entity dependencies, exploit label semantics, and avoid the exponential computation complexity of identifying N-ary relations. A novel cross-attention guided copy mechanism, TopK Copy, is incorporated into a pre-trained sequence-to-sequence model to enhance the capabilities of identifying key information in the input document. Experiments on the MUC-4 and SciREX datasets show new state-of-the-art results on REE (+3.26%), binary RE (+4.8%), and 4-ary RE (+2.7%) in F1 score.

Related papers

ABCD-LINK: Annotation Bootstrapping for Cross-Document Fine-Grained Links [57.514511353084565]
We introduce a new domain-agnostic framework for selecting a best-performing approach and annotating cross-document links. We apply our framework in two distinct domains -- peer review and news. The resulting novel datasets lay the foundation for numerous cross-document tasks like media framing and peer review.
arXiv Detail & Related papers (2025-09-01T11:32:24Z)
VaeDiff-DocRE: End-to-end Data Augmentation Framework for Document-level Relation Extraction [9.516897428263146]
Document-level Relation Extraction (DocRE) aims to identify relationships between entity pairs within a document. Most existing methods assume a uniform label distribution, resulting in suboptimal performance on real-world, imbalanced datasets. We propose a novel data augmentation approach using generative models to enhance data from the embedding space.
arXiv Detail & Related papers (2024-12-18T04:55:29Z)
Generative Retrieval Meets Multi-Graded Relevance [104.75244721442756]
We introduce a framework called GRaded Generative Retrieval (GR$^2$). GR$^2$ focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training. Experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of GR$^2$.
arXiv Detail & Related papers (2024-09-27T02:55:53Z)
Hypertext Entity Extraction in Webpage [112.56734676713721]
We introduce a MoE-based Entity Extraction Framework (MoEEF), which integrates multiple features to enhance model performance. We also analyze the effectiveness of hypertext features in HEED and several model components in MoEEF.
arXiv Detail & Related papers (2024-03-04T03:21:40Z)
Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction. We reformulate the task to be entity-centric, enabling the use of diverse metrics. We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
Consistency Guided Knowledge Retrieval and Denoising in LLMs for Zero-shot Document-level Relation Triplet Extraction [43.50683283748675]
Document-level Relation Triplet Extraction (DocRTE) is a fundamental task in information systems that aims to simultaneously extract entities with semantic relations from a document. Existing methods heavily rely on a substantial amount of fully labeled data. Recent advanced Large Language Models (LLMs), such as ChatGPT and LLaMA, exhibit impressive long-text generation capabilities.
arXiv Detail & Related papers (2024-01-24T17:04:28Z)
ReSel: N-ary Relation Extraction from Scientific Text and Tables by Learning to Retrieve and Select [53.071352033539526]
We study the problem of extracting N-ary relations from scientific articles. Our proposed method ReSel decomposes this task into a two-stage procedure. Our experiments on three scientific information extraction datasets show that ReSel outperforms state-of-the-art baselines significantly.
arXiv Detail & Related papers (2022-10-26T02:28:02Z)
Iterative Document-level Information Extraction via Imitation Learning [32.012467653148846]
We present a novel iterative extraction model, IterX, for extracting complex relations. Our imitation learning approach casts the problem as a Markov decision process (MDP). It leads to state-of-the-art results on two established benchmarks.
arXiv Detail & Related papers (2022-10-12T21:46:04Z)
A Masked Image Reconstruction Network for Document-level Relation Extraction [3.276435438007766]
Document-level relation extraction requires inference over multiple sentences to extract complex relational triples. We propose a novel Document-level Relation Extraction model based on a Masked Image Reconstruction network (DRE-MIR). We evaluate our model on three public document-level relation extraction datasets.
arXiv Detail & Related papers (2022-04-21T02:41:21Z)
Automatically Generating Counterfactuals for Relation Exaction [18.740447044960796]
Relation extraction (RE) is a fundamental task in natural language processing. Current deep neural models have achieved high accuracy but are easily affected by spurious correlations. We develop a novel approach to derive contextual counterfactuals for entities.
arXiv Detail & Related papers (2022-02-22T04:46:10Z)
Pack Together: Entity and Relation Extraction with Levitated Marker [61.232174424421025]
We propose a novel span representation approach, named Packed Levitated Markers, to consider the dependencies between the spans (pairs) by strategically packing the markers in the encoder. Our experiments show that our model with packed levitated markers outperforms the sequence labeling model by 0.4%-1.9% F1 on three flat NER tasks, and beats the token concat model on six NER benchmarks.
arXiv Detail & Related papers (2021-09-13T15:38:13Z)
Enriching Transformers with Structured Tensor-Product Representations for Abstractive Summarization [131.23966358405767]
We adapt TP-TRANSFORMER with the explicitly compositional Tensor Product Representation (TPR) for the task of abstractive summarization. A key feature of our model is a structural bias that we introduce by encoding two separate representations for each token. We show that our TP-TRANSFORMER outperforms the Transformer and the original TP-TRANSFORMER significantly on several abstractive summarization datasets.
arXiv Detail & Related papers (2021-06-02T17:32:33Z)
Data Augmentation for Abstractive Query-Focused Multi-Document Summarization [129.96147867496205]
We present two QMDS training datasets, which we construct using two data augmentation methods. These two datasets have complementary properties, i.e., QMDSCNN has real summaries but queries are simulated, while QMDSIR has real queries but simulated summaries. We build end-to-end neural network models on the combined datasets that yield new state-of-the-art transfer results on DUC datasets.
arXiv Detail & Related papers (2021-03-02T16:57:01Z)
Reasoning with Latent Structure Refinement for Document-Level Relation Extraction [20.308845516900426]
We propose a novel model that empowers relational reasoning across sentences by automatically inducing the latent document-level graph. Specifically, our model achieves an F1 score of 59.05 on a large-scale document-level dataset (DocRED).
arXiv Detail & Related papers (2020-05-13T13:36:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.