Knowledge-Driven Cross-Document Relation Extraction
- URL: http://arxiv.org/abs/2405.13546v2
- Date: Tue, 18 Jun 2024 08:20:19 GMT
- Title: Knowledge-Driven Cross-Document Relation Extraction
- Authors: Monika Jain, Raghava Mutharaju, Kuldeep Singh, Ramakanth Kavuluru
- Abstract summary: Relation extraction (RE) is a well-known NLP application often treated as a sentence- or document-level task.
We propose a novel approach, KXDocRE, that embeds domain knowledge of entities with input text for cross-document RE.
- Score: 3.868708275322908
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Relation extraction (RE) is a well-known NLP application often treated as a sentence- or document-level task. However, a handful of recent efforts explore it across documents, in the cross-document setting (CrossDocRE). This is distinct from the single-document case because different documents often focus on disparate themes, while text within a document tends to have a single goal. Linking findings from disparate documents to identify new relationships is at the core of the popular literature-based knowledge discovery paradigm in biomedicine and other domains. Current CrossDocRE efforts do not consider domain knowledge, which is often assumed to be known to the reader when documents are authored. Here, we propose a novel approach, KXDocRE, that embeds domain knowledge of entities with input text for cross-document RE. Our proposed framework has three main benefits over baselines: 1) it incorporates domain knowledge of entities along with the documents' text; 2) it offers interpretability by producing explanatory text for predicted relations between entities; and 3) it improves performance over prior methods.
Related papers
- Hypergraph based Understanding for Document Semantic Entity Recognition [65.84258776834524]
We build a novel hypergraph attention document semantic entity recognition framework, HGA, which uses hypergraph attention to focus on entity boundaries and entity categories at the same time.
Our results on FUNSD, CORD, XFUNDIE show that our method can effectively improve the performance of semantic entity recognition tasks.
arXiv Detail & Related papers (2024-07-09T14:35:49Z)
- Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction [61.998789448260005]
We propose to identify the typical structure of documents within a collection.
We abstract over arbitrary header paraphrases, and ground each topic to respective document locations.
We develop an unsupervised graph-based method which leverages both inter- and intra-document similarities.
arXiv Detail & Related papers (2024-02-21T16:22:21Z)
- TransDocAnalyser: A Framework for Offline Semi-structured Handwritten Document Analysis in the Legal Domain [3.5018563401895455]
We build the first semi-structured document analysis dataset in the legal domain.
This dataset combines a wide variety of handwritten text with printed text.
We propose an end-to-end framework for offline processing of handwritten semi-structured documents.
arXiv Detail & Related papers (2023-06-03T15:56:30Z)
- Entity-centered Cross-document Relation Extraction [34.38369224008656]
Relation Extraction (RE) is a fundamental task of information extraction, which has attracted a large amount of research attention.
Previous studies focus on extracting relations within a sentence or document, while researchers have only recently begun to explore cross-document RE.
In this paper, we aim to address both of these shortages and push the state-of-the-art for cross-document RE.
arXiv Detail & Related papers (2022-10-29T09:27:15Z)
- Generate rather than Retrieve: Large Language Models are Strong Context Generators [74.87021992611672]
We present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators.
We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer.
arXiv Detail & Related papers (2022-09-21T01:30:59Z)
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Coherence-Based Distributed Document Representation Learning for Scientific Documents [9.646001537050925]
We propose a coupled text pair embedding (CTPE) model to learn the representation of scientific documents.
We use negative sampling to construct uncoupled text pairs whose two parts are from different documents.
We train the model to judge whether the text pair is coupled or uncoupled and use the obtained embedding of coupled text pairs as the embedding of documents.
arXiv Detail & Related papers (2022-01-08T15:29:21Z)
- Evaluation of a Region Proposal Architecture for Multi-task Document Layout Analysis [0.685316573653194]
Mask-RCNN architecture is designed to address the problem of baseline detection and region segmentation.
We present experimental results on two handwritten text datasets and one handwritten music dataset.
The analyzed architecture yields promising results, outperforming state-of-the-art techniques in all three datasets.
arXiv Detail & Related papers (2021-06-22T14:07:27Z)
- Multilevel Text Alignment with Cross-Document Attention [59.76351805607481]
Existing alignment methods operate at a single, predefined level.
We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component.
arXiv Detail & Related papers (2020-10-03T02:52:28Z)
- Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles [5.40541521227338]
We model the problem of finding the relationship between two documents as a pairwise document classification task.
To find semantic relations between documents, we apply a series of techniques, such as GloVe, Paragraph-Vectors, BERT, and XLNet.
We perform our experiments on a newly proposed dataset of 32,168 Wikipedia article pairs and Wikidata properties that define the semantic document relations.
arXiv Detail & Related papers (2020-03-22T12:52:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.