Distantly Supervised Morpho-Syntactic Model for Relation Extraction
- URL: http://arxiv.org/abs/2401.10002v1
- Date: Thu, 18 Jan 2024 14:17:40 GMT
- Title: Distantly Supervised Morpho-Syntactic Model for Relation Extraction
- Authors: Nicolas Gutehrlé, Iana Atanassova
- Abstract summary: We present a method for the extraction and categorisation of an unrestricted set of relationships from text.
We evaluate our approach on six datasets built on Wikidata and Wikipedia.
- Score: 0.27195102129094995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of Information Extraction (IE) involves automatically converting
unstructured textual content into structured data. Most research in this field
concentrates on extracting all facts or a specific set of relationships from
documents. In this paper, we present a method for the extraction and
categorisation of an unrestricted set of relationships from text. Our method
relies on morpho-syntactic extraction patterns obtained by a distant
supervision method, and creates Syntactic and Semantic Indices to extract and
classify candidate graphs. We evaluate our approach on six datasets built on
Wikidata and Wikipedia. The evaluation shows that our approach can achieve
Precision scores of up to 0.85, but with lower Recall and F1 scores. Our
approach makes it possible to quickly create rule-based systems for Information
Extraction and to build annotated datasets for training machine-learning and
deep-learning based classifiers.
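The trade-off the abstract reports (high Precision, lower Recall and F1) can be illustrated with a minimal sketch of the standard metric definitions. The counts below are hypothetical, chosen only so that Precision comes out at 0.85; they are not results from the paper, and the function name is an assumption made for illustration.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute Precision, Recall and F1 from raw counts (standard definitions)."""
    predicted = true_positives + false_positives   # relations the system extracted
    actual = true_positives + false_negatives      # relations present in the gold data
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts (NOT from the paper): 100 extracted relations,
# 85 of them correct, while the gold data contains 200 relations.
p, r, f = precision_recall_f1(true_positives=85, false_positives=15, false_negatives=115)
print(f"Precision={p:.3f} Recall={r:.3f} F1={f:.3f}")
```

Because F1 is the harmonic mean of Precision and Recall, a pattern-based extractor that rarely makes wrong extractions but misses many relations still ends up with a modest F1.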
Related papers
- FabricQA-Extractor: A Question Answering System to Extract Information from Documents using Natural Language Questions [4.961045761391367]
Reading comprehension models answer questions posed in natural language when provided with a short passage of text.
We introduce a new model, Relation Coherence, that exploits knowledge of the relational structure to improve the extraction quality.
We demonstrate on two datasets that Relation Coherence boosts extraction performance and evaluate FabricQA-Extractor on large scale datasets.
arXiv Detail & Related papers (2024-08-17T15:16:54Z)
- Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs [70.15262704746378]
We propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback.
Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (10% Rouge-L) in terms of producing coherent summaries.
arXiv Detail & Related papers (2024-07-05T20:25:04Z)
- Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
- ImPaKT: A Dataset for Open-Schema Knowledge Base Construction [10.073210304061966]
ImPaKT is a dataset for open-schema information extraction consisting of around 2500 text snippets from the C4 corpus, in the shopping domain (product buying guides).
We evaluate the power of this approach by fine-tuning the open source UL2 language model on a subset of the dataset, extracting a set of implication relations from a corpus of product buying guides, and conducting human evaluations of the resulting predictions.
arXiv Detail & Related papers (2022-12-21T05:02:49Z)
- ReSel: N-ary Relation Extraction from Scientific Text and Tables by Learning to Retrieve and Select [53.071352033539526]
We study the problem of extracting N-ary relations from scientific articles.
Our proposed method ReSel decomposes this task into a two-stage procedure.
Our experiments on three scientific information extraction datasets show that ReSel outperforms state-of-the-art baselines significantly.
arXiv Detail & Related papers (2022-10-26T02:28:02Z)
- Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction [57.854498238624366]
We propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP) for data-efficient knowledge graph construction.
RAP can dynamically leverage schema and knowledge inherited from human-annotated and weak-supervised data as a prompt for each sample.
arXiv Detail & Related papers (2022-10-19T16:40:28Z)
- AIFB-WebScience at SemEval-2022 Task 12: Relation Extraction First -- Using Relation Extraction to Identify Entities [0.0]
We present an end-to-end joint entity and relation extraction approach based on transformer-based language models.
In contrast to existing approaches, which perform entity and relation extraction in sequence, our system incorporates information from relation extraction into entity extraction.
arXiv Detail & Related papers (2022-03-10T12:19:44Z)
- Integrating Semantics and Neighborhood Information with Graph-Driven Generative Models for Document Retrieval [51.823187647843945]
In this paper, we encode the neighborhood information with a graph-induced Gaussian distribution, and propose to integrate the two types of information with a graph-driven generative model.
Under the approximation, we prove that the training objective can be decomposed into terms involving only singleton or pairwise documents, enabling the model to be trained as efficiently as uncorrelated ones.
arXiv Detail & Related papers (2021-05-27T11:29:03Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.