GenIE: Generative Information Extraction
- URL: http://arxiv.org/abs/2112.08340v1
- Date: Wed, 15 Dec 2021 18:45:14 GMT
- Title: GenIE: Generative Information Extraction
- Authors: Martin Josifoski, Nicola De Cao, Maxime Peyrard, Robert West
- Abstract summary: We introduce GenIE, the first end-to-end autoregressive formulation of closed information extraction.
Our experiments show that GenIE is state-of-the-art on closed information extraction.
This work paves the way towards a unified end-to-end approach to the core tasks of information extraction.
- Score: 20.491645841368214
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Structured and grounded representation of text is typically formalized by
closed information extraction, the problem of extracting an exhaustive set of
(subject, relation, object) triplets that are consistent with a predefined set
of entities and relations from a knowledge base schema. Most existing works are
pipelines prone to error accumulation, and all approaches are only applicable
to unrealistically small numbers of entities and relations. We introduce GenIE
(generative information extraction), the first end-to-end autoregressive
formulation of closed information extraction. GenIE naturally exploits the
language knowledge from the pre-trained transformer by autoregressively
generating relations and entities in textual form. Thanks to a new bi-level
constrained generation strategy, only triplets consistent with the predefined
knowledge base schema are produced. Our experiments show that GenIE is
state-of-the-art on closed information extraction, generalizes from fewer
training data points than baselines, and scales to a previously unmanageable
number of entities and relations. With this work, closed information extraction
becomes practical in realistic scenarios, providing new opportunities for
downstream tasks. Finally, this work paves the way towards a unified end-to-end
approach to the core tasks of information extraction. Code and models available
at https://github.com/epfl-dlab/GenIE.
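The abstract states that constrained generation ensures only schema-consistent triplets are produced, but it does not give the mechanism. A common way to enforce this kind of constraint is a prefix trie over the allowed names, consulted at every decoding step. The sketch below is a toy illustration under that assumption and is not the authors' implementation: names are split into whole words rather than subword tokens, decoding is greedy rather than beam search, and `toy_score`, `PREFERRED`, `END`, and the example entities and relations are all hypothetical stand-ins for a trained model and a real knowledge base. The two levels here are the slot being filled (subject, relation, object), which selects the trie, and the token-level restriction inside that trie.

```python
from typing import Callable, Dict, List, Sequence, Tuple

END = "<end>"  # hypothetical marker that closes a name

def build_trie(names: Sequence[Sequence[str]]) -> Dict:
    """Build a nested-dict prefix trie over tokenized names."""
    root: Dict = {}
    for tokens in names:
        node = root
        for tok in list(tokens) + [END]:
            node = node.setdefault(tok, {})
    return root

def allowed_next(trie: Dict, prefix: Sequence[str]) -> List[str]:
    """Tokens that may follow `prefix` without leaving the trie."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return []
        node = node[tok]
    return sorted(node.keys())

Scorer = Callable[[str, Sequence[str], str], float]

def constrained_greedy(trie: Dict, slot: str, score: Scorer) -> List[str]:
    """Greedily pick the best-scoring token among those the trie allows."""
    prefix: List[str] = []
    while True:
        options = allowed_next(trie, prefix)
        if not options:
            raise ValueError("trie is empty or prefix left the trie")
        best = max(options, key=lambda t: score(slot, prefix, t))
        if best == END:
            return prefix
        prefix.append(best)

def decode_triplet(entity_trie: Dict, relation_trie: Dict,
                   score: Scorer) -> Tuple[List[str], List[str], List[str]]:
    """Slot level: choose which trie constrains each part of the triplet."""
    subject = constrained_greedy(entity_trie, "subject", score)
    relation = constrained_greedy(relation_trie, "relation", score)
    obj = constrained_greedy(entity_trie, "object", score)
    return subject, relation, obj

# Hypothetical closed schema (illustrative only).
ENTITY_TRIE = build_trie([["Barack", "Obama"], ["Michelle", "Obama"], ["Hawaii"]])
RELATION_TRIE = build_trie([["place", "of", "birth"], ["spouse"]])

# Hypothetical scorer standing in for a language model's token scores.
PREFERRED = {
    "subject": {"Barack", "Obama"},
    "relation": {"place", "of", "birth"},
    "object": {"Hawaii"},
}

def toy_score(slot: str, prefix: Sequence[str], token: str) -> float:
    return 1.0 if token in PREFERRED[slot] else 0.0

triplet = decode_triplet(ENTITY_TRIE, RELATION_TRIE, toy_score)
print(triplet)
```

Because every step chooses only among tokens returned by `allowed_next`, the decoder cannot emit a name outside the closed sets, whatever the scorer prefers.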
Related papers
- Return of the Schema: Building Complete Datasets for Machine Learning and Reasoning on Knowledge Graphs [0.017283310584905027]
We present the first resource that provides a workflow for extracting datasets including both schema and ground facts. The resulting curated suite of datasets is ready for machine learning and reasoning services. We provide utilities for loading datasets in tensor representations typical of standard machine learning libraries.
arXiv Detail & Related papers (2026-02-16T14:42:14Z) - From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence [91.54446789584826]
Epiplexity is a formalization of information capturing what computationally bounded observers can learn from data. We show how information can be created with computation, how it depends on the ordering of the data, and how likelihood modeling can produce more complex programs than present in the data generating process itself.
arXiv Detail & Related papers (2026-01-06T18:04:03Z) - GenIC: An LLM-Based Framework for Instance Completion in Knowledge Graphs [0.0]
We introduce GenIC: a two-step Generative Instance Completion framework. The first step focuses on property prediction, treated as a multi-label classification task. The second step is link prediction, framed as a generative sequence-to-sequence task.
arXiv Detail & Related papers (2025-05-29T22:15:25Z) - Inference over Unseen Entities, Relations and Literals on Knowledge Graphs [1.7474352892977463]
Knowledge graph embedding models have been successfully applied in the transductive setting to tackle various challenging tasks.
We propose the attentive byte-pair encoding layer (BytE) to construct a triple embedding from a sequence of byte-pair encoded subword units of entities and relations.
BytE leads to massive feature reuse via weight tying, since it forces a knowledge graph embedding model to learn embeddings for subword units instead of entities and relations directly.
arXiv Detail & Related papers (2024-10-09T10:20:54Z) - Distantly Supervised Morpho-Syntactic Model for Relation Extraction [0.27195102129094995]
We present a method for the extraction and categorisation of an unrestricted set of relationships from text.
We evaluate our approach on six datasets built on Wikidata and Wikipedia.
arXiv Detail & Related papers (2024-01-18T14:17:40Z) - ASPER: Answer Set Programming Enhanced Neural Network Models for Joint Entity-Relation Extraction [11.049915720093242]
This paper proposes a new approach, ASP-enhanced Entity-Relation extraction (ASPER).
ASPER jointly recognizes entities and relations by learning from both data and domain knowledge.
In particular, ASPER takes advantage of the factual knowledge (represented as facts in ASP) and derived knowledge (represented as rules in ASP) in the learning process of neural network models.
arXiv Detail & Related papers (2023-05-24T17:32:58Z) - Generative Meta-Learning for Zero-Shot Relation Triplet Extraction [20.556880137419064]
Zero-shot Relation Triplet Extraction (ZeroRTE) aims to extract relation triplets from texts containing unseen relation types.
Existing approaches typically leverage the knowledge embedded in pre-trained language models to accomplish the generalization process.
We propose a generative meta-learning framework which exploits the 'learning-to-learn' ability of meta-learning to boost the generalization capability of generative models.
arXiv Detail & Related papers (2023-05-03T06:34:39Z) - Zero-shot Triplet Extraction by Template Infilling [13.295751492744081]
Triplet extraction aims to extract pairs of entities and their corresponding relations from unstructured text.
We show that by reducing triplet extraction to a template infilling task over a pre-trained language model (LM), we can equip the extraction model with zero-shot learning capabilities.
We propose a novel framework, ZETT, that aligns the task objective to the pre-training objective of generative transformers to generalize to unseen relations.
arXiv Detail & Related papers (2022-12-21T00:57:24Z) - Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation [92.1582872870226]
We propose a new grounded keys-to-text generation task.
The task is to generate a factual description about an entity given a set of guiding keys, and grounding passages.
Inspired by recent QA-based evaluation measures, we propose an automatic metric, MAFE, for factual correctness of generated descriptions.
arXiv Detail & Related papers (2022-12-04T23:59:41Z) - DORE: Document Ordered Relation Extraction based on Generative Framework [56.537386636819626]
This paper investigates the root cause of the underwhelming performance of the existing generative DocRE models.
We propose to generate a symbolic and ordered sequence from the relation matrix, which is deterministic and easier for the model to learn.
Experimental results on four datasets show that our proposed method can improve the performance of the generative DocRE models.
arXiv Detail & Related papers (2022-10-28T11:18:10Z) - Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction [57.854498238624366]
We propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP) for data-efficient knowledge graph construction.
RAP can dynamically leverage schema and knowledge inherited from human-annotated and weak-supervised data as a prompt for each sample.
arXiv Detail & Related papers (2022-10-19T16:40:28Z) - RelationPrompt: Leveraging Prompts to Generate Synthetic Data for Zero-Shot Relation Triplet Extraction [65.4337085607711]
We introduce the task setting of Zero-Shot Relation Triplet Extraction (ZeroRTE).
Given an input sentence, each extracted triplet consists of the head entity, relation label, and tail entity where the relation label is not seen at the training stage.
We propose to synthesize relation examples by prompting language models to generate structured texts.
arXiv Detail & Related papers (2022-03-17T05:55:14Z) - ENT-DESC: Entity Description Generation by Exploring Knowledge Graph [53.03778194567752]
In practice, the input knowledge could be more than enough, since the output description may only cover the most significant knowledge.
We introduce a large-scale and challenging dataset to facilitate the study of such a practical scenario in KG-to-text.
We propose a multi-graph structure that is able to represent the original graph information more comprehensively.
arXiv Detail & Related papers (2020-04-30T14:16:19Z) - Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs [96.73259297063619]
We consider a novel formulation, zero-shot learning, to free this cumbersome curation.
For newly-added relations, we attempt to learn their semantic features from their text descriptions.
We leverage Generative Adversarial Networks (GANs) to establish the connection between the text and knowledge graph domains.
arXiv Detail & Related papers (2020-01-08T01:19:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.