Related papers: ImPaKT: A Dataset for Open-Schema Knowledge Base Construction

ImPaKT: A Dataset for Open-Schema Knowledge Base Construction

URL: http://arxiv.org/abs/2212.10770v1
Date: Wed, 21 Dec 2022 05:02:49 GMT
Title: ImPaKT: A Dataset for Open-Schema Knowledge Base Construction
Authors: Luke Vilnis, Zach Fisher, Bhargav Kanagal, Patrick Murray, Sumit Sanghai
Abstract summary: ImPaKT is a dataset for open-schema information extraction consisting of around 2500 text snippets from the C4 corpus, in the shopping domain (product buying guides) We evaluate the power of this approach by fine-tuning the open source UL2 language model on a subset of the dataset, extracting a set of implication relations from a corpus of product buying guides, and conducting human evaluations of the resulting predictions.
Score: 10.073210304061966
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models have ushered in a golden age of semantic parsing. The seq2seq paradigm allows for open-schema and abstractive attribute and relation extraction given only small amounts of finetuning data. Language model pretraining has simultaneously enabled great strides in natural language inference, reasoning about entailment and implication in free text. These advances motivate us to construct ImPaKT, a dataset for open-schema information extraction, consisting of around 2500 text snippets from the C4 corpus, in the shopping domain (product buying guides), professionally annotated with extracted attributes, types, attribute summaries (attribute schema discovery from idiosyncratic text), many-to-one relations between compound and atomic attributes, and implication relations. We release this data in hope that it will be useful in fine tuning semantic parsers for information extraction and knowledge base construction across a variety of domains. We evaluate the power of this approach by fine-tuning the open source UL2 language model on a subset of the dataset, extracting a set of implication relations from a corpus of product buying guides, and conducting human evaluations of the resulting predictions.

Related papers

StructText: A Synthetic Table-to-Text Approach for Benchmark Generation with Multi-Dimensional Evaluation [8.251302684712773]
StructText is an end-to-end framework for automatically generating high-fidelity benchmarks for key-value extraction from text.<n>We evaluate the proposed method on 71,539 examples across 49 documents.
arXiv Detail & Related papers (2025-07-28T21:20:44Z)
FabricQA-Extractor: A Question Answering System to Extract Information from Documents using Natural Language Questions [4.961045761391367]
Reading comprehension models answer questions posed in natural language when provided with a short passage of text. We introduce a new model, Relation Coherence, that exploits knowledge of the relational structure to improve the extraction quality. We demonstrate on two datasets that Relation Coherence boosts extraction performance and evaluate FabricQA-Extractor on large scale datasets.
arXiv Detail & Related papers (2024-08-17T15:16:54Z)
Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction. We reformulate the task to be entity-centric, enabling the use of diverse metrics. We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
Distantly Supervised Morpho-Syntactic Model for Relation Extraction [0.27195102129094995]
We present a method for the extraction and categorisation of an unrestricted set of relationships from text. We evaluate our approach on six datasets built on Wikidata and Wikipedia.
arXiv Detail & Related papers (2024-01-18T14:17:40Z)
Semi-automatic Data Enhancement for Document-Level Relation Extraction with Distant Supervision from Large Language Models [26.523153535336725]
Document-level Relation Extraction (DocRE) aims to extract relations from a long context. We propose a method integrating a large language model (LLM) and a natural language inference (NLI) module to generate relation triples. We demonstrate the effectiveness of our approach by introducing an enhanced dataset known as DocGNRE.
arXiv Detail & Related papers (2023-11-13T13:10:44Z)
Instruct and Extract: Instruction Tuning for On-Demand Information Extraction [86.29491354355356]
On-Demand Information Extraction aims to fulfill the personalized demands of real-world users. We present a benchmark named InstructIE, inclusive of both automatically generated training data, as well as the human-annotated test set. Building on InstructIE, we further develop an On-Demand Information Extractor, ODIE.
arXiv Detail & Related papers (2023-10-24T17:54:25Z)
Enriching Relation Extraction with OpenIE [70.52564277675056]
Relation extraction (RE) is a sub-discipline of information extraction (IE) In this work, we explore how recent approaches for open information extraction (OpenIE) may help to improve the task of RE. Our experiments over two annotated corpora, KnowledgeNet and FewRel, demonstrate the improved accuracy of our enriched models.
arXiv Detail & Related papers (2022-12-19T11:26:23Z)
Syntactic Multi-view Learning for Open Information Extraction [26.1066324477346]
Open Information Extraction (OpenIE) aims to extracts from open-domain sentences. In this paper, we model both constituency and dependency trees into word-level graphs.
arXiv Detail & Related papers (2022-12-05T07:15:41Z)
Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction [57.854498238624366]
We propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP) for data-efficient knowledge graph construction. RAP can dynamically leverage schema and knowledge inherited from human-annotated and weak-supervised data as a prompt for each sample.
arXiv Detail & Related papers (2022-10-19T16:40:28Z)
Pre-training Language Model Incorporating Domain-specific Heterogeneous Knowledge into A Unified Representation [49.89831914386982]
We propose a unified pre-trained language model (PLM) for all forms of text, including unstructured text, semi-structured text, and well-structured text. Our approach outperforms the pre-training of plain text using only 1/4 of the data.
arXiv Detail & Related papers (2021-09-02T16:05:24Z)
Learning Relation Prototype from Unlabeled Texts for Long-tail Relation Extraction [84.64435075778988]
We propose a general approach to learn relation prototypes from unlabeled texts. We learn relation prototypes as an implicit factor between entities. We conduct experiments on two publicly available datasets: New York Times and Google Distant Supervision.
arXiv Detail & Related papers (2020-11-27T06:21:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.