FabricQA-Extractor: A Question Answering System to Extract Information from Documents using Natural Language Questions
- URL: http://arxiv.org/abs/2408.09226v1
- Date: Sat, 17 Aug 2024 15:16:54 GMT
- Title: FabricQA-Extractor: A Question Answering System to Extract Information from Documents using Natural Language Questions
- Authors: Qiming Wang, Raul Castro Fernandez,
- Abstract summary: Reading comprehension models answer questions posed in natural language when provided with a short passage of text.
We introduce a new model, Relation Coherence, that exploits knowledge of the relational structure to improve the extraction quality.
We demonstrate on two datasets that Relation Coherence boosts extraction performance and evaluate FabricQA-Extractor on large scale datasets.
- Score: 4.961045761391367
- License:
- Abstract: Reading comprehension models answer questions posed in natural language when provided with a short passage of text. They present an opportunity to address a long-standing challenge in data management: the extraction of structured data from unstructured text. Consequently, several approaches are using these models to perform information extraction. However, these modern approaches leave an opportunity behind because they do not exploit the relational structure of the target extraction table. In this paper, we introduce a new model, Relation Coherence, that exploits knowledge of the relational structure to improve the extraction quality. We incorporate the Relation Coherence model as part of FabricQA-Extractor, an end-to-end system we built from scratch to conduct large scale extraction tasks over millions of documents. We demonstrate on two datasets with millions of passages that Relation Coherence boosts extraction performance and evaluate FabricQA-Extractor on large scale datasets.
Related papers
- Maximizing Relation Extraction Potential: A Data-Centric Study to Unveil Challenges and Opportunities [3.8087810875611896]
This paper investigates the possible data-centric characteristics that impede neural relation extraction.
It emphasizes pivotal issues, such as contextual ambiguity, correlating relations, long-tail data, and fine-grained relation distributions.
It sets a marker for future directions to alleviate these issues, thereby proving to be a critical resource for novice and advanced researchers.
arXiv Detail & Related papers (2024-09-07T23:40:47Z) - Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z) - Distantly Supervised Morpho-Syntactic Model for Relation Extraction [0.27195102129094995]
We present a method for the extraction and categorisation of an unrestricted set of relationships from text.
We evaluate our approach on six datasets built on Wikidata and Wikipedia.
arXiv Detail & Related papers (2024-01-18T14:17:40Z) - SEMQA: Semi-Extractive Multi-Source Question Answering [94.04430035121136]
We introduce a new QA task for answering multi-answer questions by summarizing multiple diverse sources in a semi-extractive fashion.
We create the first dataset of this kind, QuoteSum, with human-written semi-extractive answers to natural and generated questions.
arXiv Detail & Related papers (2023-11-08T18:46:32Z) - Instruct and Extract: Instruction Tuning for On-Demand Information
Extraction [86.29491354355356]
On-Demand Information Extraction aims to fulfill the personalized demands of real-world users.
We present a benchmark named InstructIE, inclusive of both automatically generated training data, as well as the human-annotated test set.
Building on InstructIE, we further develop an On-Demand Information Extractor, ODIE.
arXiv Detail & Related papers (2023-10-24T17:54:25Z) - ImPaKT: A Dataset for Open-Schema Knowledge Base Construction [10.073210304061966]
ImPaKT is a dataset for open-schema information extraction consisting of around 2500 text snippets from the C4 corpus, in the shopping domain (product buying guides)
We evaluate the power of this approach by fine-tuning the open source UL2 language model on a subset of the dataset, extracting a set of implication relations from a corpus of product buying guides, and conducting human evaluations of the resulting predictions.
arXiv Detail & Related papers (2022-12-21T05:02:49Z) - Structured information extraction from complex scientific text with
fine-tuned large language models [55.96705756327738]
We present a simple sequence-to-sequence approach to joint named entity recognition and relation extraction.
The approach leverages a pre-trained large language model (LLM), GPT-3, that is fine-tuned on approximately 500 pairs of prompts.
This approach represents a simple, accessible, and highly-flexible route to obtaining large databases of structured knowledge extracted from unstructured text.
arXiv Detail & Related papers (2022-12-10T07:51:52Z) - ReSel: N-ary Relation Extraction from Scientific Text and Tables by
Learning to Retrieve and Select [53.071352033539526]
We study the problem of extracting N-ary relations from scientific articles.
Our proposed method ReSel decomposes this task into a two-stage procedure.
Our experiments on three scientific information extraction datasets show that ReSel outperforms state-of-the-art baselines significantly.
arXiv Detail & Related papers (2022-10-26T02:28:02Z) - RelationPrompt: Leveraging Prompts to Generate Synthetic Data for
Zero-Shot Relation Triplet Extraction [65.4337085607711]
We introduce the task setting of Zero-Shot Relation Triplet Extraction (ZeroRTE)
Given an input sentence, each extracted triplet consists of the head entity, relation label, and tail entity where the relation label is not seen at the training stage.
We propose to synthesize relation examples by prompting language models to generate structured texts.
arXiv Detail & Related papers (2022-03-17T05:55:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.