AutoKG: Constructing Virtual Knowledge Graphs from Unstructured
Documents for Question Answering
- URL: http://arxiv.org/abs/2008.08995v2
- Date: Wed, 10 Mar 2021 20:45:02 GMT
- Title: AutoKG: Constructing Virtual Knowledge Graphs from Unstructured Documents for Question Answering
- Authors: Seunghak Yu, Tianxing He, James Glass
- Abstract summary: We propose a novel framework to automatically construct a knowledge graph from unstructured documents.
We first extract knowledge tuples from unstructured documents and encode them with contextual information.
Entities with similar context semantics are linked through internal alignment to form a graph structure.
This allows us to extract desired information from multiple documents by traversing the generated KG without a manual process.
- Score: 19.72815568759182
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge graphs (KGs) have the advantage of providing fine-grained detail
for question-answering systems. Unfortunately, building a reliable KG is
time-consuming and expensive as it requires human intervention. To overcome
this issue, we propose a novel framework to automatically construct a KG from
unstructured documents that does not require external alignment. We first
extract surface-form knowledge tuples from unstructured documents and encode
them with contextual information. Entities with similar context semantics are
then linked through internal alignment to form a graph structure. This allows
us to extract the desired information from multiple documents by traversing the
generated KG without a manual process. We examine its performance in retrieval
based QA systems by reformulating the WikiMovies and MetaQA datasets into a
tuple-level retrieval task. The experimental results show that our method
outperforms traditional retrieval methods by a large margin.
Related papers
- Ontology-Guided, Hybrid Prompt Learning for Generalization in Knowledge Graph Question Answering [6.232269207752904]
We present OntoSCPrompt, a novel Large Language Model (LLM)-based KGQA approach with a two-stage architecture.
OntoSCPrompt first generates a SPARQL query structure (including SPARQL keywords such as SELECT, ASK, and WHERE, plus placeholders for missing tokens) and then fills in the placeholders with KG-specific information.
We present several task-specific decoding strategies to ensure the correctness and executability of generated SPARQL queries in both stages.
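The two-stage idea — generate a SPARQL skeleton with placeholders, then fill it with KG-specific terms — can be sketched as below. The `[REL]`/`[ENT]` placeholder tokens and the `fill_sparql_structure` helper are hypothetical illustrations, not the paper's actual tokens or API; stage 1, producing the skeleton, would be an LLM call.

```python
def fill_sparql_structure(structure, bindings):
    """Stage 2 of a two-stage KGQA pipeline (sketch): substitute
    KG-specific IRIs into the placeholders of a generated SPARQL
    skeleton produced in stage 1."""
    query = structure
    for slot, value in bindings.items():
        query = query.replace(slot, value)
    return query

# Hypothetical skeleton with placeholder tokens
skeleton = "SELECT ?x WHERE { ?x [REL] [ENT] . }"
query = fill_sparql_structure(skeleton, {"[REL]": "dbo:director",
                                         "[ENT]": "dbr:Blade_Runner"})
```

Separating structure from grounding is what lets the approach generalize across KGs: only the stage-2 bindings change when the target KG changes.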
arXiv Detail & Related papers (2025-02-06T11:47:58Z)
- Ontology-grounded Automatic Knowledge Graph Construction by LLM under Wikidata schema [60.42231674887294]
We propose an ontology-grounded approach to Knowledge Graph (KG) construction using Large Language Models (LLMs) on a knowledge base.
We ground generation of KG with the authored ontology based on extracted relations to ensure consistency and interpretability.
Our work presents a promising direction for a scalable KG construction pipeline with minimal human intervention that yields high-quality, human-interpretable KGs.
arXiv Detail & Related papers (2024-12-30T13:36:05Z)
- iText2KG: Incremental Knowledge Graphs Construction Using Large Language Models [0.7165255458140439]
iText2KG is a method for incremental, topic-independent Knowledge Graph construction without post-processing.
Our method demonstrates superior performance compared to baseline methods across three scenarios.
arXiv Detail & Related papers (2024-09-05T06:49:14Z)
- PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these documents with rich structure.
We propose PDFTriage that enables models to retrieve the context based on either structure or content.
Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
arXiv Detail & Related papers (2023-09-16T04:29:05Z)
- Text-To-KG Alignment: Comparing Current Methods on Classification Tasks [2.191505742658975]
Knowledge graphs (KGs) provide dense and structured representations of factual information.
Recent work has focused on creating pipeline models that retrieve information from KGs as additional context.
It is not known how current methods compare to a scenario where the aligned subgraph is completely relevant to the query.
arXiv Detail & Related papers (2023-06-05T13:45:45Z)
- A Universal Question-Answering Platform for Knowledge Graphs [7.2676028986202]
We propose KGQAn, a universal QA system that does not need to be tailored to each target KG.
KGQAn is easily deployed and outperforms the state of the art by a large margin in both answer quality and processing time.
arXiv Detail & Related papers (2023-03-01T15:35:32Z)
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
- Query-Specific Knowledge Graphs for Complex Finance Topics [6.599344783327053]
We focus on the CODEC dataset, where domain experts create challenging questions.
We show that state-of-the-art ranking systems have headroom for improvement.
We demonstrate that entity and document relevance are positively correlated.
arXiv Detail & Related papers (2022-11-08T10:21:13Z)
- Generate rather than Retrieve: Large Language Models are Strong Context Generators [74.87021992611672]
We present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators.
We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer.
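The generate-then-read control flow can be sketched in a few lines. The `generate` and `read` callables here are stand-ins for LLM calls, and the prompt wording is illustrative, not the paper's actual prompt.

```python
def generate_then_read(question, generate, read, n_docs=3):
    """GenRead sketch: first prompt a language model to generate
    contextual documents for the question, then read those documents
    to produce the final answer. `generate` and `read` are stand-ins
    for LLM calls."""
    prompt = f"Generate a background document to answer: {question}"
    docs = [generate(prompt) for _ in range(n_docs)]
    return read(question, docs)
```

The key design choice is that the "retriever" is replaced by a generator, so no external corpus or index is consulted at inference time.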
arXiv Detail & Related papers (2022-09-21T01:30:59Z)
- Questions Are All You Need to Train a Dense Passage Retriever [123.13872383489172]
ART is a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data.
It uses a new document-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence documents, and (2) the documents are then used to compute the probability of reconstructing the original question.
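The question-reconstruction training signal described above can be sketched as follows. This is a simplified, label-free objective in the spirit of ART, not its actual implementation: `retrieve_score` and `reconstruct_logprob` are hypothetical stand-ins for the retriever and the pretrained reconstruction model.

```python
import math

def softmax(scores):
    """Normalize raw scores into a probability distribution."""
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]

def art_step(question, passages, retrieve_score, reconstruct_logprob):
    """ART-style sketch: (1) the retriever scores passages for the
    question; (2) each passage is scored by how well it reconstructs the
    question; (3) the reconstruction distribution serves as a soft target
    for the retrieval distribution (cross-entropy), so no labeled
    question-passage pairs are needed."""
    retr = softmax([retrieve_score(question, p) for p in passages])
    target = softmax([reconstruct_logprob(question, p) for p in passages])
    return -sum(t * math.log(r) for t, r in zip(target, retr))
```

Minimizing this loss pushes the retriever to rank highly exactly those passages from which the question is easy to reconstruct.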
arXiv Detail & Related papers (2022-06-21T18:16:31Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
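The "all n-grams as identifiers" idea can be sketched with a toy index: every n-gram of a passage is a valid identifier, so any substring the model generates can be mapped back to the passages containing it. The helper names and the whitespace tokenization are illustrative; the actual system uses an FM-index over the corpus rather than an explicit n-gram set.

```python
def passage_ngrams(passage, max_n=3):
    """Enumerate every n-gram (up to max_n tokens) of a passage; each
    one is a possible identifier for that passage."""
    toks = passage.split()
    grams = set()
    for n in range(1, max_n + 1):
        for i in range(len(toks) - n + 1):
            grams.add(" ".join(toks[i:i + n]))
    return grams

def retrieve(generated, index):
    """Map a generated n-gram back to every passage that contains it."""
    return [pid for pid, grams in index.items() if generated in grams]
```

Because no hierarchy is imposed, a single generated substring can point to many passages, and distinctive substrings narrow the candidates naturally.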
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed and is not responsible for any consequences of its use.