Open Domain Question Answering over Virtual Documents: A Unified
Approach for Data and Text
- URL: http://arxiv.org/abs/2110.08417v1
- Date: Sat, 16 Oct 2021 00:11:21 GMT
- Title: Open Domain Question Answering over Virtual Documents: A Unified
Approach for Data and Text
- Authors: Kaixin Ma, Hao Cheng, Xiaodong Liu, Eric Nyberg, Jianfeng Gao
- Abstract summary: We use the data-to-text method as a means for encoding structured knowledge for knowledge-intensive applications, i.e. open-domain question answering (QA).
Specifically, we propose a verbalizer-retriever-reader framework for open-domain QA over data and text where verbalized tables from Wikipedia and triples from Wikidata are used as augmented knowledge sources.
We show that our Unified Data and Text QA, UDT-QA, can effectively benefit from the expanded knowledge index, leading to large gains over text-only baselines.
- Score: 62.489652395307914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to its potential as a universal interface over both data and text,
data-to-text generation has become increasingly popular. However, little
previous work has focused on its application to downstream tasks, e.g. using
the converted data for grounding or reasoning. In this work, we aim to bridge
this gap and use the data-to-text method as a means for encoding structured
knowledge for knowledge-intensive applications, i.e. open-domain question
answering (QA). Specifically, we propose a verbalizer-retriever-reader
framework for open-domain QA over data and text where verbalized tables from
Wikipedia and triples from Wikidata are used as augmented knowledge sources. We
show that our Unified Data and Text QA, UDT-QA, can effectively benefit from
the expanded knowledge index, leading to large gains over text-only baselines.
Notably, our approach sets the single-model state-of-the-art on Natural
Questions. Furthermore, our analyses indicate that verbalized knowledge is
preferred for answer reasoning for both adapted and hot-swap settings.
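To make the verbalizer-retriever-reader framework concrete, here is a minimal Python sketch. The rule-based verbalizer, word-overlap retriever, toy index, and example question are illustrative assumptions only, not the paper's implementation; UDT-QA itself relies on a trained data-to-text model to verbalize tables and triples, a dense retriever over the expanded index, and a neural reader.

```python
# Toy sketch of the verbalizer-retriever-reader idea (not the paper's implementation):
# verbalization here is rule-based and retrieval is word overlap, whereas UDT-QA uses a
# trained data-to-text generator, a dense retriever, and a neural reader.
import re

STOPWORDS = {"what", "is", "the", "of", "a", "an", "in"}

def tokens(text):
    """Lowercased word set with a tiny stopword filter."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}

def verbalize_triple(subj, rel, obj):
    """Turn a Wikidata-style (subject, relation, object) triple into a sentence."""
    return f"{subj} {rel.replace('_', ' ')} is {obj}."

def verbalize_table_row(table_title, header, row):
    """Turn one Wikipedia table row into a sentence using the column headers."""
    facts = ", ".join(f"{h} is {v}" for h, v in zip(header, row))
    return f"In {table_title}: {facts}."

def retrieve(question, passages, k=2):
    """Rank passages by word overlap with the question (stand-in for a dense retriever)."""
    q = tokens(question)
    return sorted(passages, key=lambda p: -len(q & tokens(p)))[:k]

def read(question, contexts):
    """Placeholder reader: a real system runs an extractive or generative QA model here."""
    return contexts[0] if contexts else None

# Unified index: ordinary text passages plus verbalized structured knowledge.
index = [
    "Mount Everest is Earth's highest mountain above sea level.",
    verbalize_triple("Mount Everest", "elevation_above_sea_level", "8,849 metres"),
    verbalize_table_row("List of highest mountains",
                        ["Mountain", "Range"], ["Mount Everest", "Himalayas"]),
]

question = "What is the elevation of Mount Everest?"
print(read(question, retrieve(question, index)))
```

The point of the sketch is the data flow: structured knowledge is flattened into text once, so a single retriever-reader stack can treat passages, tables, and triples uniformly.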
Related papers
- Contri(e)ve: Context + Retrieve for Scholarly Question Answering [0.0]
We present a two-step solution using the open-source Large Language Model (LLM) Llama3.1 for the Scholarly-QALD dataset.
Firstly, we extract the context pertaining to the question from different structured and unstructured data sources.
Secondly, we implement prompt engineering to improve the information retrieval performance of the LLM.
arXiv Detail & Related papers (2024-09-13T17:38:47Z) - DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain
Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z) - BigText-QA: Question Answering over a Large-Scale Hybrid Knowledge Graph [23.739432128095107]
BigText-QA is able to answer questions based on a structured knowledge graph.
Our results demonstrate that BigText-QA outperforms DrQA, a neural-network-based QA system, and achieves competitive results to QUEST, a graph-based unsupervised QA system.
arXiv Detail & Related papers (2022-12-12T09:49:02Z) - Generate rather than Retrieve: Large Language Models are Strong Context
Generators [74.87021992611672]
We present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators.
We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer (a minimal sketch of this loop appears after the list of related papers).
arXiv Detail & Related papers (2022-09-21T01:30:59Z) - External Knowledge Augmented Text Visual Question Answering [0.6445605125467573]
We propose a framework to extract, filter, and encode knowledge atop a standard multimodal transformer for vision language understanding tasks.
We generate results comparable to the state-of-the-art on two publicly available datasets.
arXiv Detail & Related papers (2021-08-22T13:21:58Z) - Dual Reader-Parser on Hybrid Textual and Tabular Evidence for Open
Domain Question Answering [78.9863753810787]
A large amount of the world's knowledge is stored in structured databases.
Query languages can answer questions that require complex reasoning, as well as offering full explainability.
arXiv Detail & Related papers (2021-08-05T22:04:13Z) - Unified Open-Domain Question Answering with Structured and Unstructured
Knowledge [7.7429684536437104]
We study open-domain question answering (ODQA) with structured, unstructured and semi-structured knowledge sources.
Our approach homogenizes all sources by reducing them to text, and applies recent, powerful retriever-reader models.
As a result, our unified model produces state-of-the-art results on 3 popular ODQA benchmarks.
arXiv Detail & Related papers (2020-12-29T05:14:08Z) - Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z) - KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
We present a benchmark for knowledge-intensive language tasks (KILT).
All tasks in KILT are grounded in the same snapshot of Wikipedia.
We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
arXiv Detail & Related papers (2020-09-04T15:32:19Z)
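As referenced in the GenRead entry above, here is a minimal sketch of the generate-then-read loop. Both helper functions are hypothetical placeholders rather than any real API; an actual implementation would swap in an LLM client for generation and an extractive or generative QA reader.

```python
# Sketch of a generate-then-read loop in the spirit of GenRead (not the authors' code).
# `generate_with_llm` and `extract_answer` are hypothetical placeholders: swap in a real
# LLM client and a real QA reader.

def generate_with_llm(prompt: str, n: int = 2) -> list[str]:
    """Placeholder LLM call; a real implementation would query an LLM with `prompt`."""
    return [f"[generated passage {i + 1} for prompt: {prompt}]" for i in range(n)]

def extract_answer(question: str, passages: list[str]) -> str:
    """Placeholder reader; a real implementation would run a QA model over the passages."""
    return passages[0]

def generate_then_read(question: str) -> str:
    # Step 1: instead of retrieving documents, prompt the LLM to generate context.
    prompt = f"Generate a background document that helps answer the question: {question}"
    passages = generate_with_llm(prompt)
    # Step 2: read the generated documents to produce the final answer.
    return extract_answer(question, passages)

print(generate_then_read("Who wrote the novel Dracula?"))
```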