Dual Reader-Parser on Hybrid Textual and Tabular Evidence for Open
Domain Question Answering
- URL: http://arxiv.org/abs/2108.02866v1
- Date: Thu, 5 Aug 2021 22:04:13 GMT
- Title: Dual Reader-Parser on Hybrid Textual and Tabular Evidence for Open
Domain Question Answering
- Authors: Alexander Hanbo Li, Patrick Ng, Peng Xu, Henghui Zhu, Zhiguo Wang,
Bing Xiang
- Abstract summary: A large amount of the world's knowledge is stored in structured databases.
Query languages can answer questions that require complex reasoning, as well as offering full explainability.
- Score: 78.9863753810787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The current state-of-the-art generative models for open-domain question
answering (ODQA) have focused on generating direct answers from unstructured
textual information. However, a large amount of the world's knowledge is stored in
structured databases, and it needs to be accessed using query languages such as
SQL. Furthermore, query languages can answer questions that require complex
reasoning, as well as offering full explainability. In this paper, we propose a
hybrid framework that takes both textual and tabular evidence as input and
generates either direct answers or SQL queries, depending on which form can
better answer the question. The generated SQL queries can then be executed on
the associated databases to obtain the final answers. To the best of our
knowledge, this is the first paper that applies Text2SQL to ODQA tasks.
Empirically, we demonstrate that on several ODQA datasets, the hybrid methods
consistently outperform the baseline models that only take homogeneous input
by a large margin. Specifically, we achieve state-of-the-art performance on the
OpenSQuAD dataset using a T5-base model. In a detailed analysis, we demonstrate
that being able to generate structured SQL queries can always bring gains,
especially for questions that require complex reasoning.
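The decision flow below is a minimal sketch of this hybrid idea, assuming a T5-style generator whose output string indicates whether it is a direct answer or a SQL query to be executed; the model name, input serialization, and "ANSWER:"/"SQL:" prefixes are illustrative assumptions rather than the paper's exact implementation.

```python
import sqlite3
from transformers import T5ForConditionalGeneration, T5Tokenizer

# "t5-base" is a placeholder; the paper fine-tunes a T5-base style generator
# on hybrid textual + tabular evidence.
MODEL_NAME = "t5-base"

tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)

def answer_question(question: str, passages: list[str], tables: list[str],
                    db_path: str) -> str:
    """Generate either a direct answer or a SQL query, then resolve it.

    The "SQL:" / "ANSWER:" output prefixes are assumptions, not the exact
    serialization used in the paper.
    """
    # Feed both textual and tabular evidence to the same generator.
    evidence = " ".join(passages + tables)
    inputs = tokenizer(f"question: {question} context: {evidence}",
                       return_tensors="pt", truncation=True, max_length=1024)
    output_ids = model.generate(**inputs, max_length=128)
    output = tokenizer.decode(output_ids[0], skip_special_tokens=True)

    if output.startswith("SQL:"):
        # The model chose to parse: run the query on the associated database.
        query = output[len("SQL:"):].strip()
        conn = sqlite3.connect(db_path)
        try:
            rows = conn.execute(query).fetchall()
        finally:
            conn.close()
        return ", ".join(str(value) for row in rows for value in row)
    # Otherwise the model chose to read: return the direct answer text.
    return output.removeprefix("ANSWER:").strip()
```

In the paper, the choice between answering directly and emitting SQL is learned by the generator itself; the prefix check above only stands in for that decision, and the SQL branch mirrors the execution step described in the abstract.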
Related papers
- PRACTIQ: A Practical Conversational Text-to-SQL dataset with Ambiguous and Unanswerable Queries [32.40808001281668]
Real user questions can often be ambiguous with multiple interpretations or unanswerable due to a lack of relevant data.
In this work, we construct a practical conversational text-to-SQL dataset.
We generate conversations with four turns: the initial user question, an assistant response seeking clarification, the user's clarification, and the assistant's clarified SQL response.
arXiv Detail & Related papers (2024-10-14T20:36:35Z)
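As a rough illustration of the four-turn structure described above, one PRACTIQ-style conversation could be stored as below; the field names and example values are hypothetical, not the dataset's actual schema.

```python
from dataclasses import dataclass

@dataclass
class Conversation:
    """One four-turn conversation; field names are hypothetical."""
    user_question: str            # possibly ambiguous or unanswerable
    assistant_clarification: str  # assistant asks for clarification
    user_clarification: str       # user resolves the ambiguity
    assistant_sql: str            # final clarified SQL response

example = Conversation(
    user_question="Show me the top customers.",
    assistant_clarification="Top by total spend or by number of orders?",
    user_clarification="By total spend.",
    assistant_sql="SELECT name FROM customers ORDER BY total_spend DESC LIMIT 10;",
)
```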
- UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z)
- QURG: Question Rewriting Guided Context-Dependent Text-to-SQL Semantic Parsing [46.05006486399823]
This paper presents QURG, a novel Question Rewriting Guided approach to help the models achieve adequate contextual understanding.
We first train a question rewriting model to complete the current question based on the question context, and convert the rewrites into a rewriting edit matrix.
We further design a two-stream matrix encoder to jointly model rewriting relations between question and context, and the schema linking relations between natural language and structured schema.
arXiv Detail & Related papers (2023-05-11T08:45:55Z)
- Prompting GPT-3.5 for Text-to-SQL with De-semanticization and Skeleton Retrieval [17.747079214502673]
Text-to-SQL is a task that converts a natural language question into a structured query language (SQL) query that retrieves information from a database.
In this paper, we propose an LLM-based framework for Text-to-SQL which retrieves helpful demonstration examples to prompt LLMs.
We design a de-semanticization mechanism that extracts question skeletons, allowing us to retrieve similar examples based on their structural similarity.
arXiv Detail & Related papers (2023-04-26T06:02:01Z)
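A rough sketch of that skeleton-retrieval idea follows, assuming de-semanticization simply masks literal values so that demonstrations can be ranked by structural similarity; the masking rules, similarity measure, and function names are assumptions, and the paper's mechanism is more involved.

```python
import re
from difflib import SequenceMatcher

def question_skeleton(question: str) -> str:
    """Crude de-semanticization: mask quoted strings and numbers."""
    skeleton = re.sub(r'"[^"]*"|\'[^\']*\'', "<value>", question)
    skeleton = re.sub(r"\b\d+(\.\d+)?\b", "<value>", skeleton)
    return skeleton.lower()

def retrieve_demonstrations(query: str, pool: list[tuple[str, str]], k: int = 3):
    """Return the k (question, SQL) pairs whose skeletons look most similar."""
    target = question_skeleton(query)
    ranked = sorted(
        pool,
        key=lambda ex: SequenceMatcher(None, target, question_skeleton(ex[0])).ratio(),
        reverse=True,
    )
    return ranked[:k]

# The retrieved pairs would then be formatted into the LLM prompt as demonstrations.
pool = [
    ("How many employees earn more than 50000?",
     "SELECT COUNT(*) FROM employees WHERE salary > 50000;"),
    ("List the titles of movies released in 1999.",
     "SELECT title FROM movies WHERE year = 1999;"),
]
print(retrieve_demonstrations("How many students scored above 90?", pool, k=1))
```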
- Towards Generalizable and Robust Text-to-SQL Parsing [77.18724939989647]
We propose a novel TKK framework consisting of Task decomposition, Knowledge acquisition, and Knowledge composition to learn text-to-SQL parsing in stages.
We show that our framework is effective in all scenarios and achieves state-of-the-art performance on the Spider, SParC, and CoSQL datasets.
arXiv Detail & Related papers (2022-10-23T09:21:27Z)
- Weakly Supervised Text-to-SQL Parsing through Question Decomposition [53.22128541030441]
We take advantage of the recently proposed question meaning representation called QDMR.
Given questions, their QDMR structures (annotated by non-experts or automatically predicted) and the answers, we are able to automatically synthesize SQL queries.
Our results show that the weakly supervised models perform competitively with those trained on NL-SQL benchmark data.
arXiv Detail & Related papers (2021-12-12T20:02:42Z)
- Open Domain Question Answering over Virtual Documents: A Unified Approach for Data and Text [62.489652395307914]
We use the data-to-text method as a means of encoding structured knowledge for knowledge-intensive applications, i.e. open-domain question answering (QA).
Specifically, we propose a verbalizer-retriever-reader framework for open-domain QA over data and text where verbalized tables from Wikipedia and triples from Wikidata are used as augmented knowledge sources.
We show that our Unified Data and Text QA, UDT-QA, can effectively benefit from the expanded knowledge index, leading to large gains over text-only baselines.
arXiv Detail & Related papers (2021-10-16T00:11:21Z)
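The verbalizer in UDT-QA is a trained data-to-text model; the template-based sketch below only illustrates the underlying idea of flattening a table into sentences that a standard text retriever and reader can consume. The function name and templates are assumptions.

```python
def verbalize_table(title: str, header: list[str], rows: list[list[str]]) -> str:
    """Turn a small table into plain sentences for a text retriever.

    A template baseline, not the learned verbalizer used in UDT-QA.
    """
    sentences = []
    for row in rows:
        facts = ", ".join(f"{col} is {val}" for col, val in zip(header, row))
        sentences.append(f"In the table '{title}', {facts}.")
    return " ".join(sentences)

# A verbalized Wikipedia-style table can then be indexed like any other passage.
text = verbalize_table(
    "1996 Summer Olympics medal table",
    ["Nation", "Gold", "Silver", "Bronze"],
    [["United States", "44", "32", "25"], ["Russia", "26", "21", "16"]],
)
print(text)
```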
- Data Augmentation with Hierarchical SQL-to-Question Generation for Cross-domain Text-to-SQL Parsing [40.65143087243074]
This paper presents a simple yet effective data augmentation framework.
First, given a database, we automatically produce a large number of SQL queries based on an abstract syntax tree grammar.
Second, we propose a hierarchical SQL-to-question generation model to obtain high-quality natural language questions.
arXiv Detail & Related papers (2021-03-03T07:37:38Z)
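A toy sketch of this augmentation recipe, assuming a handful of SQL templates stand in for the abstract-syntax-tree grammar and a fixed template stands in for the hierarchical SQL-to-question model; both learned components in the paper are far richer than this.

```python
import random

# Toy stand-in for the AST grammar: SQL templates over a tiny schema.
SCHEMA = {"singer": ["name", "age", "country"]}
TEMPLATES = [
    "SELECT {col} FROM {table}",
    "SELECT {col} FROM {table} WHERE {col2} > {val}",
    "SELECT COUNT(*) FROM {table} WHERE {col2} = {val}",
]

def sample_sql(rng: random.Random) -> str:
    """Sample one SQL query from the template 'grammar'."""
    table = rng.choice(list(SCHEMA))
    cols = SCHEMA[table]
    return rng.choice(TEMPLATES).format(
        table=table,
        col=rng.choice(cols),
        col2=rng.choice(cols),
        val=rng.randint(1, 60),
    )

def sql_to_question(sql: str) -> str:
    """Placeholder for the hierarchical SQL-to-question generation model."""
    return f"In natural language, what does `{sql}` ask for?"

# Each (question, SQL) pair becomes an extra training example for the parser.
rng = random.Random(0)
augmented = [(sql_to_question(sql), sql) for sql in (sample_sql(rng) for _ in range(3))]
print(augmented)
```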
- Did You Ask a Good Question? A Cross-Domain Question Intention Classification Benchmark for Text-to-SQL [32.946103197082124]
TriageSQL is the first cross-domain text-to-SQL question intention classification benchmark.
It requires models to distinguish four types of unanswerable questions from answerable questions.
The baseline RoBERTa model achieves a 60% F1 score on the test set.
arXiv Detail & Related papers (2020-10-23T19:36:57Z)