Towards Knowledge-Intensive Text-to-SQL Semantic Parsing with Formulaic
Knowledge
- URL: http://arxiv.org/abs/2301.01067v1
- Date: Tue, 3 Jan 2023 12:37:47 GMT
- Title: Towards Knowledge-Intensive Text-to-SQL Semantic Parsing with Formulaic
Knowledge
- Authors: Longxu Dou, Yan Gao, Xuqi Liu, Mingyang Pan, Dingzirui Wang, Wanxiang
Che, Dechen Zhan, Min-Yen Kan, Jian-Guang Lou
- Abstract summary: We build a new Chinese benchmark KnowSQL consisting of domain-specific questions covering various domains.
We then address this problem by presenting formulaic knowledge, rather than by annotating additional data examples.
More concretely, we construct a formulaic knowledge bank as a domain knowledge base and propose a framework (ReGrouP) to leverage this formulaic knowledge during parsing.
- Score: 54.85168428642474
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study the problem of knowledge-intensive text-to-SQL, in
which domain knowledge is necessary to parse expert questions into SQL queries
over domain-specific tables. We formalize this scenario by building a new
Chinese benchmark KnowSQL consisting of domain-specific questions covering
various domains. We then address this problem by presenting formulaic
knowledge, rather than by annotating additional data examples. More concretely,
we construct a formulaic knowledge bank as a domain knowledge base and propose
a framework (ReGrouP) to leverage this formulaic knowledge during parsing.
Experiments using ReGrouP demonstrate a significant 28.2% improvement overall
on KnowSQL.
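The abstract does not show what a formulaic-knowledge entry looks like or how it is applied during parsing. The sketch below is a minimal, hypothetical illustration of the idea: a bank that maps domain terms to formulas over table columns, plus a toy step that grounds a matched term into SQL. The entries, table, and column names are invented for illustration and are not taken from KnowSQL or ReGrouP.

```python
# Hypothetical sketch of a formulaic knowledge bank for knowledge-intensive
# text-to-SQL. All entries, tables, and columns are illustrative only; they
# are not drawn from the KnowSQL benchmark or the ReGrouP framework.

FORMULAIC_KNOWLEDGE = {
    # domain term -> formula expressed over concrete table columns
    "gross profit margin": "(revenue - cost_of_goods_sold) / revenue",
    "current ratio": "current_assets / current_liabilities",
}

def expand_question_to_sql(question: str, table: str) -> str:
    """Ground a domain term in the question with its formula, then emit SQL.

    A real parser would retrieve and compose knowledge during decoding;
    this toy version only substitutes the first matching formula.
    """
    for term, formula in FORMULAIC_KNOWLEDGE.items():
        if term in question.lower():
            alias = term.replace(" ", "_")
            return f"SELECT company, {formula} AS {alias} FROM {table};"
    raise ValueError("no formulaic knowledge matched the question")

if __name__ == "__main__":
    print(expand_question_to_sql(
        "What is the gross profit margin of each company?",
        "financial_reports"))
```

For the example question, the sketch emits `SELECT company, (revenue - cost_of_goods_sold) / revenue AS gross_profit_margin FROM financial_reports;`, the kind of query an expert question presupposes but a parser without the underlying formula cannot produce.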
Related papers
- Learning to Solve Domain-Specific Calculation Problems with Knowledge-Intensive Programs Generator [33.680619900836376]
We propose a pipeline that solves domain-specific calculation problems with a Knowledge-Intensive Programs Generator.
It generates knowledge-intensive programs according to domain-specific documents.
We also find that the code generator is adaptable to other domains without training on the new knowledge.
arXiv Detail & Related papers (2024-12-12T13:42:58Z) - UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL systems.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z) - DocuT5: Seq2seq SQL Generation with Table Documentation [5.586191108738563]
We develop a new text-to-SQL failure taxonomy and find that 19.6% of errors are due to foreign key mistakes.
We propose DocuT5, a method that captures knowledge from (1) table structure context of foreign keys and (2) domain knowledge through contextualizing tables and columns.
Both types of knowledge improve over state-of-the-art T5 with constrained decoding on Spider, and domain knowledge yields effectiveness comparable to the state of the art on the Spider-DK and Spider-SYN datasets.
arXiv Detail & Related papers (2022-11-11T13:31:55Z) - Towards Generalizable and Robust Text-to-SQL Parsing [77.18724939989647]
We propose a novel TKK framework consisting of Task decomposition, Knowledge acquisition, and Knowledge composition to learn text-to-SQL parsing in stages.
We show that our framework is effective in all scenarios and achieves state-of-the-art performance on the Spider, SParC, and CoSQL datasets.
arXiv Detail & Related papers (2022-10-23T09:21:27Z) - Open Domain Question Answering over Virtual Documents: A Unified
Approach for Data and Text [62.489652395307914]
We use the data-to-text method as a means for encoding structured knowledge for knowledge-intensive applications, i.e., open-domain question answering (QA).
Specifically, we propose a verbalizer-retriever-reader framework for open-domain QA over data and text where verbalized tables from Wikipedia and triples from Wikidata are used as augmented knowledge sources.
We show that our Unified Data and Text QA, UDT-QA, can effectively benefit from the expanded knowledge index, leading to large gains over text-only baselines.
arXiv Detail & Related papers (2021-10-16T00:11:21Z) - Exploring Underexplored Limitations of Cross-Domain Text-to-SQL
Generalization [20.550737675032448]
Existing text-to-SQL models do not generalize when facing domain knowledge that does not frequently appear in the training data.
In this work, we investigate the robustness of text-to-SQL models when the questions require rarely observed domain knowledge.
We demonstrate that the prediction accuracy dramatically drops on samples that require such domain knowledge, even if the domain knowledge appears in the training set.
arXiv Detail & Related papers (2021-09-11T02:01:04Z) - Dual Reader-Parser on Hybrid Textual and Tabular Evidence for Open
Domain Question Answering [78.9863753810787]
A large amount of the world's knowledge is stored in structured databases.
Query languages can answer questions that require complex reasoning, as well as offer full explainability.
arXiv Detail & Related papers (2021-08-05T22:04:13Z) - "What Do You Mean by That?" A Parser-Independent Interactive Approach
for Enhancing Text-to-SQL [49.85635994436742]
We include humans in the loop and present a novel parser-independent interactive approach (PIIA) that interacts with users using multi-choice questions.
PIIA is capable of enhancing text-to-SQL performance with limited interaction turns, as shown by both simulation and human evaluation.
arXiv Detail & Related papers (2020-11-09T02:14:33Z) - Knowledge Fusion and Semantic Knowledge Ranking for Open Domain Question
Answering [33.920269584939334]
Open Domain Question Answering requires systems to retrieve external knowledge and perform multi-hop reasoning.
We learn a semantic knowledge ranking model to re-rank knowledge retrieved through Lucene-based information retrieval systems.
We propose a "knowledge fusion model" which leverages knowledge in BERT-based language models together with externally retrieved knowledge; a minimal re-ranking sketch follows this list.
arXiv Detail & Related papers (2020-04-07T03:16:47Z)
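The re-ranking step in the last entry can be pictured as re-scoring first-stage retrieval results with a learned relevance model. The sketch below is a hedged illustration only: it swaps the authors' BERT-based ranker for a toy lexical-overlap scorer so the example stays self-contained, and all names are hypothetical.

```python
# Hedged illustration of semantic re-ranking of retrieved knowledge.
# The toy scorer stands in for a learned (e.g. BERT-based) relevance model;
# the actual model and features of the cited paper are not reproduced here.

from typing import Callable, List, Tuple


def rerank(question: str,
           passages: List[str],
           score: Callable[[str, str], float],
           top_k: int = 3) -> List[Tuple[float, str]]:
    """Re-score passages from a first-stage retriever (e.g. Lucene/BM25)
    and keep the top_k highest-scoring ones."""
    scored = [(score(question, p), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def toy_overlap_score(question: str, passage: str) -> float:
    """Toy lexical-overlap scorer used only to keep the sketch runnable."""
    q_tokens = set(question.lower().split())
    p_tokens = set(passage.lower().split())
    return len(q_tokens & p_tokens) / max(len(q_tokens), 1)


if __name__ == "__main__":
    candidates = [
        "Marie Curie won Nobel Prizes in both physics and chemistry.",
        "The Eiffel Tower is located in Paris, France.",
    ]
    print(rerank("Which scientist won Nobel Prizes in physics and chemistry?",
                 candidates, toy_overlap_score, top_k=1))
```

In the cited system, the scorer would be a trained semantic ranking model and its output would feed the downstream knowledge fusion model rather than being printed.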