SPINACH: SPARQL-Based Information Navigation for Challenging Real-World Questions
- URL: http://arxiv.org/abs/2407.11417v2
- Date: Mon, 21 Oct 2024 06:17:56 GMT
- Title: SPINACH: SPARQL-Based Information Navigation for Challenging Real-World Questions
- Authors: Shicheng Liu, Sina J. Semnani, Harold Triedman, Jialiang Xu, Isaac Dan Zhao, Monica S. Lam,
- Abstract summary: We introduce the SPINACH dataset, an expert-annotated KBQA dataset collected from discussions on Wikidata's "Request a Query" forum.
The complexity of these in-the-wild queries calls for a KBQA system that can dynamically explore large and often incomplete schemas and reason about them.
We also introduce an in-context learning KBQA agent, also called SPINACH, that mimics how a human expert would write SPARQLs to handle challenging questions.
- Score: 6.933892616704001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have led to significant improvements in the Knowledge Base Question Answering (KBQA) task. However, datasets used in KBQA studies do not capture the true complexity of KBQA tasks. They either have simple questions, use synthetically generated logical forms, or are based on small knowledge base (KB) schemas. We introduce the SPINACH dataset, an expert-annotated KBQA dataset collected from discussions on Wikidata's "Request a Query" forum with 320 decontextualized question-SPARQL pairs. The complexity of these in-the-wild queries calls for a KBQA system that can dynamically explore large and often incomplete schemas and reason about them, as it is infeasible to create a comprehensive training dataset. We also introduce an in-context learning KBQA agent, also called SPINACH, that mimics how a human expert would write SPARQLs to handle challenging questions. SPINACH achieves a new state of the art on the QALD-7, QALD-9 Plus and QALD-10 datasets by 31.0%, 27.0%, and 10.0% in $F_1$, respectively, and coming within 1.6% of the fine-tuned LLaMA SOTA model on WikiWebQuestions. On our new SPINACH dataset, the SPINACH agent outperforms all baselines, including the best GPT-4-based KBQA agent, by at least 38.1% in $F_1$.
Related papers
- KET-QA: A Dataset for Knowledge Enhanced Table Question Answering [63.56707527868466]
We propose to use a knowledge base (KB) as the external knowledge source for TableQA.
Every question requires the integration of information from both the table and the sub-graph to be answered.
We design a retriever-reasoner structured pipeline model to extract pertinent information from the vast knowledge sub-graph.
arXiv Detail & Related papers (2024-05-13T18:26:32Z) - Few-shot Transfer Learning for Knowledge Base Question Answering: Fusing Supervised Models with In-Context Learning [20.80841972133938]
Existing Knowledge Base Question Answering (KBQA) architectures are hungry for annotated data.
We introduce the problem of few-shot transfer learning for KBQA, where the target domain offers only a few labeled examples.
We propose a novel KBQA architecture called FuSIC-KBQA that performs KB-retrieval using multiple source-trained retrievers.
arXiv Detail & Related papers (2023-11-15T11:56:56Z) - Spider4SPARQL: A Complex Benchmark for Evaluating Knowledge Graph
Question Answering Systems [1.4732811715354452]
It has become increasingly important to provide realistic benchmarks for evaluating Knowledge Graph Question Answering systems.
Spider4SPARQL is a new SPARQL benchmark dataset featuring 9,693 previously existing manually generated NL questions and 4,721 unique, novel, and complex SPARQL queries.
We evaluate the system with state-of-the-art KGQA systems as well as LLMs, which achieve only up to 45% execution accuracy.
arXiv Detail & Related papers (2023-09-28T08:41:08Z) - FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base
Question Answering [16.88132219032486]
We introduce FlexKBQA to mitigate the burden associated with manual annotation.
We leverage Large Language Models (LLMs) as program translators for addressing the challenges inherent in the few-shot KBQA task.
Specifically, FlexKBQA leverages automated algorithms to sample diverse programs, such as SPARQL queries, from the knowledge base.
We observe that under the few-shot even the more challenging zero-shot scenarios, FlexKBQA achieves impressive results with a few annotations.
arXiv Detail & Related papers (2023-08-23T11:00:36Z) - Cross-Lingual Question Answering over Knowledge Base as Reading
Comprehension [61.079852289005025]
Cross-lingual question answering over knowledge base (xKBQA) aims to answer questions in languages different from that of the provided knowledge base.
One of the major challenges facing xKBQA is the high cost of data annotation.
We propose a novel approach for xKBQA in a reading comprehension paradigm.
arXiv Detail & Related papers (2023-02-26T05:52:52Z) - AutoQGS: Auto-Prompt for Low-Resource Knowledge-based Question
Generation from SPARQL [18.019353543946913]
This study investigates the task of knowledge-based question generation (KBQG)
Conventional KBQG works generated questions from fact triples in the knowledge graph, which could not express complex operations like aggregation and comparison in SPARQL.
We propose an auto-prompter trained on large-scale unsupervised data to rephrase SPARQL into NL description.
arXiv Detail & Related papers (2022-08-26T06:53:46Z) - A Chinese Multi-type Complex Questions Answering Dataset over Wikidata [45.31495982252219]
Complex Knowledge Base Question Answering is a popular area of research in the past decade.
Recent public datasets have led to encouraging results in this field, but are mostly limited to English.
Few state-of-the-art KBQA models are trained on Wikidata, one of the most popular real-world knowledge bases.
We propose CLC-QuAD, the first large scale complex Chinese semantic parsing dataset over Wikidata to address these challenges.
arXiv Detail & Related papers (2021-11-11T07:39:16Z) - SYGMA: System for Generalizable Modular Question Answering OverKnowledge
Bases [57.89642289610301]
We present SYGMA, a modular approach facilitating general-izability across multiple knowledge bases and multiple rea-soning types.
We demonstrate effectiveness of our system by evaluating on datasets belonging to two distinct knowledge bases,DBpedia and Wikidata.
arXiv Detail & Related papers (2021-09-28T01:57:56Z) - Beyond I.I.D.: Three Levels of Generalization for Question Answering on
Knowledge Bases [63.43418760818188]
We release a new large-scale, high-quality dataset with 64,331 questions, GrailQA.
We propose a novel BERT-based KBQA model.
The combination of our dataset and model enables us to thoroughly examine and demonstrate, for the first time, the key role of pre-trained contextual embeddings like BERT in the generalization of KBQA.
arXiv Detail & Related papers (2020-11-16T06:36:26Z) - A Survey on Complex Question Answering over Knowledge Base: Recent
Advances and Challenges [71.4531144086568]
Question Answering (QA) over Knowledge Base (KB) aims to automatically answer natural language questions.
Researchers have shifted their attention from simple questions to complex questions, which require more KB triples and constraint inference.
arXiv Detail & Related papers (2020-07-26T07:13:32Z) - KQA Pro: A Dataset with Explicit Compositional Programs for Complex
Question Answering over Knowledge Base [67.87878113432723]
We introduce KQA Pro, a dataset for Complex KBQA including 120K diverse natural language questions.
For each question, we provide the corresponding KoPL program and SPARQL query, so that KQA Pro serves for both KBQA and semantic parsing tasks.
arXiv Detail & Related papers (2020-07-08T03:28:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.