A Chinese Multi-type Complex Questions Answering Dataset over Wikidata
- URL: http://arxiv.org/abs/2111.06086v1
- Date: Thu, 11 Nov 2021 07:39:16 GMT
- Title: A Chinese Multi-type Complex Questions Answering Dataset over Wikidata
- Authors: Jianyun Zou and Min Yang and Lichao Zhang and Yechen Xu and Qifan Pan
and Fengqing Jiang and Ran Qin and Shushu Wang and Yifan He and Songfang
Huang and Zhou Zhao
- Abstract summary: Complex Knowledge Base Question Answering has been a popular area of research over the past decade.
Recent public datasets have led to encouraging results in this field, but are mostly limited to English.
Few state-of-the-art KBQA models are trained on Wikidata, one of the most popular real-world knowledge bases.
We propose CLC-QuAD, the first large-scale complex Chinese semantic parsing dataset over Wikidata, to address these challenges.
- Score: 45.31495982252219
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Complex Knowledge Base Question Answering has been a popular area of
research over the past decade. Recent public datasets have led to encouraging results in this
field, but are mostly limited to English and only involve a small number of
question types and relations, hindering research in more realistic settings and
in languages other than English. In addition, few state-of-the-art KBQA models
are trained on Wikidata, one of the most popular real-world knowledge bases. We
propose CLC-QuAD, the first large-scale complex Chinese semantic parsing
dataset over Wikidata to address these challenges. Together with the dataset,
we present a text-to-SPARQL baseline model, which can effectively answer
multi-type complex questions, such as factual questions, dual intent questions,
boolean questions, and counting questions, with Wikidata as the background
knowledge. We finally analyze the performance of SOTA KBQA models on this
dataset and identify the challenges facing Chinese KBQA.
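To make the four question types in the abstract concrete, the sketch below shows the SPARQL query shapes a text-to-SPARQL model must produce for Wikidata. The entity and property IDs are real Wikidata identifiers (wd:Q148 = China, wdt:P36 = capital, wd:Q956 = Beijing, wdt:P1082 = population, wdt:P150 = contains administrative territorial entity), but the question-to-query mappings are illustrative assumptions, not examples taken from CLC-QuAD.

```python
# Hypothetical SPARQL templates for the four complex question types the
# paper lists: factual, dual intent, boolean, and counting questions.
QUERIES = {
    # Factual: "What is the capital of China?"
    "factual": """
        SELECT ?capital WHERE { wd:Q148 wdt:P36 ?capital . }
    """,
    # Dual intent: one question asking for two facts,
    # e.g. "What are the capital and population of China?"
    "dual_intent": """
        SELECT ?capital ?population WHERE {
          wd:Q148 wdt:P36 ?capital ;
                  wdt:P1082 ?population .
        }
    """,
    # Boolean: "Is Beijing the capital of China?" maps to an ASK query,
    # which returns true/false instead of bindings.
    "boolean": """
        ASK { wd:Q148 wdt:P36 wd:Q956 . }
    """,
    # Counting: "How many administrative divisions does China contain?"
    # maps to a COUNT aggregate.
    "counting": """
        SELECT (COUNT(?div) AS ?n) WHERE { wd:Q148 wdt:P150 ?div . }
    """,
}

def query_for(question_type: str) -> str:
    """Return the SPARQL template for a given question type."""
    return QUERIES[question_type].strip()
```

These templates could be sent to the public Wikidata SPARQL endpoint; the point here is only that each question type changes the query *form* (SELECT vs. ASK vs. aggregate), which is what makes multi-type semantic parsing harder than single-form factual QA.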
Related papers
- SPINACH: SPARQL-Based Information Navigation for Challenging Real-World Questions [6.933892616704001]
We introduce the SPINACH dataset, an expert-annotated KBQA dataset collected from discussions on Wikidata's "Request a Query" forum.
The complexity of these in-the-wild queries calls for a KBQA system that can dynamically explore large and often incomplete schemas and reason about them.
We also introduce an in-context learning KBQA agent, also called SPINACH, that mimics how a human expert would write SPARQL queries to handle challenging questions.
arXiv Detail & Related papers (2024-07-16T06:18:21Z)
- NLQxform: A Language Model-based Question to SPARQL Transformer [8.698533396991554]
This paper presents a question-answering (QA) system called NLQxform.
NLQxform allows users to express their complex query intentions in natural language questions.
A transformer-based language model, i.e., BART, is employed to translate questions into standard SPARQL queries.
arXiv Detail & Related papers (2023-11-08T21:41:45Z)
- PAXQA: Generating Cross-lingual Question Answering Examples at Training Scale [53.92008514395125]
PAXQA (Projecting annotations for cross-lingual (x) QA) decomposes cross-lingual QA into two stages.
We propose a novel use of lexically-constrained machine translation, in which constrained entities are extracted from the parallel bitexts.
We show that models fine-tuned on these datasets outperform prior synthetic data generation models over several extractive QA datasets.
arXiv Detail & Related papers (2023-04-24T15:46:26Z)
- Cross-Lingual Question Answering over Knowledge Base as Reading Comprehension [61.079852289005025]
Cross-lingual question answering over knowledge base (xKBQA) aims to answer questions in languages different from that of the provided knowledge base.
One of the major challenges facing xKBQA is the high cost of data annotation.
We propose a novel approach for xKBQA in a reading comprehension paradigm.
arXiv Detail & Related papers (2023-02-26T05:52:52Z)
- QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers [68.9964449363406]
We extend one of the most popular KGQA benchmarks, QALD-9, by introducing high-quality translations of its questions into 8 languages.
Five of the languages - Armenian, Ukrainian, Lithuanian, Bashkir and Belarusian - had, to the best of our knowledge, never been considered in the KGQA research community before.
arXiv Detail & Related papers (2022-01-31T22:19:55Z)
- A Benchmark for Generalizable and Interpretable Temporal Question Answering over Knowledge Bases [67.33560134350427]
TempQA-WD is a benchmark dataset for temporal reasoning.
It is based on Wikidata, which is the most frequently curated, openly available knowledge base.
arXiv Detail & Related papers (2022-01-15T08:49:09Z)
- Multilingual Compositional Wikidata Questions [9.602430657819564]
We propose a method for creating a multilingual, parallel dataset of question-query pairs grounded in Wikidata.
We use this data to train semantic parsers for Hebrew, Kannada, Chinese and English to better understand the current strengths and weaknesses of multilingual semantic parsing.
arXiv Detail & Related papers (2021-08-07T19:40:38Z)
- A Survey on Complex Question Answering over Knowledge Base: Recent Advances and Challenges [71.4531144086568]
Question Answering (QA) over Knowledge Base (KB) aims to automatically answer natural language questions.
Researchers have shifted their attention from simple questions to complex questions, which require more KB triples and constraint inference.
arXiv Detail & Related papers (2020-07-26T07:13:32Z)
- RuBQ: A Russian Dataset for Question Answering over Wikidata [3.394278383312621]
RuBQ is the first Russian knowledge base question answering (KBQA) dataset.
The high-quality dataset consists of 1,500 Russian questions of varying complexity, their English machine translations, SPARQL queries to Wikidata, reference answers, and a Wikidata sample of triples containing entities with Russian labels.
arXiv Detail & Related papers (2020-05-21T14:06:15Z)