RuBQ: A Russian Dataset for Question Answering over Wikidata
- URL: http://arxiv.org/abs/2005.10659v1
- Date: Thu, 21 May 2020 14:06:15 GMT
- Title: RuBQ: A Russian Dataset for Question Answering over Wikidata
- Authors: Vladislav Korablinov and Pavel Braslavski
- Abstract summary: RuBQ is the first Russian knowledge base question answering (KBQA) dataset.
The high-quality dataset consists of 1,500 Russian questions of varying complexity, their English machine translations, SPARQL queries to Wikidata, reference answers, and a Wikidata sample of triples containing entities with Russian labels.
- Score: 3.394278383312621
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The paper presents RuBQ, the first Russian knowledge base question answering
(KBQA) dataset. The high-quality dataset consists of 1,500 Russian questions of
varying complexity, their English machine translations, SPARQL queries to
Wikidata, reference answers, as well as a Wikidata sample of triples containing
entities with Russian labels. The dataset creation started with a large
collection of question-answer pairs from online quizzes. The data underwent
automatic filtering, crowd-assisted entity linking, automatic generation of
SPARQL queries, and their subsequent in-house verification.
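The record structure described in the abstract (a Russian question, its English translation, a SPARQL query to Wikidata, and reference answers, accompanied by a sample of triples) can be sketched as below. This is a minimal illustration, not the dataset's actual schema: the field names, the helper functions, and the sample question are all assumptions made for this example, and the question is invented rather than taken from RuBQ. The identifiers Q42 (Douglas Adams), P19 (place of birth), and Q350 (Cambridge) are real Wikidata IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KBQAExample:
    """One question record; field names are hypothetical, not RuBQ's schema."""
    question_ru: str
    question_en: str
    sparql: str
    answers: tuple


def one_hop_query(subject_qid, property_pid):
    """Build a simple one-hop SPARQL query using Wikidata's wd:/wdt: prefixes."""
    return f"SELECT ?answer WHERE {{ wd:{subject_qid} wdt:{property_pid} ?answer }}"


def answer_from_triples(triples, subject_qid, property_pid):
    """Answer a one-hop question against an in-memory sample of triples."""
    return sorted(o for s, p, o in triples if s == subject_qid and p == property_pid)


# Tiny stand-in for a Wikidata sample of (subject, property, object) triples.
TRIPLES = [
    ("Q42", "P19", "Q350"),  # Douglas Adams, place of birth, Cambridge
    ("Q42", "P31", "Q5"),    # Douglas Adams, instance of, human
]

# Invented illustrative question, not an actual RuBQ entry.
example = KBQAExample(
    question_ru="Где родился Дуглас Адамс?",
    question_en="Where was Douglas Adams born?",
    sparql=one_hop_query("Q42", "P19"),
    answers=tuple(answer_from_triples(TRIPLES, "Q42", "P19")),
)
```

Evaluating against a local triple sample rather than the live endpoint keeps the sketch reproducible, since Wikidata itself changes over time.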
Related papers
- NewsQs: Multi-Source Question Generation for the Inquiring Mind [59.79288644158271]
We present NewsQs, a dataset that provides question-answer pairs for multiple news documents.
To create NewsQs, we augment a traditional multi-document summarization dataset with questions automatically generated by a T5-Large model fine-tuned on FAQ-style news articles.
arXiv Detail & Related papers (2024-02-28T16:59:35Z)
- Leveraging LLMs in Scholarly Knowledge Graph Question Answering [7.951847862547378]
The proposed KGQA system answers natural language questions by leveraging a large language model (LLM).
Our system achieves an F1 score of 99.0% on SciQA, one of the Scholarly Knowledge Graph Question Answering challenge benchmarks.
arXiv Detail & Related papers (2023-11-16T12:13:49Z)
- KGConv, a Conversational Corpus grounded in Wikidata [6.451914896767135]
KGConv is a large conversational corpus of 71k conversations in which each question-answer pair is grounded in a Wikidata fact.
We provide multiple variants (12 on average) of the corresponding question using templates, human annotations, hand-crafted rules and a question rewriting neural model.
KGConv can further be used for other generation and analysis tasks such as single-turn question generation from Wikidata triples, question rewriting, question answering from conversation or from knowledge graphs and quiz generation.
arXiv Detail & Related papers (2023-08-29T13:35:51Z)
- PAXQA: Generating Cross-lingual Question Answering Examples at Training Scale [53.92008514395125]
PAXQA (Projecting annotations for cross-lingual (x) QA) decomposes cross-lingual QA into two stages.
We propose a novel use of lexically-constrained machine translation, in which constrained entities are extracted from the parallel bitexts.
We show that models fine-tuned on these datasets outperform prior synthetic data generation models over several extractive QA datasets.
arXiv Detail & Related papers (2023-04-24T15:46:26Z)
- Towards Complex Document Understanding By Discrete Reasoning [77.91722463958743]
Document Visual Question Answering (VQA) aims to understand visually-rich documents to answer questions in natural language.
We introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages and 16,558 question-answer pairs.
We develop a novel model named MHST that takes into account the information in multi-modalities, including text, layout and visual image, to intelligently address different types of questions.
arXiv Detail & Related papers (2022-07-25T01:43:19Z)
- WikiOmnia: generative QA corpus on the whole Russian Wikipedia [0.2132096006921048]
We present the WikiOmnia dataset, a new publicly available set of QA pairs and corresponding Russian Wikipedia article summary sections.
The dataset includes every available article from Wikipedia for the Russian language.
The resulting dataset includes two parts: raw data on the whole Russian Wikipedia (7,930,873 QA pairs with paragraphs for ruGPT-3 XL and 7,991,040 QA pairs with paragraphs for ruT5-large) and cleaned data with strict automatic verification.
arXiv Detail & Related papers (2022-04-17T12:59:36Z)
- A Chinese Multi-type Complex Questions Answering Dataset over Wikidata [45.31495982252219]
Complex Knowledge Base Question Answering has been a popular area of research in the past decade.
Recent public datasets have led to encouraging results in this field, but are mostly limited to English.
Few state-of-the-art KBQA models are trained on Wikidata, one of the most popular real-world knowledge bases.
To address these challenges, we propose CLC-QuAD, the first large-scale complex Chinese semantic parsing dataset over Wikidata.
arXiv Detail & Related papers (2021-11-11T07:39:16Z)
- ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers [93.55268936974971]
We describe a Question Answering dataset that contains complex questions with conditional answers.
We call this dataset ConditionalQA.
We show that ConditionalQA is challenging for many of the existing QA models, especially in selecting answer conditions.
arXiv Detail & Related papers (2021-10-13T17:16:46Z)
- SPARQLing Database Queries from Intermediate Question Decompositions [7.475027071883912]
To translate natural language questions into database queries, most approaches rely on a fully annotated training set.
We reduce this burden by using intermediate question representations grounded in databases.
Our pipeline consists of two parts: a semantic parser that converts natural language questions into the intermediate representations and a non-trainable transpiler to the SPARQL query language.
arXiv Detail & Related papers (2021-09-13T17:57:12Z)
- Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z)