QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia
and Wikidata Translated by Native Speakers
- URL: http://arxiv.org/abs/2202.00120v1
- Date: Mon, 31 Jan 2022 22:19:55 GMT
- Title: QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia
and Wikidata Translated by Native Speakers
- Authors: Aleksandr Perevalov, Dennis Diefenbach, Ricardo Usbeck, Andreas Both
- Abstract summary: We extend one of the most popular KGQA benchmarks, QALD-9, by introducing high-quality translations of its questions into 8 languages.
Five of the languages - Armenian, Ukrainian, Lithuanian, Bashkir, and Belarusian - had, to the best of our knowledge, never been considered in the KGQA research community before.
- Score: 68.9964449363406
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to provide the same experience to different user groups
(i.e., accessibility) is one of the most important characteristics of Web-based
systems. The same is true for Knowledge Graph Question Answering (KGQA)
systems, which provide access to Semantic Web data via natural-language
interfaces. While following our research agenda on the multilingual aspect of
accessibility of KGQA systems, we identified several ongoing challenges. One of
them is the lack of multilingual KGQA benchmarks. In this work, we extend one
of the most popular KGQA benchmarks, QALD-9, by introducing high-quality
translations of its questions into 8 languages, provided by native speakers,
and by transferring the SPARQL queries of QALD-9 from DBpedia to Wikidata, so
that the usability and relevance of the dataset are strongly increased. Five of
the languages - Armenian, Ukrainian, Lithuanian, Bashkir, and Belarusian - had,
to the best of our knowledge, never been considered in the KGQA research
community before. The latter two are classified as "endangered" by UNESCO. We
call the extended dataset QALD-9-plus and have made it available online at
https://github.com/Perevalov/qald_9_plus.
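QALD benchmarks distribute questions as JSON records that pair parallel natural-language strings (one per language) with a SPARQL query over the target knowledge graph. A minimal sketch of reading such a record is shown below; the structure follows the general QALD JSON format, but the concrete question, translations, and query are illustrative placeholders, not values taken from QALD-9-plus itself:

```python
import json

# Illustrative QALD-style record. The field layout (question list with
# language/string pairs, plus a SPARQL query) mirrors the QALD JSON format;
# the actual content here is a made-up example, not from QALD-9-plus.
record = json.loads("""
{
  "id": "1",
  "question": [
    {"language": "en", "string": "Who is the mayor of Berlin?"},
    {"language": "de", "string": "Wer ist der Bürgermeister von Berlin?"}
  ],
  "query": {
    "sparql": "SELECT ?o WHERE { wd:Q64 wdt:P6 ?o . }"
  }
}
""")

def question_in(record, lang):
    """Return the question string for the given language code, or None."""
    for q in record["question"]:
        if q["language"] == lang:
            return q["string"]
    return None

print(question_in(record, "de"))  # → Wer ist der Bürgermeister von Berlin?
```

Because every record carries all language variants side by side, a multilingual KGQA system can be evaluated per language against the same gold SPARQL query.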
Related papers
- MST5 -- Multilingual Question Answering over Knowledge Graphs [1.6470999044938401]
Knowledge Graph Question Answering (KGQA) simplifies querying vast amounts of knowledge stored in a graph-based model using natural language.
Existing multilingual KGQA systems face challenges in achieving performance comparable to English systems.
We propose a simplified approach to enhance multilingual KGQA systems by incorporating linguistic context and entity information directly into the processing pipeline of a language model.
arXiv Detail & Related papers (2024-07-08T15:37:51Z)
- MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering [58.92057773071854]
We introduce MTVQA, the first benchmark featuring high-quality human expert annotations across 9 diverse languages.
arXiv Detail & Related papers (2024-05-20T12:35:01Z)
- Evaluating and Modeling Attribution for Cross-Lingual Question Answering [80.4807682093432]
This work is the first to study attribution for cross-lingual question answering.
We collect data in 5 languages to assess the attribution level of a state-of-the-art cross-lingual QA system.
We find that a substantial portion of the answers is not attributable to any retrieved passages.
arXiv Detail & Related papers (2023-05-23T17:57:46Z)
- AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages [18.689806554953236]
Cross-lingual open-retrieval question answering (XOR QA) systems retrieve answer content from other languages while serving people in their native language.
We create AfriQA, the first cross-lingual QA dataset with a focus on African languages.
AfriQA includes 12,000+ XOR QA examples across 10 African languages.
arXiv Detail & Related papers (2023-05-11T15:34:53Z)
- Cross-Lingual Question Answering over Knowledge Base as Reading Comprehension [61.079852289005025]
Cross-lingual question answering over knowledge base (xKBQA) aims to answer questions in languages different from that of the provided knowledge base.
One of the major challenges facing xKBQA is the high cost of data annotation.
We propose a novel approach for xKBQA in a reading comprehension paradigm.
arXiv Detail & Related papers (2023-02-26T05:52:52Z)
- BigText-QA: Question Answering over a Large-Scale Hybrid Knowledge Graph [23.739432128095107]
BigText-QA is able to answer questions based on a structured knowledge graph.
Our results demonstrate that BigText-QA outperforms DrQA, a neural-network-based QA system, and achieves results competitive with QUEST, a graph-based unsupervised QA system.
arXiv Detail & Related papers (2022-12-12T09:49:02Z)
- A Chinese Multi-type Complex Questions Answering Dataset over Wikidata [45.31495982252219]
Complex Knowledge Base Question Answering has been a popular area of research over the past decade.
Recent public datasets have led to encouraging results in this field, but are mostly limited to English.
Few state-of-the-art KBQA models are trained on Wikidata, one of the most popular real-world knowledge bases.
We propose CLC-QuAD, the first large scale complex Chinese semantic parsing dataset over Wikidata to address these challenges.
arXiv Detail & Related papers (2021-11-11T07:39:16Z)
- SD-QA: Spoken Dialectal Question Answering for the Real World [15.401330338654203]
We build a benchmark on five languages (Arabic, Bengali, English, Kiswahili, Korean) with more than 68k audio prompts in 24 dialects from 255 speakers.
We provide baseline results showcasing the real-world performance of QA systems and analyze the effect of language variety and other sensitive speaker attributes on downstream performance.
Last, we study the fairness of the ASR and QA models with respect to the underlying user populations.
arXiv Detail & Related papers (2021-09-24T16:54:27Z)
- XOR QA: Cross-lingual Open-Retrieval Question Answering [75.20578121267411]
This work extends open-retrieval question answering to a cross-lingual setting.
We construct a large-scale dataset built on questions lacking same-language answers.
arXiv Detail & Related papers (2020-10-22T16:47:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.