Cross-Lingual Question Answering over Knowledge Base as Reading Comprehension
- URL: http://arxiv.org/abs/2302.13241v1
- Date: Sun, 26 Feb 2023 05:52:52 GMT
- Title: Cross-Lingual Question Answering over Knowledge Base as Reading Comprehension
- Authors: Chen Zhang, Yuxuan Lai, Yansong Feng, Xingyu Shen, Haowei Du, Dongyan Zhao
- Abstract summary: Cross-lingual question answering over knowledge base (xKBQA) aims to answer questions in languages different from that of the provided knowledge base.
One of the major challenges facing xKBQA is the high cost of data annotation.
We propose a novel approach for xKBQA in a reading comprehension paradigm.
- Score: 61.079852289005025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although many large-scale knowledge bases (KBs) claim to contain multilingual
information, their support for many non-English languages is often incomplete.
This incompleteness gives birth to the task of cross-lingual question answering
over knowledge base (xKBQA), which aims to answer questions in languages
different from that of the provided KB. One of the major challenges facing
xKBQA is the high cost of data annotation, leading to limited resources
available for further exploration. Another challenge is mapping KB schemas and
natural language expressions in the questions under cross-lingual settings. In
this paper, we propose a novel approach for xKBQA in a reading comprehension
paradigm. We convert KB subgraphs into passages to narrow the gap between KB
schemas and questions, which enables our model to benefit from recent advances
in multilingual pre-trained language models (MPLMs) and cross-lingual machine
reading comprehension (xMRC). Specifically, we use MPLMs, with considerable
knowledge of cross-lingual mappings, for cross-lingual reading comprehension.
Existing high-quality xMRC datasets can be further utilized to finetune our
model, greatly alleviating the data scarcity issue in xKBQA. Extensive
experiments on two xKBQA datasets in 12 languages show that our approach
outperforms various baselines and achieves strong few-shot and zero-shot
performance. Our dataset and code are released for further research.
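The abstract does not say how a KB subgraph is verbalized into a passage or how a predicted answer span is mapped back to a KB entity. As a minimal sketch, assuming each triple is rendered as a plain "subject relation object." sentence built from its labels, the hypothetical helpers `linearize_subgraph` and `span_to_entity` below build a passage while recording the character offsets of every entity mention, then map a reader's predicted span to the entity it overlaps most. The multilingual reader itself is not shown; its output is simulated with `str.find`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One KB fact; labels are assumed to be in the KB's language."""
    subject: str
    relation: str
    obj: str


def linearize_subgraph(triples):
    """Verbalize triples as 'subject relation object.' sentences (hypothetical format).

    Returns the passage and a list of (start, end, entity_label) character spans
    for every subject and object mention, so answers can be mapped back to the KB.
    """
    sentences, spans, offset = [], [], 0
    for t in triples:
        sentence = f"{t.subject} {t.relation} {t.obj}."
        # The subject opens the sentence.
        spans.append((offset, offset + len(t.subject), t.subject))
        # The object follows "subject relation " (two separating spaces).
        obj_start = offset + len(t.subject) + 1 + len(t.relation) + 1
        spans.append((obj_start, obj_start + len(t.obj), t.obj))
        sentences.append(sentence)
        offset += len(sentence) + 1  # +1 for the space used to join sentences
    return " ".join(sentences), spans


def span_to_entity(spans, pred_start, pred_end):
    """Return the entity whose mention overlaps the predicted span the most, else None."""
    best, best_overlap = None, 0
    for start, end, label in spans:
        overlap = min(end, pred_end) - max(start, pred_start)
        if overlap > best_overlap:
            best, best_overlap = label, overlap
    return best


# Toy subgraph; a real system would retrieve it around the question's topic entity.
triples = [
    Triple("Barack Obama", "place of birth", "Honolulu"),
    Triple("Honolulu", "country", "United States"),
]
passage, spans = linearize_subgraph(triples)

# Stand-in for an MPLM reader's output: character offsets of the answer span.
pred_start = passage.find("United States")
pred_end = pred_start + len("United States")
print(passage)  # Barack Obama place of birth Honolulu. Honolulu country United States.
print(span_to_entity(spans, pred_start, pred_end))  # United States
```

In the cross-lingual setting the question could be in, say, Chinese while the passage keeps the KB's English labels; an extractive reader still returns character offsets into the passage, which is all `span_to_entity` needs. Recording offsets while building the passage is one design choice: it avoids re-matching the predicted answer string against entity labels afterwards.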
Related papers
- INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages [26.13077589552484]
Indic-QA is the largest publicly available context-grounded question-answering dataset for 11 major Indian languages from two language families.
We generate a synthetic dataset using the Gemini model to create question-answer pairs given a passage, which is then manually verified for quality assurance.
We evaluate various multilingual Large Language Models and their instruction-fine-tuned variants on the benchmark and observe that their performance is subpar, particularly for low-resource languages.
Date: 2024-07-18T13:57:16Z
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
Date: 2024-06-23T15:15:17Z
- mCSQA: Multilingual Commonsense Reasoning Dataset with Unified Creation Strategy by Language Models and Humans [27.84922167294656]
It is challenging to curate a dataset for language-specific knowledge and common sense.
Most current multilingual datasets are created through translation, which cannot evaluate such language-specific aspects.
We propose Multilingual CommonsenseQA (mCSQA) based on the construction process of CSQA but leveraging language models for a more efficient construction.
Date: 2024-06-06T16:14:54Z
- Can a Multichoice Dataset be Repurposed for Extractive Question Answering? [52.28197971066953]
We repurposed the Belebele dataset (Bandarkar et al., 2023), which was designed for multiple-choice question answering (MCQA), for extractive question answering (EQA).
We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA).
Our aim is to enable others to adapt our approach for the 120+ other language variants in Belebele, many of which are deemed under-resourced.
Date: 2024-04-26T11:46:05Z
- Few-shot In-context Learning for Knowledge Base Question Answering [31.73274700847965]
We propose KB-BINDER, which for the first time enables few-shot in-context learning over KBQA tasks.
The experimental results on four public heterogeneous KBQA datasets show that KB-BINDER can achieve a strong performance with only a few in-context demonstrations.
Date: 2023-05-02T19:31:55Z
- A Chinese Multi-type Complex Questions Answering Dataset over Wikidata [45.31495982252219]
Complex Knowledge Base Question Answering has been a popular area of research over the past decade.
Recent public datasets have led to encouraging results in this field, but are mostly limited to English.
Few state-of-the-art KBQA models are trained on Wikidata, one of the most popular real-world knowledge bases.
We propose CLC-QuAD, the first large scale complex Chinese semantic parsing dataset over Wikidata to address these challenges.
Date: 2021-11-11T07:39:16Z
- Prix-LM: Pretraining for Multilingual Knowledge Base Construction [59.02868906044296]
We propose a unified framework, Prix-LM, for multilingual knowledge construction and completion.
We leverage two types of knowledge, monolingual triples and cross-lingual links, extracted from existing multilingual KBs.
Experiments on standard entity-related tasks, such as link prediction in multiple languages, cross-lingual entity linking and bilingual lexicon induction, demonstrate its effectiveness.
Date: 2021-10-16T02:08:46Z
- Reasoning Over Virtual Knowledge Bases With Open Predicate Relations [85.19305347984515]
We present the Open Predicate Query Language (OPQL).
OPQL is a method for constructing a virtual Knowledge Base (VKB) trained entirely from text.
We demonstrate that OPQL outperforms prior VKB methods on two different KB reasoning tasks.
Date: 2021-02-14T01:29:54Z
- XOR QA: Cross-lingual Open-Retrieval Question Answering [75.20578121267411]
This work extends open-retrieval question answering to a cross-lingual setting.
We construct a large-scale dataset built on questions lacking same-language answers.
Date: 2020-10-22T16:47:17Z