Cross-Lingual QA as a Stepping Stone for Monolingual Open QA in
Icelandic
- URL: http://arxiv.org/abs/2207.01918v1
- Date: Tue, 5 Jul 2022 09:52:34 GMT
- Title: Cross-Lingual QA as a Stepping Stone for Monolingual Open QA in
Icelandic
- Authors: Vésteinn Snæbjarnarson and Hafsteinn Einarsson
- Abstract summary: It can be challenging to build effective open question answering (open QA) systems for languages other than English.
We present a data-efficient method to bootstrap such a system for languages other than English.
Our approach requires only limited QA resources in the given language, along with machine-translated data, and at least a bilingual language model.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: It can be challenging to build effective open question answering (open QA)
systems for languages other than English, mainly due to a lack of labeled data
for training. We present a data-efficient method to bootstrap such a system for
languages other than English. Our approach requires only limited QA resources
in the given language, along with machine-translated data, and at least a
bilingual language model. To evaluate our approach, we build such a system for
the Icelandic language and evaluate performance over trivia-style datasets. The
corpora used for training are English in origin but machine translated into
Icelandic. We train a bilingual Icelandic/English language model to embed
English context and Icelandic questions following methodology introduced with
DensePhrases (Lee et al., 2021). The resulting system is an open domain
cross-lingual QA system between Icelandic and English. Finally, the system is
adapted for Icelandic-only open QA, demonstrating how it is possible to
efficiently create an open QA system with limited access to curated datasets in
the language of interest.
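To make the retrieval idea in the abstract concrete, here is a minimal sketch of phrase retrieval by maximum inner product, the general mechanism behind DensePhrases-style systems. It is not the authors' implementation: the vectors are hand-made stand-ins for encoder outputs, and the names `phrase_index` and `retrieve` are hypothetical. In a real system a bilingual encoder would produce the phrase vectors from English context and the question vector from an Icelandic question.

```python
from math import sqrt


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = sqrt(dot(v, v))
    return [x / norm for x in v]


# Hypothetical, hand-made vectors standing in for phrase embeddings that a
# bilingual encoder would compute from English context passages.
phrase_index = {
    "Reykjavík": normalize([0.9, 0.1, 0.0]),
    "1944": normalize([0.1, 0.9, 0.1]),
    "Halldór Laxness": normalize([0.0, 0.2, 0.9]),
}


def retrieve(question_vector, index, top_k=1):
    """Return the top_k phrases whose vectors have the largest inner product
    with the (normalized) question vector."""
    q = normalize(question_vector)
    scored = sorted(index.items(), key=lambda item: dot(q, item[1]), reverse=True)
    return [phrase for phrase, _ in scored[:top_k]]


# Stand-in for the embedding of an Icelandic question such as
# "Hver er höfuðborg Íslands?" (assumed vector, not a real encoder output).
question_vector = [0.8, 0.2, 0.1]
best = retrieve(question_vector, phrase_index)
print(best)  # ['Reykjavík']
```

The exhaustive sort is only for illustration; at the scale of a full corpus an approximate nearest-neighbor index would normally replace it.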
Related papers
- MST5 -- Multilingual Question Answering over Knowledge Graphs [1.6470999044938401]
Knowledge Graph Question Answering (KGQA) simplifies querying vast amounts of knowledge stored in a graph-based model using natural language.
Existing multilingual KGQA systems face challenges in achieving performance comparable to English systems.
We propose a simplified approach to enhance multilingual KGQA systems by incorporating linguistic context and entity information directly into the processing pipeline of a language model.
arXiv Detail & Related papers (2024-07-08T15:37:51Z)
- Datasets for Multilingual Answer Sentence Selection [59.28492975191415]
We introduce new high-quality datasets for AS2 in five European languages (French, German, Italian, Portuguese, and Spanish).
Results indicate that our datasets are pivotal in producing robust and powerful multilingual AS2 models.
arXiv Detail & Related papers (2024-06-14T16:50:29Z)
- Can a Multichoice Dataset be Repurposed for Extractive Question Answering? [52.28197971066953]
We repurposed the Belebele dataset (Bandarkar et al., 2023), which was designed for multiple-choice question answering (MCQA).
We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA).
Our aim is to enable others to adapt our approach for the 120+ other language variants in Belebele, many of which are deemed under-resourced.
arXiv Detail & Related papers (2024-04-26T11:46:05Z)
- Building Efficient and Effective OpenQA Systems for Low-Resource Languages [17.64851283209797]
We show that effective, low-cost OpenQA systems can be developed for low-resource contexts.
Key ingredients are weak supervision using machine-translated labeled datasets and a relevant unstructured knowledge source.
We present SQuAD-TR, a machine translation of SQuAD2.0, and we build our OpenQA system by adapting ColBERT-QA and retraining it over Turkish resources.
arXiv Detail & Related papers (2024-01-07T22:11:36Z)
- AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages [18.689806554953236]
Cross-lingual open-retrieval question answering (XOR QA) systems retrieve answer content from other languages while serving people in their native language.
We create AfriQA, the first cross-lingual QA dataset with a focus on African languages.
AfriQA includes 12,000+ XOR QA examples across 10 African languages.
arXiv Detail & Related papers (2023-05-11T15:34:53Z)
- PAXQA: Generating Cross-lingual Question Answering Examples at Training Scale [53.92008514395125]
PAXQA (Projecting annotations for cross-lingual (x) QA) decomposes cross-lingual QA into two stages.
We propose a novel use of lexically-constrained machine translation, in which constrained entities are extracted from the parallel bitexts.
We show that models fine-tuned on these datasets outperform prior synthetic data generation models over several extractive QA datasets.
arXiv Detail & Related papers (2023-04-24T15:46:26Z)
- QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers [68.9964449363406]
We extend one of the most popular KGQA benchmarks, QALD-9, by introducing high-quality translations of its questions into 8 languages.
To the best of our knowledge, five of these languages (Armenian, Ukrainian, Lithuanian, Bashkir, and Belarusian) have never before been considered in the KGQA research community.
arXiv Detail & Related papers (2022-01-31T22:19:55Z)
- Investigating Post-pretraining Representation Alignment for Cross-Lingual Question Answering [20.4489424966613]
We investigate the capabilities of multilingually pre-trained language models on cross-lingual question answering systems.
We find that explicitly aligning the representations across languages with a post-hoc fine-tuning step generally leads to improved performance.
arXiv Detail & Related papers (2021-09-24T15:32:45Z)
- Multilingual Answer Sentence Reranking via Automatically Translated Data [97.98885151955467]
We present a study on the design of multilingual Answer Sentence Selection (AS2) models, which are a core component of modern Question Answering (QA) systems.
The main idea is to transfer data created in one resource-rich language, e.g., English, to other languages that have fewer resources.
arXiv Detail & Related papers (2021-02-20T03:52:08Z)
- XOR QA: Cross-lingual Open-Retrieval Question Answering [75.20578121267411]
This work extends open-retrieval question answering to a cross-lingual setting.
We construct a large-scale dataset built on questions lacking same-language answers.
arXiv Detail & Related papers (2020-10-22T16:47:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.