Cross-Lingual GenQA: A Language-Agnostic Generative Question Answering
Approach for Open-Domain Question Answering
- URL: http://arxiv.org/abs/2110.07150v1
- Date: Thu, 14 Oct 2021 04:36:29 GMT
- Authors: Benjamin Muller, Luca Soldaini, Rik Koncel-Kedziorski, Eric Lind,
Alessandro Moschitti
- Abstract summary: Open-Retrieval Generative Question Answering (GenQA) is proven to deliver high-quality, natural-sounding answers in English.
We present the first generalization of the GenQA approach for the multilingual environment.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-Retrieval Generative Question Answering (GenQA) is proven to deliver
high-quality, natural-sounding answers in English. In this paper, we present
the first generalization of the GenQA approach for the multilingual
environment. To this end, we present the GenTyDiQA dataset, which extends the
TyDiQA evaluation data (Clark et al., 2020) with natural-sounding, well-formed
answers in Arabic, Bengali, English, Japanese, and Russian. For all these
languages, we show that a GenQA sequence-to-sequence-based model outperforms a
state-of-the-art Answer Sentence Selection model. We also show that a
multilingually-trained model competes with, and in some cases outperforms, its
monolingual counterparts. Finally, we show that our system can compete with
strong baselines even when fed information from a variety of languages.
Essentially, our system is able to answer a question in any language
of our language set using information from many languages, making it the first
Language-Agnostic GenQA system.
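The language-agnostic setup described above can be illustrated with a minimal sketch of how a question and retrieved passages from several languages might be packed into a single sequence-to-sequence input. The `build_genqa_input` helper and the `[SEP]` separator convention are illustrative assumptions, not the paper's exact input format.

```python
def build_genqa_input(question: str, passages: list[str], sep: str = " [SEP] ") -> str:
    """Pack a question and its retrieved passages (possibly in several
    different languages) into one flat string for a seq2seq answer generator."""
    return question + sep + sep.join(passages)


# A Japanese question answered from English and Russian evidence:
example = build_genqa_input(
    "富士山の高さは？",
    ["Mount Fuji is 3,776 m tall.", "Высота Фудзи - 3776 метров."],
)
```

In this sketch the generator never needs to know which language each passage is in; the model is expected to fuse the mixed-language evidence and produce an answer in the question's language.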
Related papers
- Can a Multichoice Dataset be Repurposed for Extractive Question Answering? (2024-04-26)
  We repurpose the Belebele dataset (Bandarkar et al., 2023), which was designed for multiple-choice question answering (MCQA). We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA). Our aim is to enable others to adapt our approach to the 120+ other language variants in Belebele, many of which are under-resourced.
- Evaluating and Modeling Attribution for Cross-Lingual Question Answering (2023-05-23)
  This work is the first to study attribution for cross-lingual question answering. We collect data in 5 languages to assess the attribution level of a state-of-the-art cross-lingual QA system. We find that a substantial portion of the answers is not attributable to any retrieved passage.
- PAXQA: Generating Cross-lingual Question Answering Examples at Training Scale (2023-04-24)
  PAXQA (Projecting annotations for cross-lingual (x) QA) decomposes cross-lingual QA generation into two stages. We propose a novel use of lexically-constrained machine translation, in which constrained entities are extracted from parallel bitexts. We show that models fine-tuned on these datasets outperform prior synthetic data generation models on several extractive QA datasets.
- Bridging the Language Gap: Knowledge Injected Multilingual Question Answering (2023-04-06)
  We propose a generalized cross-lingual transfer framework to enhance the model's ability to understand different languages. Experimental results on the real-world MLQA dataset demonstrate that the proposed method improves performance by a large margin.
- Generative Language Models for Paragraph-Level Question Generation (2022-10-08)
  Powerful generative models have led to recent progress in question generation (QG). However, it is difficult to measure advances in QG research, since there are no standardized resources that allow uniform comparison among approaches. We introduce QG-Bench, a benchmark for QG that unifies existing question answering datasets by converting them to a standard QG setting.
- Investigating Post-pretraining Representation Alignment for Cross-Lingual Question Answering (2021-09-24)
  We investigate the capabilities of multilingually pre-trained language models in cross-lingual question answering systems. We find that explicitly aligning the representations across languages with a post-hoc fine-tuning step generally leads to improved performance.
- Multilingual Answer Sentence Reranking via Automatically Translated Data (2021-02-20)
  We present a study on the design of multilingual Answer Sentence Selection (AS2) models, which are a core component of modern Question Answering (QA) systems. The main idea is to transfer data created in one resource-rich language, e.g., English, to other, less resource-rich languages.
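The QA-to-QG conversion mentioned in the QG-Bench entry above can be sketched as follows. The `<hl>` highlight token and the `generate question:` task prefix follow conventions common in the QG literature and are assumptions here, not necessarily QG-Bench's exact format.

```python
def qa_to_qg(context: str, question: str, answer: str, hl: str = "<hl>") -> dict:
    """Turn an extractive QA example into a QG training example: the input
    is the context with the answer span highlighted, the target is the
    question the model should learn to generate."""
    start = context.find(answer)
    if start == -1:
        raise ValueError("answer span not found in context")
    highlighted = (
        context[:start] + f"{hl} {answer} {hl}" + context[start + len(answer):]
    )
    return {"input": "generate question: " + highlighted, "target": question}


ex = qa_to_qg(
    "Paris is the capital of France.",
    "What is the capital of France?",
    "Paris",
)
```

Applying such a conversion uniformly across existing QA datasets is what lets a benchmark compare QG approaches on a level footing, since every system sees the same (highlighted context, question) pairs.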
This list is automatically generated from the titles and abstracts of the papers on this site.