XOR QA: Cross-lingual Open-Retrieval Question Answering
- URL: http://arxiv.org/abs/2010.11856v3
- Date: Tue, 13 Apr 2021 05:22:01 GMT
- Title: XOR QA: Cross-lingual Open-Retrieval Question Answering
- Authors: Akari Asai, Jungo Kasai, Jonathan H. Clark, Kenton Lee, Eunsol Choi
and Hannaneh Hajishirzi
- Abstract summary: This work extends open-retrieval question answering to a cross-lingual setting.
We construct a large-scale dataset built on questions lacking same-language answers.
- Score: 75.20578121267411
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multilingual question answering tasks typically assume answers exist in the
same language as the question. Yet in practice, many languages face both
information scarcity -- where languages have few reference articles -- and
information asymmetry -- where questions reference concepts from other
cultures. This work extends open-retrieval question answering to a
cross-lingual setting enabling questions from one language to be answered via
answer content from another language. We construct a large-scale dataset built
on questions from TyDi QA lacking same-language answers. Our task formulation,
called Cross-lingual Open Retrieval Question Answering (XOR QA), includes 40k
information-seeking questions from across 7 diverse non-English languages.
Based on this dataset, we introduce three new tasks that involve cross-lingual
document retrieval using multi-lingual and English resources. We establish
baselines with state-of-the-art machine translation systems and cross-lingual
pretrained models. Experimental results suggest that XOR QA is a challenging
task that will facilitate the development of novel techniques for multilingual
question answering. Our data and code are available at
https://nlp.cs.washington.edu/xorqa.
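Since the abstract mentions baselines built from machine translation systems plus retrieval over English resources, the following minimal Python sketch illustrates that translate-then-retrieve idea end to end. Everything in it, the toy passage list, the stub translator, and the bag-of-words scorer, is an illustrative assumption and not the authors' released code or the actual XOR QA baselines.

```python
# Minimal sketch of a translate-then-retrieve pipeline in the spirit of the
# XOR QA baselines: translate a non-English question into English, then
# retrieve English passages that may contain the answer.
# All components below are illustrative placeholders.

from collections import Counter
import math

# Toy English passages standing in for an English retrieval corpus.
PASSAGES = [
    "Sapporo is the largest city on the Japanese island of Hokkaido.",
    "The Nile is a major river flowing through northeastern Africa.",
    "Mount Fuji is the highest mountain in Japan at 3,776 metres.",
]

def translate_to_english(question: str) -> str:
    """Stand-in for a machine translation system; here we simply assume
    the question has already been translated into English."""
    return question

def bow(text: str) -> Counter:
    """Lowercased bag-of-words representation with basic punctuation stripped."""
    cleaned = text.lower().replace(".", "").replace(",", "").replace("?", "")
    return Counter(cleaned.split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of words."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question_en: str, k: int = 1) -> list[str]:
    """Rank the English passages against the translated question."""
    q = bow(question_en)
    return sorted(PASSAGES, key=lambda p: cosine(q, bow(p)), reverse=True)[:k]

if __name__ == "__main__":
    # A question whose answer is assumed to exist only in English articles
    # (pre-translated here by the stub translator).
    question_en = translate_to_english("What is the highest mountain in Japan?")
    print(retrieve(question_en, k=1)[0])
```

In a real system, the stub translator would be a state-of-the-art MT model and the lexical scorer would be replaced by a trained retriever (e.g., a cross-lingual pretrained encoder) over Wikipedia-scale English text, as described in the abstract.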
Related papers
- OMoS-QA: A Dataset for Cross-Lingual Extractive Question Answering in a German Migration Context [4.39796591456426]
We present OMoS-QA, a dataset of German and English questions paired with trustworthy documents and manually annotated answers.
Questions are automatically generated with an open-source large language model (LLM) and answer sentences are selected by crowd workers.
We find high precision and low-to-mid recall in selecting answer sentences, which is a favorable trade-off to avoid misleading users.
arXiv Detail & Related papers (2024-07-22T15:40:17Z)
- CaLMQA: Exploring culturally specific long-form question answering across 23 languages [58.18984409715615]
CaLMQA is a collection of 1.5K culturally specific questions spanning 23 languages and 51 culturally agnostic questions translated from English into 22 other languages.
We collect naturally-occurring questions from community web forums and hire native speakers to write questions to cover under-studied languages such as Fijian and Kirundi.
Our dataset contains diverse, complex questions that reflect cultural topics (e.g. traditions, laws, news) and the language usage of native speakers.
arXiv Detail & Related papers (2024-06-25T17:45:26Z)
- Can a Multichoice Dataset be Repurposed for Extractive Question Answering? [52.28197971066953]
We repurpose the Belebele dataset (Bandarkar et al., 2023), originally designed for multiple-choice question answering (MCQA), for extractive question answering (EQA).
We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA).
Our aim is to enable others to adapt our approach for the 120+ other language variants in Belebele, many of which are deemed under-resourced.
arXiv Detail & Related papers (2024-04-26T11:46:05Z)
- AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages [18.689806554953236]
Cross-lingual open-retrieval question answering (XOR QA) systems retrieve answer content from other languages while serving people in their native language.
We create AfriQA, the first cross-lingual QA dataset with a focus on African languages.
AfriQA includes 12,000+ XOR QA examples across 10 African languages.
arXiv Detail & Related papers (2023-05-11T15:34:53Z)
- Bridging the Language Gap: Knowledge Injected Multilingual Question Answering [19.768708263635176]
We propose a generalized cross-lingual transfer framework to enhance the model's ability to understand different languages.
Experimental results on the real-world MLQA dataset demonstrate that the proposed method improves performance by a large margin.
arXiv Detail & Related papers (2023-04-06T15:41:25Z)
- Cross-Lingual Question Answering over Knowledge Base as Reading Comprehension [61.079852289005025]
Cross-lingual question answering over knowledge base (xKBQA) aims to answer questions in languages different from that of the provided knowledge base.
One of the major challenges facing xKBQA is the high cost of data annotation.
We propose a novel approach for xKBQA in a reading comprehension paradigm.
arXiv Detail & Related papers (2023-02-26T05:52:52Z)
- Investigating Information Inconsistency in Multilingual Open-Domain Question Answering [18.23417521199809]
We analyze the behavior of multilingual open-domain question answering models with a focus on retrieval bias.
We speculate that the content differences in documents across languages might reflect cultural divergences and/or social biases.
arXiv Detail & Related papers (2022-05-25T02:58:54Z)
- Cross-Lingual GenQA: A Language-Agnostic Generative Question Answering Approach for Open-Domain Question Answering [76.99585451345702]
Open-Retrieval Generative Question Answering (GenQA) has been shown to deliver high-quality, natural-sounding answers in English.
We present the first generalization of the GenQA approach to the multilingual setting.
arXiv Detail & Related papers (2021-10-14T04:36:29Z)
- One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval [39.061900747689094]
CORA is a Cross-lingual Open-Retrieval Answer Generation model.
It can answer questions across many languages even when language-specific annotated data or knowledge sources are unavailable.
arXiv Detail & Related papers (2021-07-26T06:02:54Z)
- Multilingual Answer Sentence Reranking via Automatically Translated Data [97.98885151955467]
We present a study on the design of multilingual Answer Sentence Selection (AS2) models, which are a core component of modern Question Answering (QA) systems.
The main idea is to transfer data created in a resource-rich language, e.g., English, to other languages with fewer resources.
arXiv Detail & Related papers (2021-02-20T03:52:08Z)