One Question Answering Model for Many Languages with Cross-lingual Dense
Passage Retrieval
- URL: http://arxiv.org/abs/2107.11976v1
- Date: Mon, 26 Jul 2021 06:02:54 GMT
- Title: One Question Answering Model for Many Languages with Cross-lingual Dense
Passage Retrieval
- Authors: Akari Asai, Xinyan Yu, Jungo Kasai, Hannaneh Hajishirzi
- Abstract summary: CORA is a Cross-lingual Open-Retrieval Answer Generation model.
It can answer questions across many languages even when language-specific annotated data or knowledge sources are unavailable.
- Score: 39.061900747689094
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present CORA, a Cross-lingual Open-Retrieval Answer Generation model that
can answer questions across many languages even when language-specific
annotated data or knowledge sources are unavailable. We introduce a new dense
passage retrieval algorithm that is trained to retrieve documents across
languages for a question. Combined with a multilingual autoregressive
generation model, CORA answers directly in the target language without any
translation or in-language retrieval modules as used in prior work. We propose
an iterative training method that automatically extends annotated data
available only in high-resource languages to low-resource ones. Our results
show that CORA substantially outperforms the previous state of the art on
multilingual open question answering benchmarks across 26 languages, 9 of which
are unseen during training. Our analyses show the significance of cross-lingual
retrieval and generation in many languages, particularly under low-resource
settings.
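As a rough illustration of the cross-lingual dense retrieval described in the abstract, the minimal sketch below ranks passages from a mixed-language pool by inner product with a question vector. The embeddings are hand-written toy values, and the names `retrieve` and `PASSAGES` are hypothetical. A trained multilingual encoder is assumed to produce such vectors and is not shown, so this is not the authors' implementation.

```python
from typing import List, Tuple

Vector = List[float]
# Each passage: (language code, text, embedding). All values are toy assumptions.
Passage = Tuple[str, str, Vector]


def dot(a: Vector, b: Vector) -> float:
    """Inner product used as the relevance score."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(question_vec: Vector, passages: List[Passage], k: int = 2):
    """Return the k highest-scoring passages, regardless of their language."""
    scored = [(dot(question_vec, vec), lang, text) for lang, text, vec in passages]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Hypothetical pool: passages in different languages share one vector space.
PASSAGES: List[Passage] = [
    ("en", "Passage about the capital of Japan", [0.9, 0.1, 0.0]),
    ("ja", "日本の首都に関する文章", [0.8, 0.2, 0.1]),
    ("es", "Pasaje sobre fútbol", [0.0, 0.1, 0.9]),
]

# Stand-in for an encoded question; the question language does not matter here.
question_vec: Vector = [1.0, 0.0, 0.0]

top = retrieve(question_vec, PASSAGES, k=2)
for score, lang, text in top:
    print(f"{score:.2f} [{lang}] {text}")
```

Because question and passages are scored in a single shared space, no translation step or per-language index is needed in this sketch. In a full system, the retrieved passages would then be passed to a multilingual generator that writes the answer in the question's language.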
Related papers
- mFollowIR: a Multilingual Benchmark for Instruction Following in Retrieval [61.17793165194077]
We introduce mFollowIR, a benchmark for measuring instruction-following ability in retrieval models.
We present results for both multilingual (XX-XX) and cross-lingual (En-XX) performance.
We see strong cross-lingual performance with English-based retrievers trained using instructions, but find a notable drop in performance in the multilingual setting.
arXiv Detail & Related papers (2025-01-31T16:24:46Z)
- Multilingual Open QA on the MIA Shared Task [0.04285555583808084]
Cross-lingual information retrieval (CLIR) can find relevant text in any language even when the query is posed in a different, possibly low-resource, language.
We propose a simple and effective re-ranking method for improving passage retrieval in open question answering.
arXiv Detail & Related papers (2025-01-07T21:43:09Z)
- What are the limits of cross-lingual dense passage retrieval for low-resource languages? [23.88853455670863]
We analyze the capabilities of the multilingual Dense Passage Retriever (mDPR) for extremely low-resource languages.
mDPR achieves success on multilingual open QA benchmarks across 26 languages, of which 9 were unseen during training.
We focus on two extremely low-resource languages for which mDPR performs poorly: Amharic and Khmer.
arXiv Detail & Related papers (2024-08-21T18:51:46Z)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- ZusammenQA: Data Augmentation with Specialized Models for Cross-lingual Open-retrieval Question Answering System [16.89747171947662]
This paper introduces our proposed system for the MIA Shared Task on Cross-lingual Open-retrieval Question Answering (COQA).
In this challenging scenario, given an input question the system has to gather evidence documents from a multilingual pool and generate an answer in the language of the question.
We devised several approaches combining different model variants for three main components: Data Augmentation, Passage Retrieval, and Answer Generation.
arXiv Detail & Related papers (2022-05-30T10:31:08Z)
- Pivot Through English: Reliably Answering Multilingual Questions without Document Retrieval [4.4973334555746]
Existing methods for open-retrieval question answering in lower resource languages (LRLs) lag significantly behind English.
We formulate a task setup more realistic to available resources, that circumvents document retrieval to reliably transfer knowledge from English to lower resource languages.
Within this task setup we propose Reranked Maximal Inner Product Search (RM-MIPS), akin to semantic similarity retrieval over the English training set with reranking.
arXiv Detail & Related papers (2020-12-28T04:38:45Z)
- XOR QA: Cross-lingual Open-Retrieval Question Answering [75.20578121267411]
This work extends open-retrieval question answering to a cross-lingual setting.
We construct a large-scale dataset built on questions lacking same-language answers.
arXiv Detail & Related papers (2020-10-22T16:47:17Z)
- X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models [103.75890012041366]
Language models (LMs) have proven surprisingly successful at capturing factual knowledge.
However, studies on LMs' factual representation ability have almost invariably been performed on English.
We create a benchmark of cloze-style probes for 23 typologically diverse languages.
arXiv Detail & Related papers (2020-10-13T05:29:56Z)