Pivot Through English: Reliably Answering Multilingual Questions without
Document Retrieval
- URL: http://arxiv.org/abs/2012.14094v1
- Date: Mon, 28 Dec 2020 04:38:45 GMT
- Title: Pivot Through English: Reliably Answering Multilingual Questions without
Document Retrieval
- Authors: Ivan Montero, Shayne Longpre, Ni Lao, Andrew J. Frank, Christopher
DuBois
- Abstract summary: Existing methods for open-retrieval question answering in lower resource languages (LRLs) lag significantly behind English.
We formulate a task setup more realistic to available resources, that circumvents document retrieval to reliably transfer knowledge from English to lower resource languages.
Within this task setup we propose Reranked Maximal Inner Product Search (RM-MIPS), akin to semantic similarity retrieval over the English training set with reranking.
- Score: 4.4973334555746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing methods for open-retrieval question answering in lower resource
languages (LRLs) lag significantly behind English. They not only suffer from
the shortcomings of non-English document retrieval, but are reliant on
language-specific supervision for either the task or translation. We formulate
a task setup more realistic to available resources, that circumvents document
retrieval to reliably transfer knowledge from English to lower resource
languages. Assuming a strong English question answering model or database, we
compare and analyze methods that pivot through English: to map foreign queries
to English and then English answers back to target language answers. Within
this task setup we propose Reranked Multilingual Maximal Inner Product Search
(RM-MIPS), akin to semantic similarity retrieval over the English training set
with reranking, which outperforms the strongest baselines by 2.7% on XQuAD and
6.2% on MKQA. Analysis demonstrates the particular efficacy of this strategy
over state-of-the-art alternatives in challenging settings: low-resource
languages, with extensive distractor data and query distribution misalignment.
Circumventing retrieval, our analysis shows this approach offers rapid answer
generation to almost any language off-the-shelf, without the need for any
additional training data in the target language.
Related papers
- Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization ( CLS) aims to generate a summary for the source text in a different target language.
Currently, instruction-tuned large language models (LLMs) excel at various English tasks.
Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even with few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z) - What are the limits of cross-lingual dense passage retrieval for low-resource languages? [23.88853455670863]
We analyze the capabilities of the multi-lingual Passage Retriever (mDPR) for extremely low-resource languages.
mDPR achieves success on multilingual open QA benchmarks across 26 languages, of which 9 were unseen during training.
We focus on two extremely low-resource languages for which mDPR performs poorly: Amharic and Khmer.
arXiv Detail & Related papers (2024-08-21T18:51:46Z) - Can a Multichoice Dataset be Repurposed for Extractive Question Answering? [52.28197971066953]
We repurposed the Belebele dataset (Bandarkar et al., 2023), which was designed for multiple-choice question answering (MCQA)
We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA).
Our aim is to enable others to adapt our approach for the 120+ other language variants in Belebele, many of which are deemed under-resourced.
arXiv Detail & Related papers (2024-04-26T11:46:05Z) - Soft Prompt Decoding for Multilingual Dense Retrieval [30.766917713997355]
We show that applying state-of-the-art approaches developed for cross-lingual information retrieval to MLIR tasks leads to sub-optimal performance.
This is due to the heterogeneous and imbalanced nature of multilingual collections.
We present KD-SPD, a novel soft prompt decoding approach for MLIR that implicitly "translates" the representation of documents in different languages into the same embedding space.
arXiv Detail & Related papers (2023-05-15T21:17:17Z) - CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual
Retrieval [73.48591773882052]
Most fact-checking approaches focus on English only due to the data scarcity issue in other languages.
We present the first fact-checking framework augmented with crosslingual retrieval.
We train the retriever with our proposed Crosslingual Inverse Cloze Task (XICT)
arXiv Detail & Related papers (2022-09-05T17:36:14Z) - Cross-Lingual Training with Dense Retrieval for Document Retrieval [56.319511218754414]
We explore different transfer techniques for document ranking from English annotations to multiple non-English languages.
Experiments on the test collections in six languages (Chinese, Arabic, French, Hindi, Bengali, Spanish) from diverse language families.
We find that weakly-supervised target language transfer yields competitive performances against the generation-based target language transfer.
arXiv Detail & Related papers (2021-09-03T17:15:38Z) - One Question Answering Model for Many Languages with Cross-lingual Dense
Passage Retrieval [39.061900747689094]
CORA is a Cross-lingual Open-Retrieval Answer Generation model.
It can answer questions across many languages even when language-specific annotated data or knowledge sources are unavailable.
arXiv Detail & Related papers (2021-07-26T06:02:54Z) - Multilingual Answer Sentence Reranking via Automatically Translated Data [97.98885151955467]
We present a study on the design of multilingual Answer Sentence Selection (AS2) models, which are a core component of modern Question Answering (QA) systems.
The main idea is to transfer data, created from one resource rich language, e.g., English, to other languages, less rich in terms of resources.
arXiv Detail & Related papers (2021-02-20T03:52:08Z) - XOR QA: Cross-lingual Open-Retrieval Question Answering [75.20578121267411]
This work extends open-retrieval question answering to a cross-lingual setting.
We construct a large-scale dataset built on questions lacking same-language answers.
arXiv Detail & Related papers (2020-10-22T16:47:17Z) - LAReQA: Language-agnostic answer retrieval from a multilingual pool [29.553907688813347]
LAReQA tests for "strong" cross-lingual alignment.
We find that augmenting training data via machine translation is effective.
This finding underscores our claim that languageagnostic retrieval is a substantively new kind of cross-lingual evaluation.
arXiv Detail & Related papers (2020-04-11T20:51:11Z) - Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using
Zero-shot Learning [30.868309879441615]
We tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents.
Our results show that the proposed approach can significantly outperform unsupervised retrieval techniques for Arabic, Chinese Mandarin, and Spanish.
arXiv Detail & Related papers (2019-12-30T20:46:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.