Pivot Through English: Reliably Answering Multilingual Questions without
Document Retrieval
- URL: http://arxiv.org/abs/2012.14094v1
- Date: Mon, 28 Dec 2020 04:38:45 GMT
- Title: Pivot Through English: Reliably Answering Multilingual Questions without
Document Retrieval
- Authors: Ivan Montero, Shayne Longpre, Ni Lao, Andrew J. Frank, Christopher
DuBois
- Abstract summary: Existing methods for open-retrieval question answering in lower resource languages (LRLs) lag significantly behind English.
We formulate a task setup better matched to available resources, one that circumvents document retrieval to reliably transfer knowledge from English to lower resource languages.
Within this task setup we propose Reranked Multilingual Maximal Inner Product Search (RM-MIPS), akin to semantic similarity retrieval over the English training set with reranking.
- Score: 4.4973334555746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing methods for open-retrieval question answering in lower resource
languages (LRLs) lag significantly behind English. They not only suffer from
the shortcomings of non-English document retrieval, but are reliant on
language-specific supervision for either the task or translation. We formulate
a task setup better matched to available resources, one that circumvents document
retrieval to reliably transfer knowledge from English to lower resource
languages. Assuming a strong English question answering model or database, we
compare and analyze methods that pivot through English: to map foreign queries
to English and then English answers back to target language answers. Within
this task setup we propose Reranked Multilingual Maximal Inner Product Search
(RM-MIPS), akin to semantic similarity retrieval over the English training set
with reranking, which outperforms the strongest baselines by 2.7% on XQuAD and
6.2% on MKQA. Analysis demonstrates the particular efficacy of this strategy
over state-of-the-art alternatives in challenging settings: low-resource
languages, with extensive distractor data and query distribution misalignment.
Circumventing retrieval, our analysis shows this approach offers rapid answer
generation to almost any language off-the-shelf, without the need for any
additional training data in the target language.
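The core of the pivot strategy is retrieval by maximal inner product over English training-set embeddings, followed by reranking. A minimal sketch of that idea, using random vectors in place of real multilingual sentence embeddings (the encoder, corpus, and reranker here are all hypothetical stand-ins, not the paper's actual components):

```python
import numpy as np

# Hypothetical embedding matrix: one unit-normalized row per English training question.
rng = np.random.default_rng(0)
train_emb = rng.normal(size=(1000, 64)).astype(np.float32)
train_emb /= np.linalg.norm(train_emb, axis=1, keepdims=True)

def mips_with_rerank(query_emb, k=10, rerank_fn=None):
    """Top-k English questions by inner product, optionally reranked."""
    scores = train_emb @ query_emb            # maximal inner product search
    top_k = np.argpartition(-scores, k)[:k]   # unordered top-k candidates
    top_k = top_k[np.argsort(-scores[top_k])] # sort candidates by score, descending
    if rerank_fn is not None:                 # e.g. a cross-encoder relevance score
        top_k = sorted(top_k, key=lambda i: -rerank_fn(i))
    return top_k

# A foreign query would be encoded into the same multilingual space;
# here we just use a random unit vector.
q = rng.normal(size=64).astype(np.float32)
q /= np.linalg.norm(q)
candidates = mips_with_rerank(q, k=5)
```

Once the nearest English training question is found, its known English answer can be mapped back into the target language, which is what lets the approach skip non-English document retrieval entirely.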
Related papers
- Demystifying Multilingual Chain-of-Thought in Process Reward Modeling [71.12193680015622]
We tackle the challenge of extending process reward models (PRMs) to multilingual settings.
We train multilingual PRMs on a dataset spanning seven languages, which is translated from English.
Our results highlight the sensitivity of multilingual PRMs to both the number of training languages and the volume of English data.
arXiv Detail & Related papers (2025-02-18T09:11:44Z)
- mFollowIR: a Multilingual Benchmark for Instruction Following in Retrieval [61.17793165194077]
We introduce mFollowIR, a benchmark for measuring instruction-following ability in retrieval models.
We present results for both multilingual (XX-XX) and cross-lingual (En-XX) performance.
We see strong cross-lingual performance from English-based retrievers trained using instructions, but find a notable drop in performance in the multilingual setting.
arXiv Detail & Related papers (2025-01-31T16:24:46Z)
- Multilingual Open QA on the MIA Shared Task [0.04285555583808084]
Cross-lingual information retrieval (CLIR) can find relevant text in any language even when the query is posed in a different, possibly low-resource, language.
We propose a simple and effective re-ranking method for improving passage retrieval in open question answering.
arXiv Detail & Related papers (2025-01-07T21:43:09Z)
- PromptRefine: Enhancing Few-Shot Performance on Low-Resource Indic Languages with Example Selection from Related Example Banks [57.86928556668849]
Large Language Models (LLMs) have recently demonstrated impressive few-shot learning capabilities through in-context learning (ICL).
ICL performance is highly dependent on the choice of few-shot demonstrations, making the selection of optimal examples a persistent research challenge.
In this work, we propose PromptRefine, a novel Alternating Minimization approach for example selection that improves ICL performance on low-resource Indic languages.
arXiv Detail & Related papers (2024-12-07T17:51:31Z)
- Multilingual Retrieval Augmented Generation for Culturally-Sensitive Tasks: A Benchmark for Cross-lingual Robustness [30.00463676754559]
We introduce BordIRLines, a benchmark consisting of 720 territorial dispute queries paired with 14k Wikipedia documents across 49 languages.
Our experiments reveal that retrieving multilingual documents best improves response consistency and decreases geopolitical bias over using purely in-language documents.
Our further experiments and case studies investigate how cross-lingual RAG is affected by aspects from IR to document contents.
arXiv Detail & Related papers (2024-10-02T01:59:07Z)
- What are the limits of cross-lingual dense passage retrieval for low-resource languages? [23.88853455670863]
We analyze the capabilities of the multi-lingual Passage Retriever (mDPR) for extremely low-resource languages.
mDPR achieves success on multilingual open QA benchmarks across 26 languages, of which 9 were unseen during training.
We focus on two extremely low-resource languages for which mDPR performs poorly: Amharic and Khmer.
arXiv Detail & Related papers (2024-08-21T18:51:46Z)
- CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual Retrieval [73.48591773882052]
Most fact-checking approaches focus on English only due to the data scarcity issue in other languages.
We present the first fact-checking framework augmented with crosslingual retrieval.
We train the retriever with our proposed Crosslingual Inverse Cloze Task (XICT).
arXiv Detail & Related papers (2022-09-05T17:36:14Z)
- Cross-Lingual Training with Dense Retrieval for Document Retrieval [56.319511218754414]
We explore different transfer techniques for document ranking from English annotations to multiple non-English languages.
We conduct experiments on test collections in six languages (Chinese, Arabic, French, Hindi, Bengali, Spanish) from diverse language families.
We find that weakly-supervised target language transfer yields competitive performances against the generation-based target language transfer.
arXiv Detail & Related papers (2021-09-03T17:15:38Z)
- One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval [39.061900747689094]
CORA is a Cross-lingual Open-Retrieval Answer Generation model.
It can answer questions across many languages even when language-specific annotated data or knowledge sources are unavailable.
arXiv Detail & Related papers (2021-07-26T06:02:54Z)
- XOR QA: Cross-lingual Open-Retrieval Question Answering [75.20578121267411]
This work extends open-retrieval question answering to a cross-lingual setting.
We construct a large-scale dataset built on questions lacking same-language answers.
arXiv Detail & Related papers (2020-10-22T16:47:17Z)
- Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning [30.868309879441615]
We tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents.
Our results show that the proposed approach can significantly outperform unsupervised retrieval techniques for Arabic, Chinese Mandarin, and Spanish.
arXiv Detail & Related papers (2019-12-30T20:46:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.