LAReQA: Language-agnostic answer retrieval from a multilingual pool
- URL: http://arxiv.org/abs/2004.05484v1
- Date: Sat, 11 Apr 2020 20:51:11 GMT
- Title: LAReQA: Language-agnostic answer retrieval from a multilingual pool
- Authors: Uma Roy, Noah Constant, Rami Al-Rfou, Aditya Barua, Aaron Phillips,
Yinfei Yang
- Abstract summary: LAReQA tests for "strong" cross-lingual alignment.
We find that augmenting training data via machine translation is effective.
This finding underscores our claim that language-agnostic retrieval is a substantively new kind of cross-lingual evaluation.
- Score: 29.553907688813347
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present LAReQA, a challenging new benchmark for language-agnostic answer
retrieval from a multilingual candidate pool. Unlike previous cross-lingual
tasks, LAReQA tests for "strong" cross-lingual alignment, requiring
semantically related cross-language pairs to be closer in representation space
than unrelated same-language pairs. Building on multilingual BERT (mBERT), we
study different strategies for achieving strong alignment. We find that
augmenting training data via machine translation is effective, and improves
significantly over using mBERT out-of-the-box. Interestingly, the embedding
baseline that performs the best on LAReQA falls short of competing baselines on
zero-shot variants of our task that only target "weak" alignment. This finding
underscores our claim that language-agnostic retrieval is a substantively new
kind of cross-lingual evaluation.
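For illustration, the "strong" alignment requirement can be checked directly on sentence embeddings. The sketch below is not the paper's evaluation code: it mean-pools out-of-the-box mBERT (a baseline the abstract describes as weak), then compares the cosine similarity of a relevant cross-language question-answer pair against an unrelated same-language pair; the pooling, model choice, and example sentences are assumptions for illustration.

```python
# Illustrative sketch of the "strong alignment" check described above.
# Assumes the `transformers` library; model, pooling, and example
# sentences are placeholders, not the paper's actual setup.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def embed(text: str) -> torch.Tensor:
    """Mean-pool mBERT token embeddings into a single sentence vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)     # ignore padding
    return (hidden * mask).sum(1) / mask.sum(1)

question      = embed("How tall is Mount Everest?")        # English question
answer_es     = embed("El Everest mide 8.848 metros.")      # relevant, Spanish
distractor_en = embed("The Nile is the longest river.")     # unrelated, English

cos = torch.nn.functional.cosine_similarity
# Strong alignment requires the relevant cross-language pair to outscore
# the unrelated same-language pair; vanilla mBERT often fails this test.
print("cross-language relevant :", cos(question, answer_es).item())
print("same-language unrelated :", cos(question, distractor_en).item())
```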
Related papers
- Synergistic Approach for Simultaneous Optimization of Monolingual, Cross-lingual, and Multilingual Information Retrieval [5.446052898856584]
This paper proposes a novel hybrid batch training strategy to improve zero-shot retrieval performance across monolingual, cross-lingual, and multilingual settings.
The approach fine-tunes multilingual language models using a mix of monolingual and cross-lingual question-answer pair batches sampled based on dataset size.
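A minimal sketch of what such size-proportional batch mixing could look like is below; the sampling scheme, pool names, and proportions are assumptions for illustration, not details taken from that paper.

```python
# Hypothetical sketch of mixing monolingual and cross-lingual QA batches,
# drawing each batch from a pool with probability proportional to its size.
import random

def mixed_batches(mono_pairs, cross_pairs, batch_size=32, num_batches=100):
    """Yield (pool_name, batch) with pools sampled in proportion to size."""
    pools = {"monolingual": mono_pairs, "cross-lingual": cross_pairs}
    total = sum(len(p) for p in pools.values())
    for _ in range(num_batches):
        # Pick which pool this batch comes from, weighted by dataset size.
        name = random.choices(
            list(pools), weights=[len(p) / total for p in pools.values()]
        )[0]
        yield name, random.sample(pools[name], min(batch_size, len(pools[name])))

# Toy usage with placeholder (question, answer) pairs.
mono = [("¿Dónde está Lima?", "Lima está en Perú.")] * 900
cross = [("Where is Lima?", "Lima está en Perú.")] * 100
for kind, batch in mixed_batches(mono, cross, num_batches=3):
    print(kind, len(batch))
```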
arXiv Detail & Related papers (2024-08-20T04:30:26Z)
- Multilingual LLMs are Better Cross-lingual In-context Learners with Alignment [24.742581572364124]
In-context learning (ICL) unfolds as large language models become capable of inferring test labels conditioned on a few labeled samples without any gradient update.
We provide the first in-depth analysis of ICL for cross-lingual text classification.
We propose a novel prompt construction strategy, Cross-lingual In-context Source-Target Alignment (X-InSTA).
arXiv Detail & Related papers (2023-05-10T07:24:36Z)
- CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual Retrieval [73.48591773882052]
Most fact-checking approaches focus on English only due to the data scarcity issue in other languages.
We present the first fact-checking framework augmented with cross-lingual retrieval.
We train the retriever with our proposed Cross-lingual Inverse Cloze Task (XICT).
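The summary does not spell out how the task is constructed; as a rough sketch under the standard inverse cloze formulation (a sentence serves as a pseudo-query for its surrounding passage), with the cross-lingual twist assumed to come from translating the pseudo-query, training pairs might be built as follows. The translation step is a toy placeholder.

```python
# Hypothetical construction of cross-lingual inverse-cloze training pairs:
# one sentence removed from a passage acts as the query, the remaining
# passage is the positive context, and the query is translated so that
# query and context end up in different languages (illustrative assumption).
import random

TOY_TRANSLATIONS = {  # stand-in for a real machine-translation system
    "Paris is the capital of France.": "París es la capital de Francia.",
}

def translate(sentence: str, target_lang: str) -> str:
    """Placeholder for a machine-translation call; toy lookup here."""
    return TOY_TRANSLATIONS.get(sentence, sentence)

def make_xict_pairs(passages, target_langs=("es", "hi", "zh")):
    pairs = []
    for sentences in passages:           # each passage: list of sentences
        idx = random.randrange(len(sentences))
        query = sentences[idx]
        context = " ".join(sentences[:idx] + sentences[idx + 1:])
        lang = random.choice(target_langs)
        pairs.append((translate(query, lang), context))  # (query, positive)
    return pairs

passages = [[
    "Paris is the capital of France.",
    "It is known for the Eiffel Tower.",
    "The Seine flows through the city.",
]]
print(make_xict_pairs(passages))
```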
arXiv Detail & Related papers (2022-09-05T17:36:14Z)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with extra code-switching restore task to bridge the gap between the pretrain and finetune stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
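The summary leaves the restore task abstract; one common way such a task is set up, sketched below purely as an assumption rather than the paper's recipe, is to code-switch a sentence by swapping some words for lexicon translations and train the model to restore the original.

```python
# Illustrative "corrupt then restore" pair construction for code-switching.
# The lexicon, switch rate, and whitespace tokenization are assumptions
# made for the sketch, not details from the paper.
import random

LEXICON = {"cat": "gato", "sat": "se sentó", "mat": "alfombra"}  # toy en->es

def code_switch(sentence: str, p_switch: float = 0.3) -> tuple[str, str]:
    """Return (code-switched input, original target) for a restore task."""
    switched = [
        LEXICON[tok] if tok in LEXICON and random.random() < p_switch else tok
        for tok in sentence.split()
    ]
    return " ".join(switched), sentence

src, tgt = code_switch("the cat sat on the mat", p_switch=1.0)
print("model input :", src)   # "the gato se sentó on the alfombra"
print("restore to  :", tgt)   # "the cat sat on the mat"
```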
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
- Matching Tweets With Applicable Fact-Checks Across Languages [27.762055254009017]
We focus on automatically finding existing fact-checks for claims made in social media posts (tweets).
We conduct both classification and retrieval experiments, in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings.
We present promising results for "match" classification (93% average accuracy) in four language pairs.
arXiv Detail & Related papers (2022-02-14T23:33:02Z)
- Zero-Shot Cross-lingual Semantic Parsing [56.95036511882921]
We study cross-lingual semantic parsing as a zero-shot problem without parallel data for 7 test languages.
We propose a multi-task encoder-decoder model to transfer parsing knowledge to additional languages using only English-Logical form paired data.
Our system frames zero-shot parsing as a latent-space alignment problem and finds that pre-trained models can be improved to generate logical forms with minimal cross-lingual transfer penalty.
arXiv Detail & Related papers (2021-04-15T16:08:43Z)
- Multilingual Transfer Learning for QA Using Translation as Data Augmentation [13.434957024596898]
We explore strategies that improve cross-lingual transfer by bringing the multilingual embeddings closer in the semantic space.
We propose two novel strategies, language adversarial training and language arbitration framework, which significantly improve the (zero-resource) cross-lingual transfer performance.
Empirically, we show that the proposed models outperform the previous zero-shot baseline on the recently introduced multilingual MLQA and TyDiQA datasets.
arXiv Detail & Related papers (2020-12-10T20:29:34Z)
- Cross-Lingual Transfer in Zero-Shot Cross-Language Entity Linking [19.083300046605252]
Cross-language entity linking grounds mentions in multiple languages to a single-language knowledge base.
We find that the multilingual ability of BERT leads to robust performance in monolingual and multilingual settings.
arXiv Detail & Related papers (2020-10-19T20:08:26Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
We further propose a KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
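As a rough illustration of a KL-divergence self-teaching term of this kind (shapes, how pseudo-labels are produced, and variable names below are assumptions, not FILTER's actual implementation), the model's predictions on the translated target-language text can be pulled toward the soft pseudo-labels:

```python
# Sketch of a KL-divergence self-teaching loss on soft pseudo-labels.
import torch
import torch.nn.functional as F

batch, num_classes = 4, 3

# Soft pseudo-labels auto-generated for the translated target-language text
# (e.g. by a frozen copy of the model); no gradient flows through them.
with torch.no_grad():
    pseudo_labels = F.softmax(torch.randn(batch, num_classes), dim=-1)

# Current model's logits on the same translated text (random stand-in here).
logits = torch.randn(batch, num_classes, requires_grad=True)

# F.kl_div expects log-probabilities as input and probabilities as target.
self_teaching_loss = F.kl_div(
    F.log_softmax(logits, dim=-1), pseudo_labels, reduction="batchmean"
)
self_teaching_loss.backward()
print(self_teaching_loss.item())
```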
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
- A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT [60.9051207862378]
Multilingual BERT works remarkably well on cross-lingual transfer tasks.
Data size and context window size are crucial factors for transferability.
There is a computationally cheap but effective approach to improve the cross-lingual ability of multilingual BERT.
arXiv Detail & Related papers (2020-04-20T11:13:16Z)
- XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization [128.37244072182506]
The Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark evaluates the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks.
We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models.
arXiv Detail & Related papers (2020-03-24T19:09:37Z)
This list was automatically generated from the titles and abstracts of the papers on this site.