Multilingual Answer Sentence Reranking via Automatically Translated Data
- URL: http://arxiv.org/abs/2102.10250v1
- Date: Sat, 20 Feb 2021 03:52:08 GMT
- Title: Multilingual Answer Sentence Reranking via Automatically Translated Data
- Authors: Thuy Vu and Alessandro Moschitti
- Abstract summary: We present a study on the design of multilingual Answer Sentence Selection (AS2) models, which are a core component of modern Question Answering (QA) systems.
The main idea is to transfer data created in one resource-rich language, e.g., English, to other languages with fewer resources.
- Score: 97.98885151955467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a study on the design of multilingual Answer Sentence Selection
(AS2) models, which are a core component of modern Question Answering (QA)
systems. The main idea is to transfer data created in one resource-rich
language, e.g., English, to other languages with fewer resources.
The main findings of this paper are: (i) the training data for AS2 translated
into a target language can be used to effectively fine-tune a Transformer-based
model for that language; (ii) one multilingual Transformer model is enough
to rank answers in multiple languages; and (iii) mixed-language question/answer
pairs can be used to fine-tune models to select answers from any language,
where the input question is in just one language. This greatly reduces the
complexity and technical requirements of a multilingual QA system. Our
experiments validate the findings above, showing a modest drop, at most 3%,
with respect to the state-of-the-art English model.
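Finding (iii) can be illustrated with a minimal sketch of how mixed-language question/answer pairs might be assembled from translated data. This is not the authors' pipeline: the `Pair` type, the `mixed_language_pairs` helper, the dictionary layout, and the toy sentences are all hypothetical, and the translations are assumed to have been produced beforehand by some machine translation system.

```python
from typing import Dict, List, NamedTuple


class Pair(NamedTuple):
    """One training instance for an answer sentence reranker (hypothetical layout)."""
    question: str
    q_lang: str
    answer: str
    a_lang: str
    label: int  # 1 if the sentence answers the question, else 0


def mixed_language_pairs(examples: List[Dict], question_lang: str) -> List[Pair]:
    """Pair each question in one fixed language with its candidate in every language.

    Each example is assumed to hold pre-translated text keyed by language code:
    {"question": {"en": ...}, "answer": {"en": ..., "de": ...}, "label": 0 or 1}.
    """
    pairs = []
    for ex in examples:
        question = ex["question"][question_lang]
        # Sort by language code so the output order is deterministic.
        for a_lang, answer in sorted(ex["answer"].items()):
            pairs.append(Pair(question, question_lang, answer, a_lang, ex["label"]))
    return pairs


# Toy data: English originals plus assumed German translations.
examples = [
    {
        "question": {"en": "Who wrote Faust?"},
        "answer": {"en": "Goethe wrote Faust.", "de": "Goethe schrieb Faust."},
        "label": 1,
    },
    {
        "question": {"en": "Who wrote Faust?"},
        "answer": {"en": "Faust is a play.", "de": "Faust ist ein Drama."},
        "label": 0,
    },
]

pairs = mixed_language_pairs(examples, question_lang="en")
for p in pairs:
    print(p.q_lang, p.a_lang, p.label, "|", p.question, "->", p.answer)
```

The resulting pairs keep the question in a single language while the candidate sentences vary, which is the shape of input a single multilingual reranker would be fine-tuned on under this assumption.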
Related papers
- MST5 -- Multilingual Question Answering over Knowledge Graphs [1.6470999044938401]
Knowledge Graph Question Answering (KGQA) simplifies querying vast amounts of knowledge stored in a graph-based model using natural language.
Existing multilingual KGQA systems face challenges in achieving performance comparable to English systems.
We propose a simplified approach to enhance multilingual KGQA systems by incorporating linguistic context and entity information directly into the processing pipeline of a language model.
arXiv Detail & Related papers (2024-07-08T15:37:51Z)
- Evaluating and Modeling Attribution for Cross-Lingual Question Answering [80.4807682093432]
This work is the first to study attribution for cross-lingual question answering.
We collect data in 5 languages to assess the attribution level of a state-of-the-art cross-lingual QA system.
We find that a substantial portion of the answers is not attributable to any retrieved passages.
arXiv Detail & Related papers (2023-05-23T17:57:46Z)
- MuCoT: Multilingual Contrastive Training for Question-Answering in Low-resource Languages [4.433842217026879]
Multi-lingual BERT-based models (mBERT) are often used to transfer knowledge from high-resource languages to low-resource languages.
We augment the QA samples of the target language using translation and transliteration into other languages and use the augmented data to fine-tune an mBERT-based QA model.
Experiments on the Google ChAII dataset show that fine-tuning the mBERT model with translations from the same language family boosts the question-answering performance.
arXiv Detail & Related papers (2022-04-12T13:52:54Z)
- Delving Deeper into Cross-lingual Visual Question Answering [115.16614806717341]
We show that simple modifications to the standard training setup can substantially reduce the transfer gap to monolingual English performance.
We analyze cross-lingual VQA across different question types of varying complexity for different multilingual multimodal Transformers.
arXiv Detail & Related papers (2022-02-15T18:22:18Z)
- xGQA: Cross-Lingual Visual Question Answering [100.35229218735938]
xGQA is a new multilingual evaluation benchmark for the visual question answering task.
We extend the established English GQA dataset to 7 typologically diverse languages.
We propose new adapter-based approaches to adapt multimodal transformer-based models to become multilingual.
arXiv Detail & Related papers (2021-09-13T15:58:21Z)
- Multilingual Transfer Learning for QA Using Translation as Data Augmentation [13.434957024596898]
We explore strategies that improve cross-lingual transfer by bringing the multilingual embeddings closer in the semantic space.
We propose two novel strategies, language adversarial training and language arbitration framework, which significantly improve the (zero-resource) cross-lingual transfer performance.
Empirically, we show that the proposed models outperform the previous zero-shot baseline on the recently introduced multilingual MLQA and TyDiQA datasets.
arXiv Detail & Related papers (2020-12-10T20:29:34Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models with varying amounts of target-language data.
Our usage scenario is interactive correction with almost no training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)