Promoting Generalized Cross-lingual Question Answering in Few-resource
Scenarios via Self-knowledge Distillation
- URL: http://arxiv.org/abs/2309.17134v1
- Date: Fri, 29 Sep 2023 10:54:59 GMT
- Title: Promoting Generalized Cross-lingual Question Answering in Few-resource
Scenarios via Self-knowledge Distillation
- Authors: Casimiro Pio Carrino, Carlos Escolano, José A. R. Fonollosa
- Abstract summary: We study cross-lingual transfer mainly focusing on the Generalized Cross-Lingual Transfer (G-XLT) task.
Our approach seeks to enhance cross-lingual QA transfer using a high-performing multilingual model trained on a large-scale dataset.
We introduce novel mAP@k coefficients to fine-tune the self-knowledge distillation loss.
- Score: 2.2493846458264386
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite substantial progress in multilingual extractive Question Answering
(QA), building models with high and uniformly distributed performance across
languages remains challenging, especially for languages with limited resources. We study
cross-lingual transfer mainly focusing on the Generalized Cross-Lingual
Transfer (G-XLT) task, where the question language differs from the context
language - a challenge that has received limited attention thus far. Our
approach seeks to enhance cross-lingual QA transfer using a high-performing
multilingual model trained on a large-scale dataset, complemented by a few
thousand aligned QA examples across languages. Our proposed strategy combines
cross-lingual sampling with advanced self-distillation training over successive
generations to tackle this challenge. Notably, we introduce novel mAP@k
coefficients to fine-tune the self-knowledge distillation loss, dynamically
regulating the teacher model's knowledge to achieve a balanced and effective
knowledge transfer. We extensively evaluate our approach to assess XLT and
G-XLT capabilities in extractive QA. Results reveal that our self-knowledge
distillation approach outperforms standard cross-entropy fine-tuning by a
significant margin. Importantly, when compared to a strong baseline that
leverages a sizeable volume of machine-translated data, our approach shows
competitive results despite the considerable challenge of operating within
resource-constrained settings, even in zero-shot scenarios. Beyond performance
improvements, we offer valuable insights through comprehensive analyses and an
ablation study, further substantiating the benefits and constraints of our
approach. In essence, we propose a practical solution to improve cross-lingual
QA transfer by efficiently leveraging limited data resources.
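
To make the abstract's key technical idea more concrete, the following is a minimal, hypothetical PyTorch sketch of an mAP@k-weighted self-knowledge distillation loss for extractive QA with start/end position logits. It is our own illustration based only on the abstract: the function names, the single-gold-position reduction of AP@k, the temperature, and the (1 - w) * CE + w * KD mixing scheme are assumptions, not the authors' released implementation.

import torch
import torch.nn.functional as F


def ap_at_k(teacher_logits: torch.Tensor, gold: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Average precision at k of the teacher's ranked token positions against the
    single gold position: 1/rank if the gold index appears in the top-k, else 0."""
    topk = teacher_logits.topk(k, dim=-1).indices                 # (batch, k)
    hits = topk.eq(gold.unsqueeze(-1)).float()                    # (batch, k)
    ranks = torch.arange(1, k + 1, device=teacher_logits.device).float()
    return (hits / ranks).sum(dim=-1)                             # (batch,)


def kd_loss(student_start, student_end, teacher_start, teacher_end,
            gold_start, gold_end, k: int = 5, temperature: float = 2.0):
    """Illustrative per-example mix of hard-label cross-entropy and soft-label KL
    to the teacher, with the mixing weight set by the teacher's mAP@k."""
    # Hard-label term on the few aligned gold spans (kept per example).
    ce = (F.cross_entropy(student_start, gold_start, reduction="none")
          + F.cross_entropy(student_end, gold_end, reduction="none"))

    # Soft-label KL divergence to the teacher's tempered distributions.
    def kl(s, t):
        return F.kl_div(F.log_softmax(s / temperature, dim=-1),
                        F.softmax(t / temperature, dim=-1),
                        reduction="none").sum(dim=-1) * temperature ** 2

    kd = kl(student_start, teacher_start) + kl(student_end, teacher_end)

    # mAP@k coefficient (assumed form): trust the teacher only where its
    # top-k ranking actually recovers the gold start/end positions.
    w = 0.5 * (ap_at_k(teacher_start, gold_start, k) + ap_at_k(teacher_end, gold_end, k))

    return ((1.0 - w) * ce + w * kd).mean()

In this reading, the coefficient w acts as a per-example gate on the teacher's soft labels: where the teacher already ranks the gold span highly, the student leans on distillation, and elsewhere it falls back on the few aligned gold labels. In a training step, the teacher logits would be computed under torch.no_grad() before calling kd_loss on the student outputs and gold start/end positions.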
Related papers
- Extracting and Transferring Abilities For Building Multi-lingual Ability-enhanced Large Language Models [104.96990850774566]
We propose a Multi-lingual Ability Extraction and Transfer approach, named MAET.
Our key idea is to decompose and extract language-agnostic ability-related weights from large language models.
Experimental results show that MAET can effectively and efficiently extract and transfer advanced abilities, and that it outperforms training-based baseline methods.
arXiv Detail & Related papers (2024-10-10T11:23:18Z) - The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experimental results show that it boosts multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze the representation space, generated responses, and data scales, revealing how question translation training strengthens language alignment within LLMs.
arXiv Detail & Related papers (2024-05-02T14:49:50Z) - CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment [38.35458193262633]
English-centric models are usually suboptimal in other languages.
We propose a novel approach called CrossIn, which utilizes a mixed composition of cross-lingual instruction tuning data.
arXiv Detail & Related papers (2024-04-18T06:20:50Z) - Analyzing the Evaluation of Cross-Lingual Knowledge Transfer in
Multilingual Language Models [12.662039551306632]
We show that the observed high performance of multilingual models can be largely attributed to factors not requiring the transfer of actual linguistic knowledge.
More specifically, we observe that what has been transferred across languages is mostly data artifacts and biases, especially for low-resource languages.
arXiv Detail & Related papers (2024-02-03T09:41:52Z) - Sharing, Teaching and Aligning: Knowledgeable Transfer Learning for
Cross-Lingual Machine Reading Comprehension [32.37236167127796]
X-STA is a new approach for cross-lingual machine reading comprehension.
We leverage an attentive teacher to subtly transfer the answer spans of the source language to the answer output space of the target.
A Gradient-Disentangled Knowledge Sharing technique is proposed as an improved cross-attention block.
arXiv Detail & Related papers (2023-11-12T07:20:37Z) - Cross-lingual QA: A Key to Unlocking In-context Cross-lingual Performance [2.371686365695081]
Cross-lingual QA is a cross-lingual prompting method that translates only the question and answer parts, thus reducing translation costs.
Experiments on four typologically diverse multilingual benchmarks show that Cross-lingual QA effectively stimulates models to elicit their cross-lingual knowledge.
We show that prompting open-source MLLMs with cross-lingual in-context examples enhances performance as the model scale increases.
arXiv Detail & Related papers (2023-05-24T15:14:49Z) - Bridging the Gap between Language Models and Cross-Lingual Sequence
Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling (xSL) tasks.
Despite the great success, we draw an empirical observation that there is a training objective gap between pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for xSL named Cross-lingual Language Informative Span Masking (CLISM) to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage consistency between the representations of parallel input sequences.
arXiv Detail & Related papers (2022-04-11T15:55:20Z) - Delving Deeper into Cross-lingual Visual Question Answering [115.16614806717341]
We show that simple modifications to the standard training setup can substantially reduce the transfer gap to monolingual English performance.
We analyze cross-lingual VQA across different question types of varying complexity for different multilingual multimodal Transformers.
arXiv Detail & Related papers (2022-02-15T18:22:18Z) - X-METRA-ADA: Cross-lingual Meta-Transfer Learning Adaptation to Natural
Language Understanding and Question Answering [55.57776147848929]
We propose X-METRA-ADA, a cross-lingual MEta-TRAnsfer learning ADAptation approach for Natural Language Understanding (NLU) and Question Answering (QA).
Our approach adapts MAML, an optimization-based meta-learning approach, to learn to adapt to new languages.
We show that our approach outperforms naive fine-tuning, reaching competitive performance on both tasks for most languages.
arXiv Detail & Related papers (2021-04-20T00:13:35Z) - Enhancing Answer Boundary Detection for Multilingual Machine Reading
Comprehension [86.1617182312817]
We propose two auxiliary tasks in the fine-tuning stage to create additional phrase boundary supervision.
The first is a mixed Machine Reading Comprehension task that translates the question or passage into other languages and builds cross-lingual question-passage pairs.
The second is a language-agnostic knowledge masking task that leverages knowledge phrases mined from the web.
arXiv Detail & Related papers (2020-04-29T10:44:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.