Sharing, Teaching and Aligning: Knowledgeable Transfer Learning for
Cross-Lingual Machine Reading Comprehension
- URL: http://arxiv.org/abs/2311.06758v1
- Date: Sun, 12 Nov 2023 07:20:37 GMT
- Authors: Tingfeng Cao, Chengyu Wang, Chuanqi Tan, Jun Huang, Jinhui Zhu
- Abstract summary: X-STA is a new approach for cross-lingual machine reading comprehension.
We leverage an attentive teacher to subtly transfer the answer spans of the source language to the answer output space of the target.
A Gradient-Disentangled Knowledge Sharing technique is proposed as an improved cross-attention block.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In cross-lingual language understanding, machine translation is often
utilized to enhance the transferability of models across languages, either by
translating the training data from the source language to the target, or from
the target to the source to aid inference. However, in cross-lingual machine
reading comprehension (MRC), it is difficult to provide such deep assistance
for cross-lingual transfer because answer span positions vary across
languages. In this paper, we propose X-STA, a new
approach for cross-lingual MRC. Specifically, we leverage an attentive teacher
to subtly transfer the answer spans of the source language to the answer output
space of the target. A Gradient-Disentangled Knowledge Sharing technique is
proposed as an improved cross-attention block. In addition, we force the model
to learn semantic alignments from multiple granularities and calibrate the
model outputs with teacher guidance to enhance cross-lingual transferability.
Experiments on three multi-lingual MRC datasets show the effectiveness of our
method, outperforming state-of-the-art approaches.
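To make the Gradient-Disentangled Knowledge Sharing idea concrete, below is a minimal sketch of a cross-attention block in which target-language states attend over source-language states while a stop-gradient keeps this path from updating the source branch. This is a hypothetical illustration, not the authors' implementation: the module name, dimensions, and the exact placement of the detach are assumptions.

```python
# Minimal sketch (hypothetical, not the authors' code) of a
# gradient-disentangled cross-attention block: target-language states
# attend over source-language states, but .detach() stops gradients
# from flowing back into the source branch through this path.
import torch
import torch.nn as nn

class GradientDisentangledCrossAttention(nn.Module):
    def __init__(self, hidden_size: int = 768, num_heads: int = 12):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, target_states, source_states):
        # Stop-gradient on the source side: the target branch can read
        # source-language knowledge without perturbing the shared encoder
        # through this auxiliary path.
        src = source_states.detach()
        attended, _ = self.attn(query=target_states, key=src, value=src)
        # Residual connection keeps the target representation dominant.
        return self.norm(target_states + attended)

# Dummy usage: batch 2, sequence length 16, hidden size 768.
block = GradientDisentangledCrossAttention()
out = block(torch.randn(2, 16, 768), torch.randn(2, 16, 768))  # (2, 16, 768)
```

The residual-plus-LayerNorm wrapper mirrors a standard Transformer sublayer, so a block of this shape could drop into an existing encoder.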
Related papers
- CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment
English-centric models are usually suboptimal in other languages.
We propose a novel approach called CrossIn, which utilizes a mixed composition of cross-lingual instruction tuning data.
arXiv Detail & Related papers (2024-04-18T06:20:50Z)
- Promoting Generalized Cross-lingual Question Answering in Few-resource Scenarios via Self-knowledge Distillation
We study cross-lingual transfer mainly focusing on the Generalized Cross-Lingual Transfer (G-XLT) task.
Our approach seeks to enhance cross-lingual QA transfer using a high-performing multilingual model trained on a large-scale dataset.
We introduce the novel mAP@k coefficients to fine-tune the self-knowledge distillation loss (a sketch follows this entry).
arXiv Detail & Related papers (2023-09-29T10:54:59Z)
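As a rough picture of how mAP@k coefficients might modulate a self-knowledge distillation loss for answer-start prediction, consider the hedged sketch below; the agreement score and the loss mixing are assumptions, and the paper's exact formulation may differ.

```python
# Hypothetical sketch of a self-distillation loss for answer-start
# prediction whose mixing weight comes from a mAP@k-style agreement
# between teacher and student top-k predictions. The exact coefficient
# in the paper may be defined differently.
import torch
import torch.nn.functional as F

def map_at_k(teacher_logits, student_logits, k=5):
    """Per-example average precision of student top-k against teacher top-k."""
    t_top = teacher_logits.topk(k, dim=-1).indices            # (batch, k)
    s_top = student_logits.topk(k, dim=-1).indices            # (batch, k)
    hits = (s_top.unsqueeze(-1) == t_top.unsqueeze(1)).any(-1).float()
    prec = hits.cumsum(-1) / torch.arange(1, k + 1, device=hits.device)
    return (prec * hits).sum(-1) / k                          # (batch,) in [0, 1]

def self_kd_loss(student_logits, teacher_logits, gold_starts, temperature=2.0):
    ce = F.cross_entropy(student_logits, gold_starts)
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    # Trust the teacher more where teacher/student top-k spans agree.
    coeff = map_at_k(teacher_logits, student_logits).mean()
    return (1 - coeff) * ce + coeff * kd

# Dummy usage: batch 4, passage length 128.
loss = self_kd_loss(torch.randn(4, 128), torch.randn(4, 128),
                    torch.randint(0, 128, (4,)))
```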
- Self-Augmentation Improves Zero-Shot Cross-Lingual Transfer
Cross-lingual transfer is a central task in multilingual NLP.
Earlier efforts on this task use parallel corpora, bilingual dictionaries, or other annotated alignment data.
We propose a simple yet effective method, SALT, to improve zero-shot cross-lingual transfer.
arXiv Detail & Related papers (2023-09-19T19:30:56Z)
- A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer Learning
Existing zero-shot cross-lingual transfer methods rely on parallel corpora or bilingual dictionaries.
We propose Embedding-Push, Attention-Pull, and Robust targets to transfer English embeddings to virtual multilingual embeddings without semantic loss.
arXiv Detail & Related papers (2022-10-18T15:36:53Z)
- Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks.
Despite this success, we observe empirically that there is a training objective gap between the pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for cross-lingual sequence labeling (xSL), named Cross-lingual Language Informative Span Masking (CLISM), to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage consistency between the representations of parallel input sequences (a sketch follows this entry).
arXiv Detail & Related papers (2022-04-11T15:55:20Z)
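Here is a minimal sketch of a contrastive-consistency regularizer in the spirit of CACR, assuming pooled encodings of parallel source/target sequences with in-batch negatives (an InfoNCE-style loss); the pooling, temperature, and symmetrization are illustrative choices rather than the paper's exact objective.

```python
# Minimal sketch, under assumptions, of a contrastive-consistency
# regularizer: pooled encodings of parallel source/target sequences are
# pulled together, with other in-batch pairs as negatives (InfoNCE).
import torch
import torch.nn.functional as F

def contrastive_consistency(src_repr, tgt_repr, tau=0.05):
    """src_repr, tgt_repr: (batch, hidden) pooled encodings of parallel pairs."""
    src = F.normalize(src_repr, dim=-1)
    tgt = F.normalize(tgt_repr, dim=-1)
    logits = src @ tgt.t() / tau                 # (batch, batch) similarities
    labels = torch.arange(src.size(0), device=src.device)
    # Each sequence should match its own translation (the diagonal),
    # symmetrized over both retrieval directions.
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

# Dummy usage: 8 parallel pairs with hidden size 768.
loss = contrastive_consistency(torch.randn(8, 768), torch.randn(8, 768))
```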
- X-METRA-ADA: Cross-lingual Meta-Transfer Learning Adaptation to Natural Language Understanding and Question Answering
We propose X-METRA-ADA, a cross-lingual MEta-TRAnsfer learning ADAptation approach for Natural Language Understanding (NLU) and Question Answering (QA).
Our approach adapts MAML, an optimization-based meta-learning approach, to learn to adapt to new languages (a sketch follows this entry).
We show that our approach outperforms naive fine-tuning, reaching competitive performance on both tasks for most languages.
arXiv Detail & Related papers (2021-04-20T00:13:35Z)
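The inner/outer structure that X-METRA-ADA builds on can be shown with a standard MAML step. The sketch below uses a toy linear model and random data as placeholders, not the paper's setup; only the adapt-on-support, evaluate-on-query pattern is the point.

```python
# Illustrative MAML step (placeholder model and data, not the paper's
# setup): adapt on a sampled language's support set, then let the query
# loss under the adapted parameters drive the meta-update.
import torch
import torch.nn as nn
from torch.func import functional_call  # PyTorch >= 2.0

model = nn.Linear(4, 2)                  # stand-in for a multilingual encoder
loss_fn = nn.CrossEntropyLoss()

def maml_step(support, query, inner_lr=0.1):
    (x_s, y_s), (x_q, y_q) = support, query
    # Inner loop: one gradient step on the support set.
    params = dict(model.named_parameters())
    grads = torch.autograd.grad(loss_fn(model(x_s), y_s),
                                params.values(), create_graph=True)
    adapted = {n: p - inner_lr * g for (n, p), g in zip(params.items(), grads)}
    # Outer loop: query loss under the adapted parameters; gradients flow
    # back to the original parameters through the inner step.
    return loss_fn(functional_call(model, adapted, (x_q,)), y_q)

support = (torch.randn(8, 4), torch.randint(0, 2, (8,)))
query = (torch.randn(8, 4), torch.randint(0, 2, (8,)))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
meta_opt.zero_grad()
maml_step(support, query).backward()
meta_opt.step()
```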
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from the multiple language-branch models into a single model for all target languages (a sketch follows this entry).
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
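A hedged sketch of the amalgamation step: several language-branch teachers produce answer-start distributions, and a single student is trained against their average together with the gold label. Uniform teacher averaging, the mixing weight, and the restriction to span starts are simplifying assumptions, not the paper's exact recipe.

```python
# Hedged sketch of multi-teacher distillation in the spirit of LBMRC:
# several language-branch teachers produce answer-start distributions
# and a single multilingual student matches their average alongside the
# gold label. Teacher weighting and span-end handling are simplified.
import torch
import torch.nn.functional as F

def language_branch_distillation(student_logits, teacher_logits_list,
                                 gold_starts, alpha=0.5, temperature=2.0):
    """student_logits: (batch, seq_len); teachers: list of same-shaped tensors."""
    ce = F.cross_entropy(student_logits, gold_starts)
    # Amalgamate branch teachers by averaging their softened distributions.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(0)
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        teacher_probs,
        reduction="batchmean",
    ) * temperature**2
    return alpha * ce + (1 - alpha) * kd

# Dummy usage: two language-branch teachers, batch 4, passage length 128.
loss = language_branch_distillation(
    torch.randn(4, 128),
    [torch.randn(4, 128), torch.randn(4, 128)],
    torch.randint(0, 128, (4,)),
)
```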
- Enhancing Answer Boundary Detection for Multilingual Machine Reading Comprehension
We propose two auxiliary tasks in the fine-tuning stage to create additional phrase boundary supervision.
The first is a mixed MRC task, which translates the question or passage into other languages and builds cross-lingual question-passage pairs.
The second is a language-agnostic knowledge masking task that leverages knowledge phrases mined from the web.
arXiv Detail & Related papers (2020-04-29T10:44:00Z)
- Translation Artifacts in Cross-lingual Transfer Learning
We show that machine translation can introduce subtle artifacts that have a notable impact on existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)