Feature Aggregation in Zero-Shot Cross-Lingual Transfer Using
Multilingual BERT
- URL: http://arxiv.org/abs/2205.08497v1
- Date: Tue, 17 May 2022 17:12:19 GMT
- Title: Feature Aggregation in Zero-Shot Cross-Lingual Transfer Using
Multilingual BERT
- Authors: Beiduo Chen, Wu Guo, Quan Liu, Kun Tao
- Abstract summary: Multilingual BERT (mBERT), a language model pre-trained on large multilingual corpora, has impressive zero-shot cross-lingual transfer capabilities.
In this work, we explore how the lower layers of mBERT complement its last transformer layer.
A feature aggregation module based on an attention mechanism is proposed to fuse the information in different layers of mBERT.
- Score: 16.22182090626537
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual BERT (mBERT), a language model pre-trained on large multilingual
corpora, has impressive zero-shot cross-lingual transfer capabilities and
performs surprisingly well on zero-shot POS tagging and Named Entity
Recognition (NER), as well as on cross-lingual model transfer. At present, mainstream
methods for cross-lingual downstream tasks typically use the output of mBERT's last
transformer layer as the representation of linguistic information. In this work, we
explore how the lower layers of mBERT complement its last transformer layer. A feature aggregation
module based on an attention mechanism is proposed to fuse the information
contained in different layers of mBERT. The experiments are conducted on four
zero-shot cross-lingual transfer datasets, and the proposed method obtains
performance improvements on the key multilingual benchmarks XNLI (+1.5%),
PAWS-X (+2.4%), NER (+1.2 F1), and POS (+1.5 F1). Through an analysis of the
experimental results, we show that the layers before the last layer of mBERT
can provide extra useful information for cross-lingual downstream tasks, and we
empirically explore the interpretability of mBERT.
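To make the proposed aggregation concrete, the following is a minimal PyTorch sketch of an attention-based module that fuses the hidden states of all mBERT layers. It is only an illustration under assumptions: the abstract states that an attention mechanism weights the different layers, but the specific scoring scheme used here (a learned query over mean-pooled per-layer summaries) is hypothetical, not the authors' published architecture.

```python
import torch
import torch.nn as nn


class LayerAttentionAggregator(nn.Module):
    """Fuse the hidden states of all mBERT layers with learned attention weights.

    Hypothetical sketch: the scalar score per layer is computed from a
    mean-pooled layer summary; the paper's exact formulation may differ.
    """

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        # A single learned query scores each layer's summary representation.
        self.query = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, all_hidden_states):
        # all_hidden_states: tuple of (batch, seq_len, hidden) tensors,
        # one per layer (embedding output plus 12 transformer layers).
        stacked = torch.stack(all_hidden_states, dim=1)            # (B, L, T, H)
        layer_summary = stacked.mean(dim=2)                        # (B, L, H)
        scores = self.query(layer_summary).squeeze(-1)             # (B, L)
        weights = torch.softmax(scores, dim=-1)                    # (B, L)
        # Weighted sum over layers yields one fused sequence representation.
        fused = (weights[:, :, None, None] * stacked).sum(dim=1)   # (B, T, H)
        return fused, weights
```

With the HuggingFace transformers library, calling an mBERT model (e.g. bert-base-multilingual-cased) with output_hidden_states=True returns the tuple of per-layer hidden states that such a module would consume; the fused output could then replace the usual last-layer representation fed to the task head.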
Related papers
- Translation and Fusion Improves Zero-shot Cross-lingual Information Extraction [18.926993352330797]
We propose TransFusion, a framework in which models are fine-tuned to use English translations of low-resource language data.
GoLLIE-TF, a cross-lingual instruction-tuned LLM for IE tasks, is designed to close the performance gap between high- and low-resource languages.
arXiv Detail & Related papers (2023-05-23T01:23:22Z)
- CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation [113.99145386490639]
Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results.
We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER.
We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
arXiv Detail & Related papers (2022-10-13T13:32:36Z)
- Learning Compact Metrics for MT [21.408684470261342]
We investigate the trade-off between multilinguality and model capacity with RemBERT, a state-of-the-art multilingual language model.
We show that model size is indeed a bottleneck for cross-lingual transfer, and then demonstrate how distillation can help address this bottleneck.
Our method yields up to 10.5% improvement over vanilla fine-tuning and reaches 92.6% of RemBERT's performance using only a third of its parameters.
arXiv Detail & Related papers (2021-10-12T20:39:35Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both the representation and gradient levels.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Bilingual Alignment Pre-training for Zero-shot Cross-lingual Transfer [33.680292990007366]
In this paper, we aim to improve the zero-shot cross-lingual transfer performance by aligning the embeddings better.
We propose a pre-training task named Alignment Language Model (AlignLM), which uses statistical alignment information as prior knowledge to guide bilingual word prediction.
The results show AlignLM can improve the zero-shot performance significantly on MLQA and XNLI datasets.
arXiv Detail & Related papers (2021-06-03T10:18:43Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
This effectively avoids the degeneration in which masked words are predicted only from the context in their own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- Unsupervised Cross-lingual Adaptation for Sequence Tagging and Beyond [58.80417796087894]
Cross-lingual adaptation with multilingual pre-trained language models (mPTLMs) mainly consists of two lines of work: the zero-shot approach and the translation-based approach.
We propose a novel framework to consolidate the zero-shot approach and the translation-based approach for better adaptation performance.
arXiv Detail & Related papers (2020-10-23T13:47:01Z)
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
To tackle this issue, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language (a minimal sketch of such a loss follows after this list).
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
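For the KL-divergence self-teaching loss mentioned in the FILTER summary above, the following is a minimal, hypothetical PyTorch sketch: the soft pseudo-labels are treated as a fixed teacher signal, and the temperature is an assumption rather than FILTER's published setting.

```python
import torch
import torch.nn.functional as F


def self_teaching_kl_loss(student_logits: torch.Tensor,
                          pseudo_label_logits: torch.Tensor,
                          temperature: float = 1.0) -> torch.Tensor:
    # Soft pseudo-labels act as a frozen teacher distribution (no gradient).
    teacher_probs = F.softmax(pseudo_label_logits.detach() / temperature, dim=-1)
    # Student predictions on the target-language text, as log-probabilities.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL(teacher || student), averaged over the batch.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```

In training, such a term would be added to the supervised task loss with some weight; the exact combination used by FILTER is not specified in the summary above.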