Syntax-augmented Multilingual BERT for Cross-lingual Transfer
- URL: http://arxiv.org/abs/2106.02134v1
- Date: Thu, 3 Jun 2021 21:12:50 GMT
- Title: Syntax-augmented Multilingual BERT for Cross-lingual Transfer
- Authors: Wasi Uddin Ahmad, Haoran Li, Kai-Wei Chang, Yashar Mehdad
- Abstract summary: This work shows that explicitly providing language syntax and training mBERT with an auxiliary objective to encode universal dependency tree structure helps cross-lingual transfer.
Experimental results show that syntax-augmented mBERT improves cross-lingual transfer on popular benchmarks.
- Score: 37.99210035238424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, we have seen a colossal effort in pre-training multilingual
text encoders using large-scale corpora in many languages to facilitate
cross-lingual transfer learning. However, due to typological differences across
languages, cross-lingual transfer is challenging. Nevertheless, language
syntax, e.g., syntactic dependencies, can bridge the typological gap. Previous
works have shown that pre-trained multilingual encoders, such as mBERT
\cite{devlin-etal-2019-bert}, capture language syntax, helping cross-lingual
transfer. This work shows that explicitly providing language syntax and
training mBERT using an auxiliary objective to encode the universal dependency
tree structure helps cross-lingual transfer. We perform rigorous experiments on
four NLP tasks, including text classification, question answering, named entity
recognition, and task-oriented semantic parsing. The experimental results show
that syntax-augmented mBERT improves cross-lingual transfer on popular
benchmarks, such as PAWS-X and MLQA, by 1.4 and 1.6 points on average across
all languages. In the \emph{generalized} transfer setting, performance improves
more substantially, by 3.9 and 3.1 points on average on PAWS-X and MLQA.
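To make the auxiliary objective concrete, here is a minimal, hypothetical PyTorch sketch: a small stand-in Transformer encoder (in place of mBERT) is trained with a task loss plus an auxiliary loss that asks each token to predict the position of its dependency head, one simple way to encode universal dependency tree structure. The arc scorer, the mixing weight `aux_weight`, and the toy dimensions are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch: joint training with a task loss plus an auxiliary
# dependency-head prediction loss. The encoder is a small stand-in for mBERT;
# the arc scorer and `aux_weight` are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SyntaxAugmentedEncoder(nn.Module):
    def __init__(self, vocab_size=30000, hidden=256, num_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # stand-in for mBERT
        self.task_head = nn.Linear(hidden, num_labels)             # e.g., sentence classification
        self.arc_query = nn.Linear(hidden, hidden)                 # token-as-dependent view
        self.arc_key = nn.Linear(hidden, hidden)                   # token-as-head view

    def forward(self, token_ids, task_labels, head_ids):
        h = self.encoder(self.embed(token_ids))                    # (batch, seq, hidden)
        task_logits = self.task_head(h[:, 0])                      # first token acts as [CLS]
        task_loss = F.cross_entropy(task_logits, task_labels)

        # Auxiliary objective: for every token, predict the index of its
        # dependency head within the same sentence (the UD tree structure).
        arc_scores = self.arc_query(h) @ self.arc_key(h).transpose(1, 2)  # (batch, seq, seq)
        aux_loss = F.cross_entropy(arc_scores.flatten(0, 1), head_ids.flatten())

        aux_weight = 0.1                                           # assumed mixing weight
        return task_loss + aux_weight * aux_loss


# Toy usage: a batch of 2 sentences with 5 tokens each.
model = SyntaxAugmentedEncoder()
tokens = torch.randint(0, 30000, (2, 5))
labels = torch.randint(0, 2, (2,))
heads = torch.randint(0, 5, (2, 5))                               # gold head index per token
loss = model(tokens, labels, heads)
loss.backward()
print(float(loss))
```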
Related papers
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD for cross-lingual alignment pretraining, a parallel and large-scale multilingual conversation dataset.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
- Languages You Know Influence Those You Learn: Impact of Language Characteristics on Multi-Lingual Text-to-Text Transfer [4.554080966463776]
Multi-lingual language models (LMs) have been remarkably successful in enabling natural language tasks in low-resource languages.
We try to better understand how such models, specifically mT5, transfer *any* linguistic and semantic knowledge across languages.
A key finding of this work is that similarity of syntax, morphology and phonology are good predictors of cross-lingual transfer.
arXiv Detail & Related papers (2022-12-04T07:22:21Z)
- How Do Multilingual Encoders Learn Cross-lingual Representation? [8.409283426564977]
Cross-lingual transfer benefits languages with little to no training data by transferring from other languages.
This thesis first shows such surprising cross-lingual effectiveness compared against prior art on various tasks.
We also look at how to inject different cross-lingual signals into multilingual encoders, and the optimization behavior of cross-lingual transfer with these models.
arXiv Detail & Related papers (2022-07-12T17:57:05Z)
- Exposing Cross-Lingual Lexical Knowledge from Multilingual Sentence Encoders [85.80950708769923]
We probe multilingual sentence encoders for the amount of cross-lingual lexical knowledge stored in their parameters, and compare them against the original multilingual LMs.
We also devise a novel method to expose this knowledge by additionally fine-tuning multilingual models.
We report substantial gains on standard benchmarks.
arXiv Detail & Related papers (2022-04-30T13:23:16Z)
- First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT [2.2931318723689276]
Cross-lingual transfer emerges from fine-tuning on a task of interest in one language and evaluating on a distinct language, not seen during the fine-tuning.
We show that multilingual BERT can be viewed as the stacking of two sub-networks: a multilingual encoder followed by a task-specific language-agnostic predictor.
While the encoder is crucial for cross-lingual transfer and remains mostly unchanged during fine-tuning, the task predictor has little importance for the transfer and can be re-initialized during fine-tuning.
arXiv Detail & Related papers (2021-01-26T22:12:38Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM fine-tuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
During training, however, gold labels for the translated target-language text are unavailable; to tackle this, we propose an additional KL-divergence self-teaching loss, based on auto-generated soft pseudo-labels for translated text in the target language (a minimal sketch of such a loss appears after this list).
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
- A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT [60.9051207862378]
Multilingual BERT works remarkably well on cross-lingual transfer tasks.
Data size and context window size are crucial factors for transferability.
There is a computationally cheap but effective approach to improve the cross-lingual ability of multilingual BERT.
arXiv Detail & Related papers (2020-04-20T11:13:16Z)
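The FILTER entry above mentions a KL-divergence self-teaching loss over auto-generated soft pseudo-labels. The snippet below is a minimal, hypothetical PyTorch sketch of such a loss; the temperature value, the detached teacher distribution, and the toy tensors are assumptions for illustration, not the paper's exact recipe.

```python
# Hypothetical sketch of a KL-divergence self-teaching loss on soft pseudo-labels.
# `teacher_logits` would be the model's own predictions used as pseudo-labels;
# the temperature and detach choices are illustrative assumptions.
import torch
import torch.nn.functional as F


def self_teaching_kl_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class distributions."""
    teacher_probs = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # reduction="batchmean" matches the mathematical definition of KL divergence.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")


# Toy usage: a batch of 4 examples with 3 classes.
student = torch.randn(4, 3, requires_grad=True)   # predictions on target-language text
teacher = torch.randn(4, 3)                       # soft pseudo-labels from translated text
loss = self_teaching_kl_loss(student, teacher)
loss.backward()
print(float(loss))
```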