Zero-Shot Cross-Lingual Transfer in Legal Domain Using Transformer models
- URL: http://arxiv.org/abs/2111.14192v1
- Date: Sun, 28 Nov 2021 16:25:04 GMT
- Title: Zero-Shot Cross-Lingual Transfer in Legal Domain Using Transformer models
- Authors: Zein Shaheen, Gerhard Wohlgenannt, Dmitry Muromtsev
- Abstract summary: We study zero-shot cross-lingual transfer from English to French and German for multi-label text classification.
We extend the EURLEX57K dataset, an English dataset for topic classification of legal documents, with official French and German translations.
We find that language model finetuning of multilingual pre-trained models (M-DistilBERT, M-BERT) leads to relative improvements of 32.0-34.94% on the French and 76.15-87.54% on the German test sets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Zero-shot cross-lingual transfer is an important feature in modern NLP models and architectures for supporting low-resource languages. In this work, we study zero-shot cross-lingual transfer from English to French and German for multi-label text classification, where we train a classifier on the English training set and test it on the French and German test sets. We extend the EURLEX57K dataset, an English dataset for topic classification of legal documents, with official French and German translations. We investigate the effect of two training techniques, gradual unfreezing and language model finetuning, on the quality of zero-shot cross-lingual transfer. We find that language model finetuning of multilingual pre-trained models (M-DistilBERT, M-BERT) leads to relative improvements of 32.0-34.94% on the French and 76.15-87.54% on the German test sets, respectively. Gradual unfreezing of the pre-trained model's layers during training yields relative improvements of 38-45% for French and 58-70% for German. Compared to a model trained jointly on the English, French, and German training sets, the zero-shot BERT-based classification model reaches 86% of the performance of the jointly trained BERT-based classification model.
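Below is a minimal, hypothetical sketch (not the authors' released code) of the setup the abstract describes: a multi-label classifier on top of M-BERT is fine-tuned on English documents while encoder layers are gradually unfrozen, and the resulting model is then applied unchanged to French or German text for zero-shot evaluation. The checkpoint name, label count, unfreezing schedule, and placeholder data are illustrative assumptions.

```python
# Sketch of zero-shot cross-lingual multi-label classification with gradual
# unfreezing, assuming Hugging Face Transformers and PyTorch.
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NUM_LABELS = 100  # assumption: number of topic labels used for EURLEX57K

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased",          # M-BERT checkpoint
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # BCE-with-logits loss per label
)

def set_trainable(model, num_unfrozen_layers):
    """Gradual unfreezing: train only the classifier head and the top
    `num_unfrozen_layers` encoder layers; keep the rest frozen."""
    for param in model.bert.parameters():
        param.requires_grad = False
    for layer in model.bert.encoder.layer[-num_unfrozen_layers:]:
        for param in layer.parameters():
            param.requires_grad = True
    for param in model.classifier.parameters():
        param.requires_grad = True

def train_epoch(model, batches, lr=2e-5):
    optimizer = AdamW((p for p in model.parameters() if p.requires_grad), lr=lr)
    model.train()
    for texts, labels in batches:  # labels: float tensor of shape (B, NUM_LABELS)
        enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
        out = model(**enc, labels=labels)
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Placeholder English training batch; real training would iterate over EURLEX57K.
english_batches = [(["Regulation on the common agricultural policy ..."],
                    torch.zeros(1, NUM_LABELS))]

# Unfreeze more encoder layers as training progresses (top-down).
for n_unfrozen in (1, 2, 4, 8):
    set_trainable(model, n_unfrozen)
    train_epoch(model, english_batches)

# Zero-shot evaluation: the same classifier scores a French document directly.
model.eval()
with torch.no_grad():
    enc = tokenizer(["Règlement relatif à la politique agricole commune ..."],
                    truncation=True, padding=True, return_tensors="pt")
    probs = torch.sigmoid(model(**enc).logits)      # per-label probabilities
    predicted = (probs > 0.5).nonzero(as_tuple=True)[1]
    print(predicted.tolist())
```

Because the multilingual encoder shares one vocabulary and parameter set across languages, no French or German labels are needed at training time; language model finetuning on target-language text would be an additional step before this classification stage.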
Related papers
- Multilingual Pretraining Using a Large Corpus Machine-Translated from a Single Source Language [34.54405113575568]
Machine-translated text from a single high-quality source language can contribute significantly to the pretraining of multilingual models.
We show that CuatroLLM matches or outperforms state-of-the-art multilingual models trained using closed data.
We release our corpus, models, and training pipeline under open licenses at hf.co/britllm/CuatroLLM.
arXiv Detail & Related papers (2024-10-31T14:09:50Z)
- CroissantLLM: A Truly Bilingual French-English Language Model [42.03897426049679]
We introduce CroissantLLM, a 1.3B language model pretrained on a set of 3T English and French tokens.
We pioneer the approach of training an intrinsically bilingual model with a 1:1 English-to-French pretraining data ratio.
To assess performance outside of English, we craft a novel benchmark, FrenchBench.
arXiv Detail & Related papers (2024-02-01T17:17:55Z)
- T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification [50.675552118811]
Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages.
arXiv Detail & Related papers (2023-06-08T07:33:22Z)
- Language Contamination Explains the Cross-lingual Capabilities of English Pretrained Models [79.38278330678965]
We find that common English pretraining corpora contain significant amounts of non-English text.
This leads to hundreds of millions of foreign language tokens in large-scale datasets.
We then demonstrate that even these small percentages of non-English data facilitate cross-lingual transfer for models trained on them.
arXiv Detail & Related papers (2022-04-17T23:56:54Z)
- DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing [117.41016786835452]
This paper presents a new pre-trained language model, DeBERTaV3, which improves the original DeBERTa model.
We show that vanilla embedding sharing in ELECTRA hurts training efficiency and model performance.
We propose a new gradient-disentangled embedding sharing method that avoids the tug-of-war dynamics.
arXiv Detail & Related papers (2021-11-18T06:48:00Z)
- Learning Compact Metrics for MT [21.408684470261342]
We investigate the trade-off between multilinguality and model capacity with RemBERT, a state-of-the-art multilingual language model.
We show that model size is indeed a bottleneck for cross-lingual transfer, then demonstrate how distillation can help address this bottleneck.
Our method yields up to 10.5% improvement over vanilla fine-tuning and reaches 92.6% of RemBERT's performance using only a third of its parameters.
arXiv Detail & Related papers (2021-10-12T20:39:35Z)
- Cross-lingual Transferring of Pre-trained Contextualized Language Models [73.97131976850424]
We propose a novel cross-lingual model transferring framework for PrLMs: TreLM.
To handle the symbol order and sequence length differences between languages, we propose an intermediate "TRILayer" structure.
We show the proposed framework significantly outperforms language models trained from scratch with limited data in both performance and efficiency.
arXiv Detail & Related papers (2021-07-27T06:51:13Z)
- AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages [75.08199398141744]
We present AmericasNLI, an extension of XNLI (Conneau et al.) to 10 Indigenous languages of the Americas.
We conduct experiments with XLM-R, testing multiple zero-shot and translation-based approaches.
We find that XLM-R's zero-shot performance is poor for all 10 languages, with an average performance of 38.62%.
arXiv Detail & Related papers (2021-04-18T05:32:28Z)
- Multilingual BERT Post-Pretraining Alignment [26.62198329830013]
We propose a simple method to align multilingual contextual embeddings as a post-pretraining step.
Using parallel data, our method aligns embeddings on the word level through the recently proposed Translation Language Modeling objective.
We also perform sentence-level code-switching with English when fine-tuning on downstream tasks.
arXiv Detail & Related papers (2020-10-23T17:14:41Z)
- Language-agnostic BERT Sentence Embedding [14.241717104817713]
We investigate methods for learning multilingual sentence embeddings by combining the best methods for learning monolingual and cross-lingual representations.
We show that introducing a pre-trained multilingual language model reduces by 80% the amount of parallel training data required to achieve good performance.
arXiv Detail & Related papers (2020-07-03T17:58:42Z)
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)