Transferring BERT Capabilities from High-Resource to Low-Resource
Languages Using Vocabulary Matching
- URL: http://arxiv.org/abs/2402.14408v1
- Date: Thu, 22 Feb 2024 09:49:26 GMT
- Title: Transferring BERT Capabilities from High-Resource to Low-Resource
Languages Using Vocabulary Matching
- Authors: Piotr Rybak
- Abstract summary: This work presents a novel approach to transfer BERT capabilities from high-resource to low-resource languages using vocabulary matching.
We conduct experiments on the Silesian and Kashubian languages and demonstrate that the approach improves the performance of BERT models even when the target language has minimal training data.
- Score: 1.746529892290768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models have revolutionized the natural language
understanding landscape, most notably BERT (Bidirectional Encoder
Representations from Transformers). However, a significant challenge remains
for low-resource languages, where limited data hinders the effective training
of such models. This work presents a novel approach to bridge this gap by
transferring BERT capabilities from high-resource to low-resource languages
using vocabulary matching. We conduct experiments on the Silesian and Kashubian
languages and demonstrate the effectiveness of our approach in improving the
performance of BERT models even when the target language has minimal training
data. Our results highlight the potential of the proposed technique to
effectively train BERT models for low-resource languages, thus democratizing
access to advanced language understanding models.
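The abstract does not spell out the procedure, but the core idea of vocabulary matching can be sketched as follows: initialize a target-language BERT from a high-resource model by copying the embeddings of every token the two tokenizers share, and keep random initialization only for tokens unique to the target language. The snippet below is a minimal sketch assuming Hugging Face transformers; the model names, paths, and state-dict keys are illustrative assumptions, not the paper's exact setup.

```python
# Minimal vocabulary-matching sketch (assumes Hugging Face transformers;
# model names, paths, and state-dict keys are illustrative assumptions).
from transformers import AutoModelForMaskedLM, AutoTokenizer

src_name = "bert-base-multilingual-cased"          # stand-in high-resource model
src_model = AutoModelForMaskedLM.from_pretrained(src_name)
src_tok = AutoTokenizer.from_pretrained(src_name)
tgt_tok = AutoTokenizer.from_pretrained("path/to/low-resource-tokenizer")  # hypothetical

# Fresh target model: same architecture, but with the target vocabulary size.
config = src_model.config
config.vocab_size = len(tgt_tok)
tgt_model = AutoModelForMaskedLM.from_config(config)

# Copy every weight whose shape matches, i.e. everything except vocabulary-sized tensors.
tgt_state = tgt_model.state_dict()
for name, tensor in src_model.state_dict().items():
    if name in tgt_state and tgt_state[name].shape == tensor.shape:
        tgt_state[name] = tensor.clone()

# For target tokens that also exist in the source vocabulary, copy their embeddings;
# tokens unique to the target language keep their random initialization.
src_emb = src_model.get_input_embeddings().weight.data
tgt_emb = tgt_state["bert.embeddings.word_embeddings.weight"]  # BERT-style key (assumed)
src_vocab = src_tok.get_vocab()
for token, tgt_id in tgt_tok.get_vocab().items():
    src_id = src_vocab.get(token)
    if src_id is not None:
        tgt_emb[tgt_id] = src_emb[src_id].clone()

tgt_model.load_state_dict(tgt_state)
# The initialized model can then be further pre-trained (MLM) on the small target corpus.
```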
Related papers
- Leveraging Parameter Efficient Training Methods for Low Resource Text Classification: A Case Study in Marathi [0.4194295877935868]
We present a study of PEFT methods for the Indic low-resource language Marathi.
These approaches are evaluated on prominent text classification datasets like MahaSent, MahaHate, and MahaNews.
We show that these methods are competitive with full fine-tuning and can be used without loss in accuracy.
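As an illustration of the kind of parameter-efficient setup studied here, a LoRA adapter can be attached to a Marathi BERT checkpoint with the `peft` library. The checkpoint name, label count, and hyperparameters below are assumptions, not the study's reported configuration.

```python
# Illustrative LoRA setup with the `peft` library; checkpoint name, label count,
# and hyperparameters are assumptions, not the study's reported configuration.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained(
    "l3cube-pune/marathi-bert-v2",  # assumed Marathi BERT checkpoint
    num_labels=3,                   # e.g. 3-way sentiment for MahaSent
)
lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8, lora_alpha=16, lora_dropout=0.1,
    target_modules=["query", "value"],   # attach adapters to the attention projections
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the small adapter matrices are updated
# Training then proceeds with a standard Trainer loop on MahaSent / MahaHate / MahaNews.
```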
arXiv Detail & Related papers (2024-08-06T13:16:16Z)
- Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
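A toy illustration of the simplest merging strategy, linear weight interpolation between two compatible checkpoints, is sketched below; the paths are placeholders and the paper's actual merging method may be more sophisticated.

```python
# Toy linear weight interpolation between two Llama-2-7B derivatives; paths are
# placeholders, and the paper's actual merging method may differ.
from transformers import AutoModelForCausalLM

task_model = AutoModelForCausalLM.from_pretrained("path/to/task-tuned-llama2")        # e.g. SFT model
lang_model = AutoModelForCausalLM.from_pretrained("path/to/low-resource-pretrained")  # target-language CT model

alpha = 0.5  # interpolation weight between the two sets of parameters
task_state = task_model.state_dict()
lang_state = lang_model.state_dict()
merged_state = {
    name: alpha * task_state[name] + (1.0 - alpha) * lang_state[name]
    for name in task_state
}

lang_model.load_state_dict(merged_state)   # reuse one model object to hold the merge
lang_model.save_pretrained("path/to/merged-model")
```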
arXiv Detail & Related papers (2024-07-04T15:14:17Z)
- MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting [53.77590764277568]
We introduce a novel MoE-CT architecture that separates the base model's learning from the multilingual expansion process.
Our design freezes the original LLM parameters, thus safeguarding its performance in high-resource languages, while an appended MoE module, trained on diverse language datasets, augments low-resource language proficiency.
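The block below is a hypothetical stand-in showing the general shape of the idea (a frozen base plus a small trainable mixture-of-experts layer); it is a generic MoE block, not the paper's MoE-CT architecture.

```python
# Schematic only: a frozen base model extended with a small trainable MoE block.
# This is a generic mixture-of-experts layer, not the paper's MoE-CT design.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """A router softly mixes a few expert MLPs and adds the result as a residual."""
    def __init__(self, d_model: int, n_experts: int = 4, d_ff: int = 256):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        gates = torch.softmax(self.router(h), dim=-1)            # (batch, seq, n_experts)
        mix = torch.stack([e(h) for e in self.experts], dim=-1)  # (batch, seq, d_model, n_experts)
        return h + (mix * gates.unsqueeze(-2)).sum(dim=-1)       # residual update

# Freezing the original LLM and training only the appended module:
#   for p in base_llm.parameters(): p.requires_grad = False
#   moe = TinyMoE(d_model=base_llm.config.hidden_size)   # only these weights get gradients
```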
arXiv Detail & Related papers (2024-06-25T11:03:45Z)
- MoSECroT: Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer [50.40191599304911]
We introduce MoSECroT (Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer).
In this paper, we present the first framework that leverages relative representations to construct a common space for the embeddings of a source language PLM and the static word embeddings of a target language.
We show that although our proposed framework is competitive with weak baselines when addressing MoSECroT, it fails to achieve competitive results compared with some strong baselines.
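The common-space construction rests on relative representations, which can be sketched as follows: each embedding is re-described by its cosine similarities to a shared set of anchors (e.g. word pairs from a small bilingual lexicon). The function below is a hedged, generic sketch; the anchor selection and the subsequent stitching step differ in the actual MoSECroT framework.

```python
# Hedged sketch of relative representations: re-describe each embedding space by
# cosine similarity to a shared set of anchors; details differ from the paper.
import numpy as np

def relative_repr(emb: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """Each row of `emb` becomes a vector of cosine similarities to the anchor rows."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    anchors = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return emb @ anchors.T

# source_rel = relative_repr(source_plm_embeddings, source_anchor_vectors)
# target_rel = relative_repr(target_static_embeddings, target_anchor_vectors)
# After this mapping, both sets of vectors live in the same anchor-indexed space.
```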
arXiv Detail & Related papers (2024-01-09T21:09:07Z)
- Cross-lingual Data Augmentation for Document-grounded Dialog Systems in Low Resource Languages [0.0]
We present a novel pipeline, CLEM (Cross-Lingual Enhanced Model), which includes adversarially trained retrieval components (Retriever and Re-ranker) and a FiD (fusion-in-decoder) generator.
To further leverage the high-resource language, we also propose an innovative architecture that performs alignment across different languages via translated training data.
arXiv Detail & Related papers (2023-05-24T09:40:52Z)
- Improving Cross-lingual Information Retrieval on Low-Resource Languages via Optimal Transport Distillation [21.057178077747754]
In this work, we propose OPTICAL: Optimal Transport distillation for low-resource Cross-lingual information retrieval.
By separating cross-lingual knowledge from the knowledge of query-document matching, OPTICAL needs only bitext data for distillation training.
Experimental results show that, with minimal training data, OPTICAL significantly outperforms strong baselines on low-resource languages.
arXiv Detail & Related papers (2023-01-29T22:30:36Z)
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource languages to low-resource languages.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
- Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning [30.5853328612593]
In this work, we explore fine-tuning methods for BERT, a pre-trained Transformer-based language model.
Our experimental results show that maximizing the approximate knowledge gain of the model improves performance.
We analyze the benefits of freezing layers of the language model during fine-tuning to reduce the number of trainable parameters.
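For orientation, a generic pool-based active-learning step is sketched below, using predictive entropy as the informativeness proxy; the paper's acquisition function (approximate knowledge gain) is different, and the data-loading convention is assumed.

```python
# Generic pool-based selection step using predictive entropy as the informativeness
# proxy; the paper's acquisition function (approximate knowledge gain) differs.
import torch

@torch.no_grad()
def select_for_annotation(model, unlabeled_batches, k=32):
    """Return ids of the k most uncertain unlabeled examples."""
    model.eval()
    all_scores, all_ids = [], []
    for ids, batch in unlabeled_batches:   # assumed: (1-D LongTensor of example ids, input dict)
        probs = torch.softmax(model(**batch).logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
        all_scores.append(entropy)
        all_ids.append(ids)
    scores = torch.cat(all_scores)
    ids = torch.cat(all_ids)
    return ids[scores.topk(k).indices]     # label these, then continue fine-tuning
```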
arXiv Detail & Related papers (2020-12-04T08:34:39Z)
- What the [MASK]? Making Sense of Language-Specific BERT Models [39.54532211263058]
This paper presents the current state of the art in language-specific BERT models.
Our aim is to provide an overview of the commonalities and differences between language-specific BERT models and mBERT.
arXiv Detail & Related papers (2020-03-05T20:42:51Z)
- Cross-lingual, Character-Level Neural Morphological Tagging [57.0020906265213]
We train character-level recurrent neural taggers to predict morphological tags for high-resource and low-resource languages jointly.
Learning joint character representations among multiple related languages successfully enables knowledge transfer from the high-resource languages to the low-resource ones, improving accuracy by up to 30% over a monolingual model.
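A bare-bones character-level BiLSTM tagger is sketched below; sharing its parameters across related languages (by mixing languages within training batches) is what enables the transfer described above. The original work's architecture and training details differ.

```python
# Bare-bones character-level BiLSTM tagger (schematic; the original work's
# architecture and training details differ).
import torch
import torch.nn as nn

class CharTagger(nn.Module):
    def __init__(self, n_chars: int, n_tags: int, d_emb: int = 64, d_hid: int = 128):
        super().__init__()
        self.emb = nn.Embedding(n_chars, d_emb, padding_idx=0)
        self.rnn = nn.LSTM(d_emb, d_hid, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * d_hid, n_tags)

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:  # (batch, seq_len) character ids
        hidden, _ = self.rnn(self.emb(char_ids))
        return self.out(hidden)                                 # per-position tag logits
```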
arXiv Detail & Related papers (2017-08-30T08:14:34Z)