Transfer Learning and Distant Supervision for Multilingual Transformer
Models: A Study on African Languages
- URL: http://arxiv.org/abs/2010.03179v1
- Date: Wed, 7 Oct 2020 05:23:27 GMT
- Title: Transfer Learning and Distant Supervision for Multilingual Transformer
Models: A Study on African Languages
- Authors: Michael A. Hedderich, David Adelani, Dawei Zhu, Jesujoba Alabi, Udia
Markus, Dietrich Klakow
- Abstract summary: We study trends in performance for different amounts of available resources for the three African languages Hausa, isiXhosa and Yorùbá.
We show that, combined with transfer learning or distant supervision, multilingual transformer models can match, with as few as 10 or 100 labeled sentences, the performance of baselines trained on much more supervised data.
- Score: 20.92293429849952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual transformer models like mBERT and XLM-RoBERTa have obtained
great improvements for many NLP tasks on a variety of languages. However,
recent works also showed that results from high-resource languages could not be
easily transferred to realistic, low-resource scenarios. In this work, we study
trends in performance for different amounts of available resources for the
three African languages Hausa, isiXhosa and Yorùbá on both NER and topic
classification. We show that in combination with transfer learning or distant
supervision, these models can achieve with as little as 10 or 100 labeled
sentences the same performance as baselines with much more supervised training
data. However, we also find settings where this does not hold. Our discussions
and additional experiments on assumptions such as time and hardware
restrictions highlight challenges and opportunities in low-resource learning.
Related papers
- MoSECroT: Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer [50.40191599304911]
We introduce MoSECroT (Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer).
In this paper, we present the first framework that leverages relative representations to construct a common space for the embeddings of a source language PLM and the static word embeddings of a target language.
We show that although our proposed framework is competitive with weak baselines when addressing MoSECroT, it fails to achieve competitive results compared with some strong baselines.
arXiv Detail & Related papers (2024-01-09T21:09:07Z)
- Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [75.33019401706188]
Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars.
We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English.
Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
arXiv Detail & Related papers (2023-06-20T08:27:47Z)
- AfroLM: A Self-Active Learning-based Multilingual Pretrained Language Model for 23 African Languages [0.021987601456703476]
We present AfroLM, a multilingual language model pretrained from scratch on 23 African languages.
AfroLM is pretrained on a dataset 14x smaller than existing baselines.
It is able to generalize well across various domains.
arXiv Detail & Related papers (2022-11-07T02:15:25Z)
- Geographical Distance Is The New Hyperparameter: A Case Study Of Finding The Optimal Pre-trained Language For English-isiZulu Machine Translation [0.0]
This study explores the potential benefits of transfer learning in an English-isiZulu translation framework.
We gathered results from 8 different language corpora, including one multilingual corpus, and saw that isiXhosa-isiZulu outperformed all languages.
We also derived a new coefficient, Nasir's Geographical Distance Coefficient (NGDC) which provides an easy selection of languages for the pre-trained models.
arXiv Detail & Related papers (2022-05-17T20:41:25Z)
- Towards Best Practices for Training Multilingual Dense Retrieval Models [54.91016739123398]
We focus on the task of monolingual retrieval in a variety of typologically diverse languages using one such design.
Our study is organized as a "best practices" guide for training multilingual dense retrieval models.
arXiv Detail & Related papers (2022-04-05T17:12:53Z)
- A Survey of Multilingual Models for Automatic Speech Recognition [6.657361001202456]
Cross-lingual transfer is an attractive solution to the problem of multilingual Automatic Speech Recognition.
Recent advances in Self Supervised Learning are opening up avenues for unlabeled speech data to be used in multilingual ASR models.
We present best practices for building multilingual models from research across diverse languages and techniques.
arXiv Detail & Related papers (2022-02-25T09:31:40Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
- Low-Resource Language Modelling of South African Languages [6.805575417034369]
We evaluate the performance of open-vocabulary language models on low-resource South African languages.
We evaluate different variants of n-gram models, feedforward neural networks, recurrent neural networks (RNNs) and Transformers on small-scale datasets.
Overall, well-regularized RNNs give the best performance across two isiZulu and one Sepedi datasets.
arXiv Detail & Related papers (2021-04-01T21:27:27Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)