Language Models are Few-shot Multilingual Learners
- URL: http://arxiv.org/abs/2109.07684v1
- Date: Thu, 16 Sep 2021 03:08:22 GMT
- Title: Language Models are Few-shot Multilingual Learners
- Authors: Genta Indra Winata, Andrea Madotto, Zhaojiang Lin, Rosanne Liu, Jason
Yosinski, Pascale Fung
- Abstract summary: We evaluate the multilingual skills of the GPT and T5 models in conducting multi-class classification on non-English languages.
We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones.
- Score: 66.11011385895195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: General-purpose language models have demonstrated impressive capabilities,
performing on par with state-of-the-art approaches on a range of downstream
natural language processing (NLP) tasks and benchmarks when inferring
instructions from very few examples. Here, we evaluate the multilingual skills
of the GPT and T5 models in conducting multi-class classification on
non-English languages without any parameter updates. We show that, given a few
English examples as context, pre-trained language models can predict not only
English test samples but also non-English ones. Finally, we find the in-context
few-shot cross-lingual prediction results of language models are significantly
better than random prediction, and they are competitive compared to the
existing state-of-the-art cross-lingual models.
Related papers
- Investigating Language-Specific Calibration For Pruning Multilingual Large Language Models [11.421452042888523]
We compare different calibration languages for pruning multilingual models across diverse languages, tasks, models, and SotA pruning techniques.
Our results offer practical suggestions, for example, calibrating in the target language can efficiently retain the language modeling capability but does not necessarily benefit downstream tasks.
arXiv Detail & Related papers (2024-08-26T16:29:13Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual
Transfer [81.5984433881309]
We introduce BUFFET, which unifies 15 diverse tasks across 54 languages in a sequence-to-sequence format.
BUFFET is designed to establish a rigorous and equitable evaluation framework for few-shot cross-lingual transfer.
Our findings reveal significant room for improvement in few-shot in-context cross-lingual transfer.
arXiv Detail & Related papers (2023-05-24T08:06:33Z) - Few-shot Subgoal Planning with Language Models [58.11102061150875]
We show that language priors encoded in pre-trained language models allow us to infer fine-grained subgoal sequences.
In contrast to recent methods which make strong assumptions about subgoal supervision, our experiments show that language models can infer detailed subgoal sequences without any fine-tuning.
arXiv Detail & Related papers (2022-05-28T01:03:30Z) - Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of
Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z) - Cross-Lingual Fine-Grained Entity Typing [26.973783464706447]
We present a unified cross-lingual fine-grained entity typing model capable of handling over 100 languages.
We analyze this model's ability to generalize to languages and entities unseen during training.
arXiv Detail & Related papers (2021-10-15T03:22:30Z) - On the ability of monolingual models to learn language-agnostic
representations [2.604227467422371]
We show that monolingual models pretrained and finetuned on different languages achieve competitive performance.
For example, models pretrained on distant languages such as German and Portuguese perform similarly on English tasks.
arXiv Detail & Related papers (2021-09-04T22:09:44Z) - Specializing Multilingual Language Models: An Empirical Study [50.7526245872855]
Contextualized word representations from pretrained multilingual language models have become the de facto standard for addressing natural language tasks.
For languages rarely or never seen by these models, directly using such models often results in suboptimal representation or use of data.
arXiv Detail & Related papers (2021-06-16T18:13:55Z) - Probing Multilingual Language Models for Discourse [0.0]
We find that the XLM-RoBERTa family of models consistently show the best performance.
Our results also indicate that model distillation may hurt the ability of cross-lingual transfer of sentence representations.
arXiv Detail & Related papers (2021-06-09T06:34:21Z) - Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank [46.626315158735615]
Pretrained multilingual contextual representations have shown great success, but due to the limits of their pretraining data, their benefits do not apply equally to all language varieties.
This presents a challenge for language varieties unfamiliar to these models, whose labeled emphand unlabeled data is too limited to train a monolingual model effectively.
We propose the use of additional language-specific pretraining and vocabulary augmentation to adapt multilingual models to low-resource settings.
arXiv Detail & Related papers (2020-09-29T16:12:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.