Intriguing Properties of Compression on Multilingual Models
- URL: http://arxiv.org/abs/2211.02738v1
- Date: Fri, 4 Nov 2022 20:28:01 GMT
- Title: Intriguing Properties of Compression on Multilingual Models
- Authors: Kelechi Ogueji, Orevaoghene Ahia, Gbemileke Onilude, Sebastian
Gehrmann, Sara Hooker and Julia Kreutzer
- Abstract summary: We propose a framework to characterize the impact of sparsifying multilingual pre-trained language models during fine-tuning.
Applying this framework to mBERT named entity recognition models across 40 languages, we find that compression confers several intriguing and previously unknown generalization properties.
- Score: 17.06142742945346
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multilingual models are often particularly dependent on scaling to generalize
to a growing number of languages. Compression techniques are widely relied upon
to reconcile the growth in model size with real-world resource constraints, but
compression can have a disparate effect on model performance for low-resource
languages. It is thus crucial to understand the trade-offs between scale,
multilingualism, and compression. In this work, we propose an experimental
framework to characterize the impact of sparsifying multilingual pre-trained
language models during fine-tuning. Applying this framework to mBERT named
entity recognition models across 40 languages, we find that compression confers
several intriguing and previously unknown generalization properties. In
contrast to prior findings, we find that compression may improve model
robustness over dense models. We additionally observe that under certain
sparsification regimes, compression may aid, rather than disproportionately
impact, the performance of low-resource languages.
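As a rough illustration of the setup the abstract describes, the sketch below applies magnitude-based sparsification to mBERT before fine-tuning it for token classification. This is a minimal sketch under assumed settings (PyTorch's pruning utilities, a fixed 50% sparsity on all linear layers, 9 NER labels); it is not the authors' released framework, and the paper's actual pruning schedules and sparsity levels may differ.

```python
# Minimal sketch: magnitude pruning of mBERT ahead of NER fine-tuning.
# Assumptions (not from the paper's code): unstructured L1 pruning of all
# Linear layers at 50% sparsity, 9 token-classification labels.
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_name = "bert-base-multilingual-cased"  # mBERT
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=9)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def sparsify(model, amount=0.5):
    """Zero out the smallest-magnitude weights in every Linear layer."""
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)
    return model

model = sparsify(model, amount=0.5)

# ... fine-tune on NER data as usual (e.g., with transformers.Trainer);
# the pruning masks keep the removed weights at zero throughout training.

# Make the sparsity permanent before saving the checkpoint.
for module in model.modules():
    if isinstance(module, torch.nn.Linear) and prune.is_pruned(module):
        prune.remove(module, "weight")
```

Per-language evaluation of the resulting sparse checkpoint against its dense counterpart is then what allows the kind of robustness and low-resource comparisons the abstract reports.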
Related papers
- What Happens When Small Is Made Smaller? Exploring the Impact of Compression on Small Data Pretrained Language Models [2.2871867623460216]
This paper investigates the effectiveness of pruning, knowledge distillation, and quantization on an exclusively low-resourced, small-data language model, AfriBERTa.
Through a battery of experiments, we assess the effects of compression on performance across several metrics beyond accuracy.
arXiv Detail & Related papers (2024-04-06T23:52:53Z) - Multilingual Brain Surgeon: Large Language Models Can be Compressed Leaving No Language Behind [14.433894552549337]
Large Language Models (LLMs) have ushered in a new era in Natural Language Processing, but their massive size demands effective compression techniques for practicality.
This paper introduces Multilingual Brain Surgeon (MBS), a novel calibration data sampling method for multilingual LLMs compression.
MBS overcomes the English-centric limitations of existing methods by sampling calibration data from various languages in proportion to the language distribution of the model's training datasets (a toy sketch of this sampling step appears after this list).
arXiv Detail & Related papers (2024-04-06T22:16:32Z) - Model Compression and Efficient Inference for Large Language Models: A
Survey [20.199282252344396]
Large language models have two prominent characteristics compared to smaller models.
The most notable aspect of large models is the very high cost associated with model finetuning or training.
Large models emphasize versatility and generalization rather than performance on a single task.
arXiv Detail & Related papers (2024-02-15T06:58:30Z) - On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based
Multilingual Model [49.81429697921861]
We study the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models.
We show that prompt tuning is more effective in enhancing the performance of low-resource languages than fine-tuning.
arXiv Detail & Related papers (2023-11-14T00:43:33Z) - Retrieval-based Knowledge Transfer: An Effective Approach for Extreme
Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z) - Too Brittle To Touch: Comparing the Stability of Quantization and
Distillation Towards Developing Lightweight Low-Resource MT Models [12.670354498961492]
State-of-the-art machine translation models are often able to adapt to the paucity of data for low-resource languages.
Knowledge Distillation is one popular technique to develop competitive, lightweight models.
arXiv Detail & Related papers (2022-10-27T05:30:13Z) - Language Models are General-Purpose Interfaces [109.45478241369655]
We propose to use language models as a general-purpose interface to various foundation models.
A collection of pretrained encoders perceives diverse modalities (such as vision and language).
We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders.
arXiv Detail & Related papers (2022-06-13T17:34:22Z) - What Do Compressed Multilingual Machine Translation Models Forget? [102.50127671423752]
We show that the performance of under-represented languages drops significantly, while the average BLEU metric only slightly decreases.
We demonstrate that compression amplifies intrinsic gender and semantic biases, even in high-resource languages.
arXiv Detail & Related papers (2022-05-22T13:54:44Z) - What do Compressed Large Language Models Forget? Robustness Challenges
in Model Compression [68.82486784654817]
We study two popular model compression techniques including knowledge distillation and pruning.
We show that compressed models are significantly less robust than their PLM counterparts on adversarial test sets.
We develop a regularization strategy for model compression based on sample uncertainty.
arXiv Detail & Related papers (2021-10-16T00:20:04Z) - Comparison of Interactive Knowledge Base Spelling Correction Models for
Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work shows a comparison of a neural model and character language models with varying amounts of target language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.