What Do Compressed Multilingual Machine Translation Models Forget?
- URL: http://arxiv.org/abs/2205.10828v4
- Date: Tue, 27 Jun 2023 09:34:34 GMT
- Title: What Do Compressed Multilingual Machine Translation Models Forget?
- Authors: Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline
Brun, James Henderson, Laurent Besacier
- Abstract summary: We show that the performance of under-represented languages drops significantly, while the average BLEU metric only slightly decreases.
We demonstrate that compression amplifies intrinsic gender and semantic biases, even in high-resource languages.
- Score: 102.50127671423752
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, very large pre-trained models have achieved state-of-the-art results
in various natural language processing (NLP) tasks, but their size makes them
challenging to deploy in resource-constrained environments. Compression techniques
make it possible to drastically reduce the size of these models, and therefore their
inference time, with negligible impact on top-tier metrics. However, general
performance averaged across multiple tasks and/or languages may hide a drastic
performance drop on under-represented features, which could result in the
amplification of biases encoded by the models. In this work, we assess the impact of
compression methods on Multilingual Neural Machine Translation (MNMT) models for
various language groups and for gender and semantic biases, through extensive
analysis of compressed models on several machine translation benchmarks:
FLORES-101, MT-Gender, and DiBiMT. We show that the performance of
under-represented languages drops significantly, while the average BLEU metric
only slightly decreases. Interestingly, the removal of noisy memorization through
compression leads to a significant improvement for some medium-resource
languages. Finally, we demonstrate that compression amplifies intrinsic gender
and semantic biases, even in high-resource languages. Code:
https://github.com/alirezamshi/bias-compressedMT
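For illustration, here is a minimal sketch (not the paper's pipeline; see the repository above for that) of the kind of setup the abstract describes: post-training int8 dynamic quantization of a multilingual MT model, followed by per-language BLEU so that drops on under-represented languages are not hidden by the average. The M2M-100 checkpoint, the eval_sets variable, and the choice of dynamic quantization are assumptions of the sketch.

```python
# Sketch only: compress a multilingual MT model with int8 dynamic quantization and
# compare BLEU per language instead of a single average. Checkpoint choice, the
# eval_sets dict, and the compression method are illustrative assumptions.
import torch
import sacrebleu
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

name = "facebook/m2m100_418M"
model = M2M100ForConditionalGeneration.from_pretrained(name).eval()
tokenizer = M2M100Tokenizer.from_pretrained(name)

# Post-training dynamic quantization of all Linear layers (CPU inference).
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

def translate(mt_model, sentences, src_lang, tgt_lang="en"):
    tokenizer.src_lang = src_lang
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    generated = mt_model.generate(
        **batch, forced_bos_token_id=tokenizer.get_lang_id(tgt_lang)
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

def per_language_bleu(mt_model, eval_sets):
    # eval_sets: {lang_code: (source_sentences, reference_translations)} built from a
    # FLORES-style devtest split (hypothetical variable, supplied by the user).
    return {
        lang: sacrebleu.corpus_bleu(translate(mt_model, src, lang), [refs]).score
        for lang, (src, refs) in eval_sets.items()
    }

# Compare the dense and compressed models language by language, e.g.:
# dense_bleu = per_language_bleu(model, eval_sets)
# quant_bleu = per_language_bleu(quantized, eval_sets)
```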
Related papers
- What Happens When Small Is Made Smaller? Exploring the Impact of Compression on Small Data Pretrained Language Models [2.2871867623460216]
This paper investigates the effectiveness of pruning, knowledge distillation, and quantization on an exclusively low-resourced, small-data language model, AfriBERTa.
Through a battery of experiments, we assess the effects of compression on performance across several metrics beyond accuracy.
arXiv Detail & Related papers (2024-04-06T23:52:53Z)
- Multilingual Brain Surgeon: Large Language Models Can be Compressed Leaving No Language Behind [14.433894552549337]
Large Language Models (LLMs) have ushered in a new era in Natural Language Processing, but their massive size demands effective compression techniques for practicality.
This paper introduces Multilingual Brain Surgeon (MBS), a novel calibration data sampling method for multilingual LLMs compression.
MBS overcomes the English-centric limitations of existing methods by sampling calibration data from various languages proportionally to the language distribution of the model training datasets.
arXiv Detail & Related papers (2024-04-06T22:16:32Z)
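As a rough illustration of the language-proportional sampling described above (not the MBS implementation), here is a sketch that allocates a calibration budget across languages according to each language's share of the training data; the function name, the inputs, and the budget are hypothetical.

```python
# Minimal sketch of language-proportional calibration sampling in the spirit of MBS.
# corpora, train_token_counts, and the budget are hypothetical placeholders.
import random

def sample_calibration_set(corpora, train_token_counts, budget=512, seed=0):
    """corpora: {lang: [sentence, ...]}; train_token_counts: {lang: tokens in training data}."""
    rng = random.Random(seed)
    total = sum(train_token_counts.values())
    calibration = []
    for lang, sentences in corpora.items():
        # Allocate samples in proportion to the language's share of the training data,
        # instead of drawing calibration data from English only.
        quota = max(1, round(budget * train_token_counts[lang] / total))
        calibration.extend(rng.sample(sentences, min(quota, len(sentences))))
    rng.shuffle(calibration)
    return calibration
```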
- Intriguing Properties of Compression on Multilingual Models [17.06142742945346]
We propose a framework to characterize the impact of sparsifying multilingual pre-trained language models during fine-tuning.
Applying this framework to mBERT named entity recognition models across 40 languages, we find that compression confers several intriguing and previously unknown generalization properties.
arXiv Detail & Related papers (2022-11-04T20:28:01Z)
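To make the sparsification setting above concrete, here is a minimal sketch of magnitude pruning applied to an mBERT NER model around fine-tuning; the checkpoint, the 70% global sparsity target, and the use of PyTorch's pruning utilities are assumptions, not the framework proposed in that paper.

```python
# Sketch only: global L1 magnitude pruning of an mBERT-style encoder, with the
# pruning mask kept active during subsequent NER fine-tuning.
import torch
from torch.nn.utils import prune
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=9
)

# Collect every Linear weight in the encoder and prune the 70% smallest-magnitude
# entries globally across all of them.
to_prune = [
    (module, "weight")
    for module in model.bert.encoder.modules()
    if isinstance(module, torch.nn.Linear)
]
prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured, amount=0.7)

# ... run the usual NER fine-tuning loop here, then make the sparsity permanent:
for module, name in to_prune:
    prune.remove(module, name)
```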
- Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models [12.670354498961492]
State-of-the-art machine translation models are often able to adapt to the paucity of data for low-resource languages.
Knowledge Distillation is one popular technique to develop competitive, lightweight models.
arXiv Detail & Related papers (2022-10-27T05:30:13Z)
- Compression of Generative Pre-trained Language Models via Quantization [62.80110048377957]
We find that previous quantization methods fail on generative tasks due to the homogeneous word embeddings.
We propose a token-level contrastive distillation to learn distinguishable word embeddings, and a module-wise dynamic scaling to make quantizers adaptive to different modules.
arXiv Detail & Related papers (2022-03-21T02:11:35Z)
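As a rough sketch of a token-level contrastive distillation loss in this spirit (an InfoNCE-style objective; the function name, temperature, and batching are assumptions, not that paper's implementation), each student token embedding is pulled toward the teacher embedding of the same token and pushed away from the other teacher tokens in the batch.

```python
# Hypothetical token-level contrastive distillation loss: the positive pair is the
# student/teacher embedding of the same token; all other teacher tokens are negatives.
import torch
import torch.nn.functional as F

def token_contrastive_loss(student_tokens, teacher_tokens, temperature=0.1):
    # student_tokens, teacher_tokens: (num_tokens, dim), already masked to real tokens.
    s = F.normalize(student_tokens, dim=-1)
    t = F.normalize(teacher_tokens, dim=-1)
    logits = s @ t.T / temperature                      # similarity to every teacher token
    targets = torch.arange(s.size(0), device=s.device)  # positive = the aligned token
    return F.cross_entropy(logits, targets)
```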
- What do Compressed Large Language Models Forget? Robustness Challenges in Model Compression [68.82486784654817]
We study two popular model compression techniques: knowledge distillation and pruning.
We show that compressed models are significantly less robust than their PLM counterparts on adversarial test sets.
We develop a regularization strategy for model compression based on sample uncertainty.
arXiv Detail & Related papers (2021-10-16T00:20:04Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
arXiv Detail & Related papers (2021-09-09T03:48:35Z)
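A toy sketch of the min-max idea above: an adversary re-weights languages toward those with higher current loss (an exponentiated-gradient style best response), and the model is then trained on the re-weighted mixture. The function and helper names and the step size are illustrative assumptions, not that paper's algorithm.

```python
# Toy sketch of distributionally robust weighting over languages: raise the weight of
# languages where the current loss is high, then train on the re-weighted mixture.
import math

def best_response_weights(per_language_loss, weights, step_size=1.0):
    """Exponentiated-gradient update of the adversary's language distribution."""
    updated = {
        lang: weights[lang] * math.exp(step_size * per_language_loss[lang])
        for lang in weights
    }
    total = sum(updated.values())
    return {lang: w / total for lang, w in updated.items()}

# Training alternates between the two players, e.g.:
# weights = {lang: 1 / len(langs) for lang in langs}
# for _ in range(num_rounds):
#     per_language_loss = evaluate_losses(model, dev_sets)   # hypothetical helper
#     weights = best_response_weights(per_language_loss, weights)
#     train_one_epoch(model, mixture(train_sets, weights))   # hypothetical helper
```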
- Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across a diverse setting, including low-, medium-, and rich-resource languages, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.