The Low-Resource Double Bind: An Empirical Study of Pruning for
Low-Resource Machine Translation
- URL: http://arxiv.org/abs/2110.03036v1
- Date: Wed, 6 Oct 2021 19:48:18 GMT
- Title: The Low-Resource Double Bind: An Empirical Study of Pruning for
Low-Resource Machine Translation
- Authors: Orevaoghene Ahia, Julia Kreutzer, Sara Hooker
- Abstract summary: The "bigger is better" explosion in the number of parameters in deep neural networks has made it increasingly challenging to make state-of-the-art networks accessible in compute-restricted environments.
The "low-resource double bind" refers to the co-occurrence of data limitations and compute resource constraints.
Our work offers surprising insights into the relationship between capacity and generalization in data-limited regimes for the task of machine translation.
- Score: 8.2987165990395
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A "bigger is better" explosion in the number of parameters in deep neural
networks has made it increasingly challenging to make state-of-the-art networks
accessible in compute-restricted environments. Compression techniques have
taken on renewed importance as a way to bridge the gap. However, evaluation of
the trade-offs incurred by popular compression techniques has been centered on
high-resource datasets. In this work, we instead consider the impact of
compression in a data-limited regime. We introduce the term low-resource double
bind to refer to the co-occurrence of data limitations and compute resource
constraints. This is a common setting for NLP for low-resource languages, yet
the trade-offs in performance are poorly studied. Our work offers surprising
insights into the relationship between capacity and generalization in
data-limited regimes for the task of machine translation. Our experiments on
magnitude pruning for translations from English into Yoruba, Hausa, Igbo and
German show that in low-resource regimes, sparsity preserves performance on
frequent sentences but has a disparate impact on infrequent ones. However, it
improves robustness to out-of-distribution shifts, especially for datasets that
are very distinct from the training distribution. Our findings suggest that
sparsity can play a beneficial role at curbing memorization of low frequency
attributes, and therefore offers a promising solution to the low-resource
double bind.
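To make the compression method concrete, here is a minimal sketch of unstructured magnitude pruning in PyTorch, assuming a generic model with Linear layers; the toy model, global thresholding, and 90% sparsity level are illustrative assumptions, not the paper's exact schedule.

```python
# Minimal sketch of unstructured magnitude pruning (illustrative only).
import torch
import torch.nn as nn


def magnitude_prune_(model: nn.Module, sparsity: float) -> None:
    """Zero out the `sparsity` fraction of Linear weights with the smallest
    absolute value, using a single global threshold (in place)."""
    weights = [m.weight for m in model.modules() if isinstance(m, nn.Linear)]
    magnitudes = torch.cat([w.detach().abs().flatten() for w in weights])
    k = int(sparsity * magnitudes.numel())
    if k == 0:
        return
    threshold = torch.kthvalue(magnitudes, k).values
    with torch.no_grad():
        for w in weights:
            w.mul_((w.abs() > threshold).float())


# Toy stand-in for an NMT model (an assumption; not the paper's Transformer).
model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
magnitude_prune_(model, sparsity=0.9)

zeros = sum((m.weight == 0).sum().item() for m in model if isinstance(m, nn.Linear))
total = sum(m.weight.numel() for m in model if isinstance(m, nn.Linear))
print(f"overall sparsity: {zeros / total:.2f}")  # ~0.90
```

In the paper's setting, the pruned NMT model would then be evaluated on frequent versus infrequent sentences and on out-of-distribution test sets; the sketch covers only the pruning step itself.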
Related papers
- Differential error feedback for communication-efficient decentralized learning [48.924131251745266]
We propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback.
We show that the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate.
The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.
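As a rough illustration of the error-feedback pattern mentioned above (the specific quantizer, step size, and decentralized combination rules are assumptions, not the paper's design):

```python
import numpy as np


def scaled_sign(v: np.ndarray) -> np.ndarray:
    """Toy 1-bit quantizer: transmit signs plus one scale per vector."""
    return np.abs(v).mean() * np.sign(v)


def error_feedback_step(grad: np.ndarray, residual: np.ndarray, mu: float = 0.1):
    """One communication round: quantize (step * gradient + carried error)."""
    to_send = mu * grad + residual
    compressed = scaled_sign(to_send)      # what actually gets transmitted
    new_residual = to_send - compressed    # quantization error kept locally
    return compressed, new_residual


rng = np.random.default_rng(0)
residual = np.zeros(8)
for _ in range(5):
    grad = rng.normal(size=8)
    update, residual = error_feedback_step(grad, residual)
```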
arXiv Detail & Related papers (2024-06-26T15:11:26Z)
- What Happens When Small Is Made Smaller? Exploring the Impact of Compression on Small Data Pretrained Language Models [2.2871867623460216]
This paper investigates the effectiveness of pruning, knowledge distillation, and quantization on an exclusively low-resourced, small-data language model, AfriBERTa.
Through a battery of experiments, we assess the effects of compression on performance across several metrics beyond accuracy.
arXiv Detail & Related papers (2024-04-06T23:52:53Z)
- Joint Dropout: Improving Generalizability in Low-Resource Neural Machine Translation through Phrase Pair Variables [17.300004156754966]
We propose a method called Joint Dropout, which addresses the challenge of low-resource neural machine translation by substituting phrases with variables.
We observe a substantial improvement in translation quality for language pairs with minimal resources, as seen in BLEU and Direct Assessment scores.
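A toy, string-level illustration of the phrase-pair substitution described above; the aligned phrase pair and variable token are hard-coded assumptions, and the real method's phrase extraction and dropout rate are not shown.

```python
# Toy illustration of substituting an aligned phrase pair with a shared
# variable token on both the source and target side.
src = "the small dog sleeps"
tgt = "der kleine Hund schläft"
aligned_phrase = ("small dog", "kleine Hund")  # assumed phrase alignment

variable = "<X_1>"
src_jd = src.replace(aligned_phrase[0], variable)
tgt_jd = tgt.replace(aligned_phrase[1], variable)
print(src_jd)  # the <X_1> sleeps
print(tgt_jd)  # der <X_1> schläft
```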
arXiv Detail & Related papers (2023-07-24T14:33:49Z)
- Compressed Regression over Adaptive Networks [58.79251288443156]
We derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem.
We devise an optimized allocation strategy where the parameters necessary for the optimization can be learned online by the agents.
arXiv Detail & Related papers (2023-04-07T13:41:08Z)
- Semi-supervised Neural Machine Translation with Consistency Regularization for Low-Resource Languages [3.475371300689165]
This paper presents a simple yet effective method to tackle the data-scarcity problem for low-resource languages by augmenting high-quality sentence pairs and training NMT models in a semi-supervised manner.
Specifically, our approach combines a cross-entropy loss for supervised learning with a KL-divergence loss applied in an unsupervised fashion to pseudo and augmented target sentences.
Experimental results show that our approach significantly improves NMT baselines by 0.46--2.03 BLEU points, especially on low-resource datasets.
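A hedged sketch of such a combined objective in PyTorch; how the supervised logits, the pseudo/augmented predictions, and the weighting are produced here are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F


def combined_loss(sup_logits, gold_ids, logits_orig, logits_aug, alpha=1.0):
    """Supervised cross-entropy plus a KL consistency term (illustrative)."""
    ce = F.cross_entropy(sup_logits.view(-1, sup_logits.size(-1)), gold_ids.view(-1))
    # Consistency: predictions for the augmented input should match the
    # (detached) predictions for the original input.
    log_p_orig = F.log_softmax(logits_orig, dim=-1).detach()
    log_p_aug = F.log_softmax(logits_aug, dim=-1)
    kl = F.kl_div(log_p_aug, log_p_orig, log_target=True, reduction="batchmean")
    return ce + alpha * kl


# Toy shapes: batch=2, length=5, vocab=100 (random tensors just to show the call).
B, T, V = 2, 5, 100
loss = combined_loss(torch.randn(B, T, V), torch.randint(0, V, (B, T)),
                     torch.randn(B, T, V), torch.randn(B, T, V))
```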
arXiv Detail & Related papers (2023-04-02T15:24:08Z)
- Towards Realistic Low-resource Relation Extraction: A Benchmark with Empirical Baseline Study [51.33182775762785]
This paper presents an empirical study to build relation extraction systems in low-resource settings.
We investigate three schemes to evaluate the performance in low-resource settings: (i) different types of prompt-based methods with few-shot labeled data; (ii) diverse balancing methods to address the long-tailed distribution issue; and (iii) data augmentation technologies and self-training to generate more labeled in-domain data.
arXiv Detail & Related papers (2022-10-19T15:46:37Z)
- What Do Compressed Multilingual Machine Translation Models Forget? [102.50127671423752]
We show that the performance of under-represented languages drops significantly, while the average BLEU metric only slightly decreases.
We demonstrate that compression amplifies intrinsic gender and semantic biases, even in high-resource languages.
arXiv Detail & Related papers (2022-05-22T13:54:44Z)
- Variational Information Bottleneck for Effective Low-Resource Fine-Tuning [40.66716433803935]
We propose to use Variational Information Bottleneck (VIB) to suppress irrelevant features when fine-tuning on low-resource target tasks.
We show that our VIB model finds sentence representations that are more robust to biases in natural language inference datasets.
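A minimal sketch of a variational information bottleneck head for fine-tuning, assuming a fixed-size sentence embedding as input; the dimensions, Gaussian prior, and beta weight are illustrative assumptions.

```python
import torch
import torch.nn as nn


class VIBHead(nn.Module):
    """Sentence embedding -> stochastic bottleneck -> classifier (sketch)."""

    def __init__(self, in_dim=768, z_dim=128, num_classes=3, beta=1e-3):
        super().__init__()
        self.to_mu = nn.Linear(in_dim, z_dim)
        self.to_logvar = nn.Linear(in_dim, z_dim)
        self.classifier = nn.Linear(z_dim, num_classes)
        self.beta = beta

    def forward(self, h):
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        logits = self.classifier(z)
        # KL( N(mu, sigma^2) || N(0, I) ), summed over z and averaged over the batch.
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1).mean()
        return logits, self.beta * kl


head = VIBHead()
logits, kl_term = head(torch.randn(4, 768))  # e.g. [CLS] embeddings (an assumption)
# Training would minimize the task loss plus kl_term.
```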
arXiv Detail & Related papers (2021-06-10T03:08:13Z)
- Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation [98.11249019844281]
Knowledge distillation (KD) is commonly used to construct synthetic data for training non-autoregressive translation (NAT) models.
We propose reverse KD to rejuvenate more alignments for low-frequency target words.
Results demonstrate that the proposed approach can significantly and universally improve translation quality.
arXiv Detail & Related papers (2021-06-02T02:41:40Z)
- DeepReduce: A Sparse-tensor Communication Framework for Distributed Deep Learning [79.89085533866071]
This paper introduces DeepReduce, a versatile framework for the compressed communication of sparse tensors.
DeepReduce decomposes tensors in two sets, values and indices, and allows both independent and combined compression of these sets.
Our experiments with large real models demonstrate that DeepReduce transmits fewer data and imposes lower computational overhead than existing methods.
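The values/indices decomposition described above can be pictured as below; this sketch only shows the split and the receiver-side reconstruction, not DeepReduce's actual compressors or communication layer.

```python
import numpy as np

# Toy sparse gradient, e.g. after top-k sparsification.
grad = np.zeros(16, dtype=np.float32)
grad[[1, 5, 11]] = [0.3, -1.2, 0.7]

# Decompose into the two sets that would be compressed/transmitted independently.
indices = np.flatnonzero(grad).astype(np.uint16)  # positions of non-zeros
values = grad[indices]                            # corresponding values

# Receiver side: rebuild the dense tensor from the two sets.
restored = np.zeros_like(grad)
restored[indices] = values
assert np.allclose(restored, grad)
```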
arXiv Detail & Related papers (2021-02-05T11:31:24Z)
- Establishing Baselines for Text Classification in Low-Resource Languages [0.0]
First, we introduce two previously unreleased datasets as benchmark datasets for text classification.
Second, we pretrain better BERT and DistilBERT models for use within the Filipino setting.
Third, we introduce a simple degradation test that benchmarks a model's resistance to performance degradation as the number of training samples is reduced.
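A hedged sketch of what such a degradation test could look like; the training/evaluation pipeline is abstracted behind a placeholder `eval_fn`, and the data fractions are assumptions rather than the paper's protocol.

```python
import random


def degradation_test(train_data, eval_fn, fractions=(1.0, 0.5, 0.25, 0.1), seed=0):
    """Evaluate on shrinking fractions of the training data and report the
    score retained relative to the full-data run (illustrative)."""
    rng = random.Random(seed)
    scores = {}
    for frac in fractions:
        subset = rng.sample(train_data, k=max(1, int(frac * len(train_data))))
        scores[frac] = eval_fn(subset)  # placeholder: train on `subset`, return a metric
    full = scores[1.0]
    return {frac: score / full for frac, score in scores.items()}


# Dummy usage with a fake metric that degrades as data shrinks (illustration only).
data = list(range(1000))
print(degradation_test(data, eval_fn=lambda subset: 0.5 + 0.5 * len(subset) / len(data)))
# {1.0: 1.0, 0.5: 0.75, 0.25: 0.625, 0.1: 0.55}
```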
arXiv Detail & Related papers (2020-05-05T11:17:07Z)