Aligned Weight Regularizers for Pruning Pretrained Neural Networks
- URL: http://arxiv.org/abs/2204.01385v2
- Date: Tue, 5 Apr 2022 10:13:39 GMT
- Title: Aligned Weight Regularizers for Pruning Pretrained Neural Networks
- Authors: James O'Neill and Sourav Dutta and Haytham Assem
- Abstract summary: We show that there is a clear performance discrepancy in magnitude-based pruning when comparing standard supervised learning to the zero-shot setting.
We propose two weight regularizers that aim to maximize the alignment between units of pruned and unpruned networks.
- Score: 6.000551438232907
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While various avenues of research have been explored for iterative pruning,
little is known about what effect pruning has on zero-shot test performance and its
potential implications for the choice of pruning criteria. This pruning setup is
particularly important for cross-lingual models that implicitly learn alignment
between language representations during pretraining; if this alignment is distorted by
pruning, performance suffers not only on the language data used for retraining
but also on the zero-shot languages that are evaluated.
In this work, we show that there is a clear performance discrepancy in
magnitude-based pruning when comparing standard supervised learning to the
zero-shot setting. From this finding, we propose two weight regularizers that
aim to maximize the alignment between units of pruned and unpruned networks to
mitigate alignment distortion in pruned cross-lingual models and perform well
in both non-zero-shot and zero-shot settings.
We provide experimental results on cross-lingual tasks for the zero-shot
setting using XLM-RoBERTa$_{\mathrm{Base}}$, where we also find that pruning
has varying degrees of representational degradation depending on the language
corresponding to the zero-shot test set. This is also the first study that
focuses on cross-lingual language model compression.
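As a rough illustration of the setup the abstract describes, the sketch below pairs magnitude-based pruning with an alignment penalty between the hidden units of the pruned model and a frozen unpruned copy. The paper's two proposed regularizers are not spelled out in this abstract, so the cosine-based penalty, the layer choice, and the weighting factor used here are illustrative assumptions, not the authors' method.

```python
# Illustrative sketch only: magnitude pruning plus a cosine "alignment" penalty
# between pruned and unpruned hidden units. The regularizers in the paper are
# not given in this abstract; this is an assumed stand-in, not the authors' method.
import copy

import torch
import torch.nn.functional as F


def magnitude_prune_(layer: torch.nn.Linear, sparsity: float) -> torch.Tensor:
    """Zero the smallest-magnitude weights in-place and return the binary mask."""
    scores = layer.weight.detach().abs()
    k = int(sparsity * scores.numel())
    if k == 0:
        return torch.ones_like(scores)
    threshold = scores.flatten().kthvalue(k).values
    mask = (scores > threshold).float()
    layer.weight.data.mul_(mask)
    return mask


def alignment_penalty(pruned_h: torch.Tensor, unpruned_h: torch.Tensor) -> torch.Tensor:
    """Encourage pruned hidden units to stay aligned with the unpruned network's."""
    return (1.0 - F.cosine_similarity(pruned_h, unpruned_h, dim=-1)).mean()


# Toy model standing in for an encoder layer plus classifier head.
model = torch.nn.Sequential(torch.nn.Linear(768, 768), torch.nn.ReLU(), torch.nn.Linear(768, 2))
reference = copy.deepcopy(model).eval()  # frozen unpruned copy, kept for alignment
for p in reference.parameters():
    p.requires_grad_(False)

masks = [magnitude_prune_(m, sparsity=0.5)
         for m in model if isinstance(m, torch.nn.Linear)]

x = torch.randn(8, 768)                          # stand-in for encoder features
labels = torch.randint(0, 2, (8,))
task_loss = F.cross_entropy(model(x), labels)
align = alignment_penalty(model[0](x), reference[0](x))
loss = task_loss + 0.1 * align                   # 0.1 is an arbitrary weighting
loss.backward()
# In iterative pruning, the masks would be reapplied after each optimizer step
# so that pruned weights stay at zero during retraining.
```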
Related papers
- Language-Independent Representations Improve Zero-Shot Summarization [18.46817967804773]
Finetuning pretrained models on downstream generation tasks often leads to catastrophic forgetting in zero-shot conditions.
In this work, we focus on summarization and tackle the problem through the lens of language-independent representations.
We first show naively finetuned models are highly language-specific in both output behavior and internal representations, resulting in poor zero-shot performance.
arXiv Detail & Related papers (2024-04-08T17:56:43Z)
- Anti-LM Decoding for Zero-shot In-context Machine Translation [59.26037416204157]
This work introduces an Anti-Language Model objective with a decay factor designed to address the weaknesses of In-context Machine Translation.
We conduct experiments across 3 model types and sizes, 3 language directions, and for both greedy decoding and beam search.
arXiv Detail & Related papers (2023-11-14T17:09:43Z)
- Narrowing the Gap between Zero- and Few-shot Machine Translation by Matching Styles [53.92189950211852]
Large language models have demonstrated their ability to generalize to machine translation using zero- and few-shot examples with in-context learning.
In this paper, we investigate the factors contributing to this gap and find that it can largely be closed (by about 70%) by matching the writing styles of the target corpus.
arXiv Detail & Related papers (2023-11-04T03:18:45Z)
- A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks into the sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) on average performance by a large margin in both few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z)
- On Cross-Lingual Retrieval with Multilingual Text Encoders [51.60862829942932]
We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments.
We evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., we learn to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
arXiv Detail & Related papers (2021-12-21T08:10:27Z)
- Self-Normalized Importance Sampling for Neural Language Modeling [97.96857871187052]
In this work, we propose self-normalized importance sampling. Compared to our previous work, the criteria considered here are self-normalized, so no additional correction step is needed.
We show that our proposed self-normalized importance sampling is competitive in both research-oriented and production-oriented automatic speech recognition tasks.
arXiv Detail & Related papers (2021-11-11T16:57:53Z)
- On the Relation between Syntactic Divergence and Zero-Shot Performance [22.195133438732633]
We take the transfer of Universal Dependencies (UD) parsing from English to a diverse set of languages and conduct two sets of experiments.
We analyze zero-shot performance based on the extent to which English source edges are preserved in translation.
In both sets of experiments, our results suggest a strong relation between cross-lingual stability and zero-shot parsing performance.
arXiv Detail & Related papers (2021-10-09T21:09:21Z)
- Rethinking Zero-shot Neural Machine Translation: From a Perspective of Latent Variables [28.101782382170306]
We introduce a denoising autoencoder objective based on a pivot language into the traditional training objective to improve translation accuracy on zero-shot directions.
We demonstrate that the proposed method effectively eliminates the spurious correlations and significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-09-10T07:18:53Z)
- AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages [75.08199398141744]
We present AmericasNLI, an extension of XNLI (Conneau et al.) to 10 indigenous languages of the Americas.
We conduct experiments with XLM-R, testing multiple zero-shot and translation-based approaches.
We find that XLM-R's zero-shot performance is poor for all 10 languages, with an average performance of 38.62%.
arXiv Detail & Related papers (2021-04-18T05:32:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.