Towards Robust Pruning: An Adaptive Knowledge-Retention Pruning Strategy
for Language Models
- URL: http://arxiv.org/abs/2310.13191v3
- Date: Thu, 11 Jan 2024 04:07:39 GMT
- Title: Towards Robust Pruning: An Adaptive Knowledge-Retention Pruning Strategy
for Language Models
- Authors: Jianwei Li, Qi Lei, Wei Cheng, Dongkuan Xu
- Abstract summary: We introduce a post-training pruning strategy designed to faithfully replicate the embedding space and feature space of dense language models.
Compared to other state-of-the-art baselines, our approach demonstrates a superior balance between accuracy, sparsity, robustness, and pruning cost with BERT on the SST2, IMDB, and AGNews datasets.
- Score: 35.58379464827462
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The pruning objective has recently extended beyond accuracy and sparsity to
robustness in language models. Despite this, existing methods struggle to
enhance robustness against adversarial attacks when continually increasing
model sparsity, and they require a retraining process. In the era of large
language models, these issues have become increasingly prominent. This paper
proposes that the robustness of language models is proportional to the extent
of pre-trained knowledge they encompass. Accordingly, we introduce a
post-training pruning strategy designed to faithfully replicate the embedding
space and feature space of dense language models, aiming to conserve more
pre-trained knowledge during the pruning process. In this setup, each layer's
reconstruction error not only originates from itself but also includes
cumulative error from preceding layers, followed by an adaptive rectification.
Compared to other state-of-the-art baselines, our approach demonstrates a
superior balance between accuracy, sparsity, robustness, and pruning cost with
BERT on the SST2, IMDB, and AGNews datasets, marking a significant stride towards robust
pruning in language models.
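To make the abstract's layer-by-layer reconstruction idea concrete, the following Python (PyTorch) sketch illustrates the general scheme: each linear layer is pruned, its inputs are propagated through the already-pruned preceding layers (so the reconstruction loss absorbs their cumulative error), and the surviving weights are then rectified to match the dense layer's outputs. This is a minimal illustration under assumed choices (magnitude pruning, an Adam-based rectification loop, and hypothetical names such as prune_and_rectify); it is not the authors' exact algorithm.

import copy
import torch
import torch.nn as nn


def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Keep the largest-magnitude (1 - sparsity) fraction of weights.
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()


@torch.no_grad()
def dense_targets(layers, x):
    # Record each dense layer's output on the dense forward pass
    # (the "feature space" the pruned model should replicate).
    targets = []
    for layer in layers:
        x = layer(x)
        targets.append(x)
    return targets


def prune_and_rectify(layers, calib_x, sparsity=0.5, n_steps=100, lr=1e-3):
    # Prune layer by layer. Inputs to layer i are propagated through the
    # already-pruned layers < i, so layer i's reconstruction loss includes
    # the cumulative error of its predecessors; the surviving weights are
    # then adaptively rectified against the dense layer's outputs.
    targets = dense_targets(layers, calib_x)
    pruned = copy.deepcopy(layers)
    x = calib_x
    for i, layer in enumerate(pruned):
        mask = magnitude_mask(layer.weight.data, sparsity)
        layer.weight.data *= mask
        opt = torch.optim.Adam(layer.parameters(), lr=lr)
        for _ in range(n_steps):
            opt.zero_grad()
            loss = nn.functional.mse_loss(layer(x), targets[i])
            loss.backward()
            layer.weight.grad *= mask   # keep pruned weights at exactly zero
            opt.step()
        with torch.no_grad():
            x = layer(x)                # carry the cumulative error forward
    return pruned


# Toy usage: a stack of linear layers stands in for BERT sub-layers,
# and a small random batch stands in for calibration data.
layers = nn.ModuleList([nn.Linear(64, 64) for _ in range(4)])
calib_x = torch.randn(32, 64)
pruned = prune_and_rectify(layers, calib_x, sparsity=0.5)

Because the calibration inputs flow through the pruned stack rather than the dense one, each layer's rectification also compensates for errors introduced by earlier layers, which is the cumulative-error correction the abstract describes.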
Related papers
- Making Pre-trained Language Models Better Continual Few-Shot Relation
Extractors [15.417833307088637]
Continual Few-shot Relation Extraction (CFRE) is a practical problem that requires the model to continuously learn novel relations.
The primary challenges are catastrophic forgetting and overfitting.
This paper harnesses prompt learning to explore the implicit capabilities of pre-trained language models.
arXiv Detail & Related papers (2024-02-24T04:32:44Z)
- UNDIAL: Self-Distillation with Adjusted Logits for Robust Unlearning in Large Language Models [12.45822383965784]
We introduce UnDIAL (Unlearning via Self-Distillation on Adjusted Logits), a novel and robust unlearning method.
Our approach leverages self-distillation to adjust logits and selectively reduce the influence of targeted tokens.
arXiv Detail & Related papers (2024-02-15T16:21:14Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- PSSAT: A Perturbed Semantic Structure Awareness Transferring Method for Perturbation-Robust Slot Filling [27.602336774468]
Most existing slot filling models tend to memorize inherent patterns of entities and corresponding contexts from training data.
We propose a semantic awareness structure transferring method for training perturbation-robust slot filling models.
arXiv Detail & Related papers (2022-08-24T13:01:00Z)
- Adversarial Self-Attention for Language Understanding [89.265747130584]
This paper proposes an Adversarial Self-Attention mechanism (ASA).
ASA adversarially reconstructs the Transformer attentions and facilitates model training from contaminated model structures.
For fine-tuning, ASA-empowered models consistently outperform naive models by a large margin in both generalization and robustness.
arXiv Detail & Related papers (2022-06-25T09:18:10Z)
- How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness? [121.57551065856164]
We propose Robust Informative Fine-Tuning (RIFT) as a novel adversarial fine-tuning method from an information-theoretical perspective.
RIFT encourages an objective model to retain the features learned from the pre-trained model throughout the entire fine-tuning process.
Experimental results show that RIFT consistently outperforms the state-of-the-arts on two popular NLP tasks.
arXiv Detail & Related papers (2021-12-22T05:04:41Z)
- NoiER: An Approach for Training more Reliable Fine-Tuned Downstream Task Models [54.184609286094044]
We propose noise entropy regularisation (NoiER) as an efficient learning paradigm that solves the problem without auxiliary models and additional data.
The proposed approach improved traditional OOD detection evaluation metrics by 55% on average compared to the original fine-tuned models.
arXiv Detail & Related papers (2021-08-29T06:58:28Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Cold-start Active Learning through Self-supervised Language Modeling [15.551710499866239]
Active learning aims to reduce annotation costs by choosing the most critical examples to label.
With BERT, we develop a simple strategy based on the masked language modeling loss.
Compared to other baselines, our approach reaches higher accuracy with fewer sampling iterations and less time.
arXiv Detail & Related papers (2020-10-19T14:09:17Z)
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)