Super Tickets in Pre-Trained Language Models: From Model Compression to
Improving Generalization
- URL: http://arxiv.org/abs/2105.12002v1
- Date: Tue, 25 May 2021 15:10:05 GMT
- Title: Super Tickets in Pre-Trained Language Models: From Model Compression to
Improving Generalization
- Authors: Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu,
Pengcheng He, Tuo Zhao and Weizhu Chen
- Abstract summary: We study such a collection of tickets, which is referred to as "winning tickets", in extremely over-parametrized models.
We observe that at certain compression ratios, generalization performance of the winning tickets can not only match, but also exceed that of the full model.
- Score: 65.23099004725461
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Lottery Ticket Hypothesis suggests that an over-parametrized network
consists of "lottery tickets", and training a certain collection of them (i.e.,
a subnetwork) can match the performance of the full model. In this paper, we
study such a collection of tickets, which is referred to as "winning tickets",
in extremely over-parametrized models, e.g., pre-trained language models. We
observe that at certain compression ratios, generalization performance of the
winning tickets can not only match, but also exceed that of the full model. In
particular, we observe a phase transition phenomenon: as the compression ratio
increases, generalization performance of the winning tickets first improves and
then deteriorates beyond a certain threshold. We refer to the tickets at the
threshold as "super tickets". We further show that the phase transition is task-
and model-dependent -- as the model gets larger and the training dataset gets
smaller, the transition becomes more pronounced. Our experiments on the
GLUE benchmark show that the super tickets improve single task fine-tuning by
$0.9$ points on BERT-base and $1.0$ points on BERT-large, in terms of
task-average score. We also demonstrate that adaptively sharing the super
tickets across tasks benefits multi-task learning.
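To make the compression-ratio sweep concrete, below is a minimal sketch (in PyTorch; not the authors' released code) of one-shot magnitude pruning at a range of ratios, scoring each pruned subnetwork on a validation set. The helpers magnitude_prune and sweep, and the choice of unstructured magnitude pruning over nn.Linear weights, are illustrative assumptions; the paper's own pruning criterion may differ.

import torch

def magnitude_prune(model: torch.nn.Module, ratio: float) -> None:
    # Zero out the smallest-magnitude fraction `ratio` of each Linear
    # weight matrix (illustrative unstructured pruning).
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, torch.nn.Linear):
                w = module.weight
                k = int(w.numel() * ratio)
                if k == 0:
                    continue
                cutoff = w.abs().flatten().kthvalue(k).values
                w.mul_((w.abs() > cutoff).float())

def sweep(make_model, ratios, evaluate):
    # Prune a fresh fine-tuned copy at each compression ratio and record
    # its validation score; under the paper's phase-transition observation,
    # the best-scoring ratio marks the super ticket.
    results = []
    for r in ratios:
        model = make_model()
        magnitude_prune(model, r)
        results.append((r, evaluate(model)))
    return results

Plotting the resulting (ratio, score) pairs would surface the phase transition described above: scores rise under moderate pruning and fall past the threshold.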
Related papers
- COLT: Cyclic Overlapping Lottery Tickets for Faster Pruning of
Convolutional Neural Networks [5.956029437413275]
This research aims to generate winning lottery tickets from a set of lottery tickets that can achieve accuracy similar to that of the original unpruned network.
We introduce a novel winning ticket called Cyclic Overlapping Lottery Ticket (COLT) by data splitting and cyclic retraining of the pruned network from scratch.
arXiv Detail & Related papers (2022-12-24T16:38:59Z)
- Robust Lottery Tickets for Pre-trained Language Models [57.14316619360376]
We propose a novel method based on learning binary weight masks to identify robust tickets hidden in the original language models.
Experimental results show the significant improvement of the proposed method over previous work on adversarial robustness evaluation.
arXiv Detail & Related papers (2022-11-06T02:59:27Z)
- Data-Efficient Double-Win Lottery Tickets from Robust Pre-training [129.85939347733387]
We introduce Double-Win Lottery Tickets, in which a subnetwork from a pre-trained model can be independently transferred to diverse downstream tasks.
We find that robust pre-training tends to craft sparser double-win lottery tickets with superior performance over the standard counterparts.
arXiv Detail & Related papers (2022-06-09T20:52:50Z)
- Playing Lottery Tickets with Vision and Language [62.6420670250559]
Large-scale transformer-based pre-training has revolutionized vision-and-language (V+L) research.
In parallel, work on the lottery ticket hypothesis has shown that deep neural networks contain small matching subnetworks that can achieve performance on par with, or even better than, the dense networks when trained in isolation.
We use UNITER, one of the best-performing V+L models, as the testbed, and consolidate 7 representative V+L tasks for experiments.
arXiv Detail & Related papers (2021-04-23T22:24:33Z)
- The Elastic Lottery Ticket Hypothesis [106.79387235014379]
The Lottery Ticket Hypothesis has drawn keen attention to identifying sparse, trainable subnetworks, or winning tickets.
The most effective method for identifying such winning tickets remains Iterative Magnitude Pruning (IMP; a sketch appears after this list).
We propose a variety of strategies to tweak the winning tickets found from different networks of the same model family.
arXiv Detail & Related papers (2021-03-30T17:53:45Z)
- Winning Lottery Tickets in Deep Generative Models [64.79920299421255]
We show the existence of winning tickets in deep generative models such as GANs and VAEs.
We also demonstrate the transferability of winning tickets across different generative models.
arXiv Detail & Related papers (2020-10-05T21:45:39Z)
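Several of the papers above find winning tickets with Iterative Magnitude Pruning (IMP), noted in the Elastic Lottery Ticket entry. Below is a minimal, hedged sketch of the generic IMP loop with weight rewinding; the function names and the masked-training hook train_fn are illustrative assumptions, not any specific paper's implementation.

import copy
import torch

def imp(model, train_fn, prune_step=0.2, rounds=5):
    # Save initial weights so survivors can be "rewound" after each
    # pruning round, as in the original lottery ticket procedure.
    init_state = copy.deepcopy(model.state_dict())
    # Prune weight matrices only (dim > 1); keep biases and norms dense.
    masks = {name: torch.ones_like(p)
             for name, p in model.named_parameters() if p.dim() > 1}
    for _ in range(rounds):
        train_fn(model, masks)  # caller keeps masked weights at zero
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name not in masks:
                    continue
                alive = p.abs()[masks[name].bool()]  # unpruned weights
                k = max(1, int(alive.numel() * prune_step))
                cutoff = alive.kthvalue(k).values    # this round's threshold
                masks[name] *= (p.abs() > cutoff).float()
            model.load_state_dict(init_state)        # rewind all weights
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])              # re-zero pruned weights
    return masks  # the final mask defines the winning ticket

The returned masks, applied to the rewound weights, define the subnetwork; retraining it in isolation is what the lottery ticket literature calls training a winning ticket.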
This list is automatically generated from the titles and abstracts of the papers on this site.