Breaking through Deterministic Barriers: Randomized Pruning Mask
Generation and Selection
- URL: http://arxiv.org/abs/2310.13183v2
- Date: Thu, 11 Jan 2024 04:17:28 GMT
- Title: Breaking through Deterministic Barriers: Randomized Pruning Mask
Generation and Selection
- Authors: Jianwei Li, Weizhi Gao, Qi Lei, Dongkuan Xu
- Abstract summary: We propose a pruning strategy that generates several pruning masks in a designed random way and selects the optimal one via an effective mask-selection rule.
This approach achieves state-of-the-art performance across eight datasets from GLUE.
- Score: 29.375460634415806
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: It is widely acknowledged that large and sparse models have higher accuracy
than small and dense models under the same model size constraints. This
motivates us to train a large model and then remove its redundant neurons or
weights by pruning. Most existing works prune networks deterministically, so
the outcome depends solely on a single pruning criterion and thus lacks
variety. In this paper, we instead propose a model pruning strategy that
first generates several pruning masks in a designed random way. An effective
mask-selection rule then chooses the optimal mask from this pool of
candidates. To further enhance efficiency, we introduce an early mask
evaluation strategy that mitigates the overhead of training multiple masks.
Our extensive experiments demonstrate that this
approach achieves state-of-the-art performance across eight datasets from GLUE,
particularly excelling at high levels of sparsity.
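As a concrete illustration, below is a minimal NumPy sketch of the two ingredients the abstract describes: randomized mask generation and mask selection. The Gumbel-perturbed magnitude scores, the candidate count, and the cheap magnitude-based proxy loss are illustrative assumptions, not the paper's exact generation and selection rules.

```python
import numpy as np

def random_masks(weights, sparsity, n_masks, rng):
    """Sample pruning masks: a weight's chance of surviving grows with
    its magnitude, so each mask is a noisy variant of magnitude pruning."""
    scores = np.abs(weights).ravel()
    n_keep = int(scores.size * (1.0 - sparsity))
    masks = []
    for _ in range(n_masks):
        # Gumbel-perturbed log-scores: taking the top-k afterwards samples
        # k survivors without replacement, biased toward large magnitudes.
        noisy = np.log(scores + 1e-12) + rng.gumbel(size=scores.shape)
        keep = np.argsort(noisy)[-n_keep:]
        mask = np.zeros(scores.size, dtype=bool)
        mask[keep] = True
        masks.append(mask.reshape(weights.shape))
    return masks

def select_mask(masks, eval_loss):
    """Stand-in for the paper's mask-selection rule: keep the candidate
    with the lowest (cheaply evaluated) loss."""
    losses = [eval_loss(m) for m in masks]
    return masks[int(np.argmin(losses))]

# Usage: prune a random weight matrix to 80% sparsity.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
candidates = random_masks(W, sparsity=0.8, n_masks=8, rng=rng)
# Toy proxy loss: total magnitude removed (real selection would use data).
best = select_mask(candidates, lambda m: np.abs(W)[~m].sum())
print(best.mean())  # fraction of weights kept, here 0.2
```

In the paper's setting, the proxy loss would be replaced by an early evaluation of each mask on actual training signal, which is what the early-mask-evaluation strategy makes affordable.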
Related papers
- Bridge the Points: Graph-based Few-shot Segment Anything Semantically [79.1519244940518]
Recent advancements in pre-training techniques have enhanced the capabilities of vision foundation models.
Recent studies extend SAM to Few-shot Semantic Segmentation (FSS).
We propose a simple yet effective approach based on graph analysis.
arXiv Detail & Related papers (2024-10-09T15:02:28Z)
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
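A rough sketch of the noise-filtering idea behind ColorMAE, assuming a simple box blur as the low-pass ("red noise") filter; the paper defines several such filters, one per noise color, and this is only an illustration of the pattern-generation step.

```python
import numpy as np

def noise_mask(shape, mask_ratio, rng, smooth=3):
    """Binary mask from filtered noise: low-pass filtering random noise
    yields spatially clustered masked patches, unlike i.i.d. masking."""
    noise = rng.normal(size=shape)
    # Simple box filter as an assumed stand-in for a low-pass filter.
    padded = np.pad(noise, smooth, mode="wrap")
    filtered = np.zeros(shape)
    for dy in range(-smooth, smooth + 1):
        for dx in range(-smooth, smooth + 1):
            filtered += padded[smooth + dy : smooth + dy + shape[0],
                               smooth + dx : smooth + dx + shape[1]]
    # Mask the patches with the highest filtered-noise values.
    k = int(mask_ratio * noise.size)
    thresh = np.partition(filtered.ravel(), -k)[-k]
    return filtered >= thresh

rng = np.random.default_rng(0)
mask = noise_mask((14, 14), mask_ratio=0.75, rng=rng)  # 14x14 ViT patch grid
print(mask.mean())  # ~0.75 of patches masked, in contiguous blobs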
- Random Masking Finds Winning Tickets for Parameter Efficient Fine-tuning [17.638387297838936]
Fine-tuning large language models (LLMs) can be costly.
Parameter-efficient fine-tuning (PEFT) addresses this by training only a fraction of the parameters; its success reveals the expressiveness and flexibility of pretrained models.
This paper studies the limits of PEFT by further simplifying its design and reducing the number of trainable parameters beyond standard setups.
We show that Random Masking is surprisingly effective: with a larger-than-expected learning rate, Random Masking can match the performance of standard PEFT algorithms on various tasks, using fewer trainable parameters.
arXiv Detail & Related papers (2024-05-04T07:44:18Z)
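A minimal PyTorch sketch of the random-masking idea: freeze everything except a small random subset of weight entries, selected once and enforced through gradient hooks. The per-entry masking granularity, the trainable fraction, and the learning rate here are illustrative choices, not the paper's settings.

```python
import torch

def apply_random_masking(model, trainable_frac=0.01, seed=0):
    """Freeze the model except a tiny random subset of weight entries;
    gradients of non-selected entries are zeroed via tensor hooks."""
    g = torch.Generator().manual_seed(seed)
    for p in model.parameters():
        mask = torch.rand(p.shape, generator=g) < trainable_frac
        # The hook multiplies the gradient by the binary mask, so only
        # the randomly selected entries are ever updated.
        p.register_hook(lambda grad, m=mask: grad * m)

model = torch.nn.Linear(768, 2)
apply_random_masking(model, trainable_frac=0.01)
# The paper's observation: such sparse random subsets can match standard
# PEFT methods, but typically need a larger-than-usual learning rate.
opt = torch.optim.SGD(model.parameters(), lr=1e-1)
loss = model(torch.randn(4, 768)).pow(2).mean()
loss.backward()
opt.step()
```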
- Robust Zero-Shot Crowd Counting and Localization With Adaptive Resolution SAM [55.93697196726016]
We propose a simple yet effective crowd counting method by utilizing the Segment-Everything-Everywhere Model (SEEM).
We show that SEEM's performance in dense crowd scenes is limited, primarily due to the omission of many persons in high-density areas.
Our proposed method achieves the best unsupervised performance in crowd counting, while also being comparable to some supervised methods.
arXiv Detail & Related papers (2024-02-27T13:55:17Z)
- Mask Transfiner for High-Quality Instance Segmentation [95.74244714914052]
We present Mask Transfiner for high-quality and efficient instance segmentation.
Our approach only processes detected error-prone tree nodes and self-corrects their errors in parallel.
Our code and trained models will be available at http://vis.xyz/pub/transfiner.
arXiv Detail & Related papers (2021-11-26T18:58:22Z)
- Masksembles for Uncertainty Estimation [60.400102501013784]
Deep neural networks have amply demonstrated their prowess but estimating the reliability of their predictions remains challenging.
Deep Ensembles are widely considered one of the best methods for generating uncertainty estimates, but they are very expensive to train and evaluate.
MC-Dropout is another popular alternative, which is less expensive, but also less reliable.
arXiv Detail & Related papers (2020-12-15T14:39:57Z)
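A toy PyTorch sketch of the design point Masksembles occupies between these two baselines: reuse a small, fixed set of binary masks, one per ensemble member, instead of fresh dropout noise on every forward pass. The i.i.d. mask sampling and the single masked layer here are assumptions; the actual method constructs masks with controlled overlap.

```python
import torch

class MaskedHead(torch.nn.Module):
    """Fixed-mask ensemble layer: unlike MC-Dropout (fresh noise each
    pass) or Deep Ensembles (separate networks), a small fixed set of
    binary masks is reused, one per ensemble member."""
    def __init__(self, dim, n_masks=4, keep=0.8):
        super().__init__()
        self.register_buffer("masks", (torch.rand(n_masks, dim) < keep).float())
        self.fc = torch.nn.Linear(dim, 10)

    def forward(self, x):
        # One prediction per fixed mask; the spread across members is
        # the uncertainty signal.
        outs = [self.fc(x * m) for m in self.masks]
        return torch.stack(outs)  # (n_masks, batch, classes)

head = MaskedHead(dim=64)
preds = head(torch.randn(8, 64)).softmax(-1)
mean, var = preds.mean(0), preds.var(0)  # prediction and its uncertainty
```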
- Investigating and Simplifying Masking-based Saliency Methods for Model Interpretability [5.387323728379395]
Saliency maps that identify the most informative regions of an image are valuable for model interpretability.
A common approach to creating saliency maps involves generating input masks that mask out portions of an image.
We show that a masking model can be trained with as few as 10 examples per class and still generate saliency maps with only a 0.7-point increase in localization error.
arXiv Detail & Related papers (2020-10-19T18:00:36Z)
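For context, the simplest masking-based saliency method looks like the sketch below: occlude one patch at a time and score the drop in the target logit. The paper's contribution is training a masking model that produces such masks directly; this brute-force occlusion variant, with an arbitrary toy classifier, is only meant to show the underlying idea.

```python
import torch

def occlusion_saliency(model, image, target, patch=16):
    """Zero out one patch at a time and record how much the target-class
    score drops; large drops mark the most informative regions."""
    model.eval()
    with torch.no_grad():
        base = model(image.unsqueeze(0))[0, target]
        _, H, W = image.shape
        sal = torch.zeros(H // patch, W // patch)
        for i in range(0, H, patch):
            for j in range(0, W, patch):
                masked = image.clone()
                masked[:, i:i + patch, j:j + patch] = 0.0
                score = model(masked.unsqueeze(0))[0, target]
                sal[i // patch, j // patch] = base - score
    return sal  # large values = patches the prediction relies on

# Toy classifier standing in for any model taking (N, 3, 224, 224) input.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 224 * 224, 10))
sal = occlusion_saliency(model, torch.randn(3, 224, 224), target=0)
```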
- Improving Self-supervised Pre-training via a Fully-Explored Masked Language Model [57.77981008219654]
The Masked Language Model (MLM) framework has been widely adopted for self-supervised language pre-training.
We propose a fully-explored masking strategy, where a text sequence is divided into a certain number of non-overlapping segments.
arXiv Detail & Related papers (2020-10-12T21:28:14Z)
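A short sketch of the segment-restricted sampling idea, assuming equal-length segments, one masked view per segment, and a 15% mask rate (all illustrative choices, not the paper's exact recipe):

```python
import numpy as np

def fully_explored_masks(seq_len, n_segments, mask_prob, rng):
    """One masked view per segment: the sequence is split into
    non-overlapping segments, and each view masks tokens drawn from a
    single segment only, rather than from anywhere in the sequence."""
    bounds = np.linspace(0, seq_len, n_segments + 1).astype(int)
    views = []
    for s in range(n_segments):
        lo, hi = bounds[s], bounds[s + 1]
        mask = np.zeros(seq_len, dtype=bool)
        picked = lo + rng.permutation(hi - lo)[: max(1, int((hi - lo) * mask_prob))]
        mask[picked] = True
        views.append(mask)
    return views

rng = np.random.default_rng(0)
for v in fully_explored_masks(seq_len=32, n_segments=4, mask_prob=0.15, rng=rng):
    print("".join("#" if m else "." for m in v))  # masks stay inside one segment
```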