Parameter-Efficient Masking Networks
- URL: http://arxiv.org/abs/2210.06699v1
- Date: Thu, 13 Oct 2022 03:39:03 GMT
- Title: Parameter-Efficient Masking Networks
- Authors: Yue Bai, Huan Wang, Xu Ma, Yitian Zhang, Zhiqiang Tao, Yun Fu
- Abstract summary: Advanced network designs often contain a large number of repetitive structures (e.g., the Transformer).
In this study, we are the first to investigate the representative potential of fixed random weights with limited unique values by learning masks.
This naturally leads to a new model compression paradigm that reduces model size.
- Score: 61.43995077575439
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A deeper network structure generally handles more complicated non-linearity
and performs more competitively. Nowadays, advanced network designs often contain a
large number of repetitive structures (e.g., the Transformer). They raise network
capacity to a new level but also inevitably increase the model size, which complicates
both model storage and transfer. In this study, we are the first to investigate the
representative potential of fixed random weights with limited unique values by learning
diverse masks, and we introduce the Parameter-Efficient Masking Networks (PEMN). This
also naturally leads to a new paradigm for model compression that diminishes the model
size. Concretely, motivated by the repetitive structures in modern neural networks, we
utilize one randomly initialized layer, accompanied by different masks, to convey
different feature mappings and represent repetitive network modules. Therefore, the
model can be expressed as \textit{one-layer} with a bunch of masks, which significantly
reduces the model storage cost. Furthermore, we enhance our strategy by learning masks
for a model whose parameters are filled by padding a given random weight vector. In
this way, our method can further lower the space complexity, especially for models
without many repetitive architectures. We validate the potential of PEMN to learn masks
over random weights with limited unique values and test its effectiveness as a new
compression paradigm on different network architectures. Code is available at
https://github.com/yueb17/PEMN
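The one-layer-with-many-masks idea lends itself to a short sketch. The following is a minimal, hypothetical PyTorch illustration, not the authors' implementation (see the linked repository for that): a single frozen, randomly initialized weight tensor is shared across several repetitive blocks, and each block learns only a binary mask over it. The top-k straight-through mask trick and all class names here are assumptions for illustration.

```python
# Minimal sketch (not the authors' code; see https://github.com/yueb17/PEMN):
# one frozen, randomly initialized weight tensor is shared across several
# "repetitive" modules, and each module learns only a binary mask over it.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedSharedLinear(nn.Module):
    """A linear block that reuses a shared frozen random weight via its own mask."""

    def __init__(self, shared_weight: torch.Tensor, sparsity: float = 0.5):
        super().__init__()
        self.shared_weight = shared_weight  # frozen, shared across all blocks
        # Learnable real-valued scores from which the binary mask is derived.
        self.scores = nn.Parameter(torch.randn_like(shared_weight))
        self.sparsity = sparsity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Keep the top-(1 - sparsity) fraction of scores (assumed mask rule).
        k = max(1, int((1.0 - self.sparsity) * self.scores.numel()))
        threshold = torch.topk(self.scores.flatten(), k).values.min()
        mask = (self.scores >= threshold).float()
        # Straight-through estimator: forward uses the binary mask,
        # backward passes gradients to the scores.
        mask = mask + self.scores - self.scores.detach()
        return F.linear(x, self.shared_weight * mask)


# One random layer, many masks: the "model" to store is the shared random
# weight (or just its seed) plus one mask per block.
shared = torch.randn(256, 256)
blocks = nn.ModuleList(MaskedSharedLinear(shared) for _ in range(6))

x = torch.randn(8, 256)
for block in blocks:
    x = torch.relu(block(x))
print(x.shape)  # torch.Size([8, 256])
```

Under this reading, storage scales with the number of masks (plus one random vector or seed) rather than with the number of full weight tensors, which is the compression argument made in the abstract.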
Related papers
- Randomly Initialized Subnetworks with Iterative Weight Recycling [0.0]
The Multi-Prize Lottery Ticket Hypothesis posits that randomly initialized neural networks contain several subnetworks that achieve accuracy comparable to fully trained models of the same architecture.
We propose a modification to two state-of-the-art algorithms that finds high-accuracy subnetworks with no additional storage cost or scaling.
arXiv Detail & Related papers (2023-03-28T13:12:00Z)
- Masked Autoencoding for Scalable and Generalizable Decision Making [93.84855114717062]
MaskDP is a simple and scalable self-supervised pretraining method for reinforcement learning and behavioral cloning.
We find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching.
arXiv Detail & Related papers (2022-11-23T07:04:41Z)
- Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z)
- PRANC: Pseudo RAndom Networks for Compacting deep models [22.793523211040682]
PRANC enables significant compaction of a deep model.
In this study, we employ PRANC to condense image classification models and compress images by compacting their associated implicit neural networks.
arXiv Detail & Related papers (2022-06-16T22:03:35Z)
- Automatic Sparse Connectivity Learning for Neural Networks [4.875787559251317]
Well-designed sparse neural networks have the potential to significantly reduce FLOPs and computational resources.
In this work, we propose a new automatic pruning method, Sparse Connectivity Learning (SCL).
Deep learning models trained by SCL outperform the SOTA human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs reduction.
arXiv Detail & Related papers (2022-01-13T15:12:48Z)
- Mask Attention Networks: Rethinking and Strengthen Transformer [70.95528238937861]
Transformer is an attention-based neural network that consists of two sublayers: the Self-Attention Network (SAN) and the Feed-Forward Network (FFN).
arXiv Detail & Related papers (2021-03-25T04:07:44Z)
- KSM: Fast Multiple Task Adaption via Kernel-wise Soft Mask Learning [49.77278179376902]
Deep Neural Networks (DNNs) can forget knowledge about earlier tasks when learning new tasks, a phenomenon known as catastrophic forgetting.
Recent continual learning methods are capable of alleviating catastrophic forgetting on toy-sized datasets.
We propose a new training method called Kernel-wise Soft Mask (KSM), which learns a kernel-wise hybrid binary and real-value soft mask for each task.
arXiv Detail & Related papers (2020-09-11T21:48:39Z)
- ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
arXiv Detail & Related papers (2020-06-28T23:09:27Z)