Building Variable-sized Models via Learngene Pool
- URL: http://arxiv.org/abs/2312.05743v2
- Date: Tue, 12 Dec 2023 03:56:33 GMT
- Title: Building Variable-sized Models via Learngene Pool
- Authors: Boyu Shi, Shiyu Xia, Xu Yang, Haokun Chen, Zhiqiang Kou, Xin Geng
- Abstract summary: Stitchable Neural Networks (SN-Net) were recently proposed to stitch pre-trained networks together, building numerous networks with different complexity-performance trade-offs.
However, SN-Net struggles to build smaller models for low resource constraints.
We propose a novel method called Learngene Pool to overcome these challenges.
- Score: 39.99697115082106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stitchable Neural Networks (SN-Net) were recently proposed to stitch
pre-trained networks together, quickly building numerous networks with different
complexity and performance trade-offs. This alleviates the burden of designing
or training variable-sized networks for application scenarios with diverse
resource constraints. However, SN-Net still faces a few challenges: 1) stitching
multiple independently pre-trained anchors incurs high storage consumption;
2) SN-Net struggles to build smaller models for low resource constraints; and
3) SN-Net initializes its stitch layers with an unlearned method, limiting final
performance. To overcome these challenges, motivated by the recently proposed
Learngene framework, we propose a novel method called Learngene Pool. Briefly,
Learngene distills the critical knowledge from a large pre-trained model into a
small part (termed the learngene) and then expands this small part into several
variable-sized models. In our method, we distill one pre-trained large model
into multiple small models whose network blocks serve as learngene instances
that constitute the learngene pool. Since only one large model is used, we do
not need to store multiple large models as SN-Net does, and after distillation,
smaller learngene instances can be created to build small models that satisfy
low resource constraints. We also insert learnable transformation matrices
between the instances to stitch them into variable-sized models, improving the
performance of these models. Extensive experiments validate the effectiveness
of the proposed Learngene Pool compared with SN-Net.
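The assembly the abstract describes can be sketched as follows. This is a hypothetical minimal example, not the authors' code: the pool contents, widths, and the `stitch`/`build_model` helpers are all illustrative, and the transformation matrices are randomly initialized here rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)

def block(dim):
    """A stand-in for one learngene instance: a random linear+ReLU block."""
    W = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    return lambda x: np.maximum(x @ W, 0.0)

# A toy learngene pool: instances distilled at two widths.
pool = {64: [block(64) for _ in range(2)],
        32: [block(32) for _ in range(2)]}

def stitch(d_in, d_out):
    """Transformation matrix between instances (learnable in the paper;
    randomly initialized in this sketch)."""
    T = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
    return lambda x: x @ T

def build_model(widths):
    """Assemble a variable-sized model from pool instances, inserting a
    stitch layer wherever consecutive widths differ."""
    layers = []
    for i, w in enumerate(widths):
        layers.append(pool[w][i % len(pool[w])])
        if i + 1 < len(widths) and widths[i + 1] != w:
            layers.append(stitch(w, widths[i + 1]))
    def forward(x):
        for f in layers:
            x = f(x)
        return x
    return forward

small = build_model([32, 32])   # low-resource variant from small instances
mixed = build_model([64, 32])   # variant stitched across widths
x64 = rng.standard_normal((1, 64))
x32 = rng.standard_normal((1, 32))
print(small(x32).shape, mixed(x64).shape)  # (1, 32) (1, 32)
```

The point of the sketch is the second challenge from the abstract: because the pool contains instances distilled at small widths, a low-resource model (`small`) can be built without any large anchor, and mismatched widths are bridged by the stitch matrices.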
Related papers
- FINE: Factorizing Knowledge for Initialization of Variable-sized Diffusion Models [35.40065954148091]
FINE is a method based on the Learngene framework for initializing downstream networks by leveraging pre-trained models.
It decomposes pre-trained knowledge into a product of matrices (i.e., $U$, $\Sigma$, and $V$), where $U$ and $V$ are shared across network blocks as "learngenes".
It consistently outperforms direct pre-training, particularly for smaller models, achieving state-of-the-art results across variable model sizes.
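The factorization FINE describes can be illustrated with a plain SVD. This is a sketch under stated assumptions, not FINE's actual training procedure: the matrix sizes are arbitrary and the new-$\Sigma$ initialization is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((8, 8))   # a pre-trained block's weight matrix

# Decompose into U, Sigma, V; FINE shares U and V across blocks as
# "learngenes", while each block keeps its own Sigma.
U, s, Vt = np.linalg.svd(W)
W_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(W, W_rebuilt))  # True

# A hypothetical downstream block reuses U and Vt but gets a fresh Sigma:
s_new = np.ones_like(s)           # illustrative initialization only
W_new = U @ np.diag(s_new) @ Vt
```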
arXiv Detail & Related papers (2024-09-28T08:57:17Z)
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized Visual Prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z)
- Efficient Stitchable Task Adaptation [47.94819192325723]
We present a novel framework, Efficient Stitchable Task Adaptation (ESTA), to efficiently produce a palette of fine-tuned models.
Specifically, we first tailor parameter-efficient fine-tuning to share low-rank updates among the stitches.
We streamline a simple yet effective one-stage deployment pipeline, which estimates the important stitches to deploy.
arXiv Detail & Related papers (2023-11-29T04:31:35Z)
- SortedNet: A Scalable and Generalized Framework for Training Modular Deep Neural Networks [30.069353400127046]
We propose SortedNet to harness the inherent modularity of deep neural networks (DNNs).
SortedNet enables the training of sub-models simultaneously along with the training of the main model.
It is able to train 160 sub-models at once, achieving at least 96% of the original model's performance.
arXiv Detail & Related papers (2023-09-01T05:12:25Z)
- One-Shot Pruning for Fast-adapting Pre-trained Models on Devices [28.696989086706186]
Large-scale pre-trained models have been remarkably successful in resolving downstream tasks.
However, deploying these models on low-capability devices still requires an effective approach, such as model pruning.
We present a scalable one-shot pruning method that leverages pruned knowledge of similar tasks to extract a sub-network from the pre-trained model for a new task.
arXiv Detail & Related papers (2023-07-10T06:44:47Z)
- Stitchable Neural Networks [40.8842135978138]
We present Stitchable Neural Networks (SN-Net), a novel scalable and efficient framework for model deployment.
SN-Net splits the anchors across the blocks/layers and then stitches them together with simple stitching layers to map the activations from one anchor to another.
Experiments on ImageNet classification demonstrate that SN-Net can obtain on-par or even better performance than many individually trained networks.
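The activation mapping that SN-Net's stitching layers perform can be sketched as a linear map fitted by least squares. This is an illustrative approach with made-up shapes, not the framework's actual initialization code.

```python
import numpy as np

rng = np.random.default_rng(2)

# Activations of the same inputs at a split point in two anchors
# (hypothetical shapes: anchor A is wider than anchor B).
acts_a = rng.standard_normal((128, 64))   # N samples x d_a features
acts_b = rng.standard_normal((128, 48))   # N samples x d_b features

# A stitching layer is a linear map M with acts_a @ M ~ acts_b;
# least squares gives a closed-form fit.
M, *_ = np.linalg.lstsq(acts_a, acts_b, rcond=None)
print(M.shape)  # (64, 48)

# Route activations from anchor A into anchor B's remaining blocks:
stitched = acts_a @ M
```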
arXiv Detail & Related papers (2023-02-13T18:37:37Z)
- Voting from Nearest Tasks: Meta-Vote Pruning of Pre-trained Models for Downstream Tasks [55.431048995662714]
We create a small model for a new task from the pruned models of similar tasks.
We show that a few fine-tuning steps on this model suffice to produce a promising pruned-model for the new task.
We develop a simple but effective "Meta-Vote Pruning (MVP)" method that significantly reduces the pruning iterations for a new task.
arXiv Detail & Related papers (2023-01-27T06:49:47Z)
- The Self-Simplifying Machine: Exploiting the Structure of Piecewise Linear Neural Networks to Create Interpretable Models [0.0]
We introduce novel methodology for simplifying and increasing the interpretability of piecewise linear neural networks for classification tasks.
Our methods include the use of a trained, deep network to produce a well-performing, single-hidden-layer network without further training.
On these methods, we conduct preliminary studies of model performance, as well as a case study on Wells Fargo's Home Lending dataset.
arXiv Detail & Related papers (2020-12-02T16:02:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.