Stitchable Neural Networks
- URL: http://arxiv.org/abs/2302.06586v3
- Date: Tue, 28 Mar 2023 11:09:51 GMT
- Title: Stitchable Neural Networks
- Authors: Zizheng Pan, Jianfei Cai, Bohan Zhuang
- Abstract summary: We present Stitchable Neural Networks (SN-Net), a novel scalable and efficient framework for model deployment.
SN-Net splits the anchors across the blocks/layers and then stitches them together with simple stitching layers to map the activations from one anchor to another.
Experiments on ImageNet classification demonstrate that SN-Net can obtain on-par or even better performance than many individually trained networks.
- Score: 40.8842135978138
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The public model zoo, which contains enormously powerful pretrained
model families (e.g., ResNet/DeiT), has reached an unprecedented scope and
contributes significantly to the success of deep learning. As each model family
consists of pretrained models at diverse scales (e.g., DeiT-Ti/S/B), a
fundamental question naturally arises: how can these readily available models in
a family be efficiently assembled for dynamic accuracy-efficiency trade-offs
at runtime? To this end, we present Stitchable Neural Networks (SN-Net), a
novel scalable and efficient framework for model deployment. It cheaply
produces numerous networks with different complexity and performance trade-offs
given a family of pretrained neural networks, which we call anchors.
Specifically, SN-Net splits the anchors across the blocks/layers and then
stitches them together with simple stitching layers to map the activations from
one anchor to another. With only a few epochs of training, SN-Net effectively
interpolates between the performance of anchors with varying scales. At
runtime, SN-Net can instantly adapt to dynamic resource constraints by
switching the stitching positions. Extensive experiments on ImageNet
classification demonstrate that SN-Net can obtain on-par or even better
performance than many individually trained networks while supporting diverse
deployment scenarios. For example, by stitching Swin Transformers, we challenge
hundreds of models in Timm model zoo with a single network. We believe this new
elastic model framework can serve as a strong baseline for further research in
wider communities.
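The stitching mechanism described in the abstract can be pictured with a short PyTorch-style sketch. This is not the authors' released code: the anchor attributes (`blocks`, `head`), the dimensions, and the token pooling are illustrative assumptions for ViT-like models such as DeiT or Swin.

```python
import torch
import torch.nn as nn


class StitchingLayer(nn.Module):
    """A stitching layer: a learnable projection that maps activations
    from one anchor's feature space to another's (dims are illustrative)."""

    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        self.proj = nn.Linear(dim_in, dim_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


class StitchedNet(nn.Module):
    """Run the first `split` blocks of a small anchor, stitch, then run the
    remaining blocks (and head) of a large anchor. Anchors are assumed to
    expose ViT-style `.blocks` and `.head` attributes, and `x` is assumed
    to be already patch-embedded tokens; both are simplifications."""

    def __init__(self, anchor_small, anchor_large, split: int,
                 dim_small: int, dim_large: int):
        super().__init__()
        self.front = nn.ModuleList(anchor_small.blocks[:split])
        self.back = nn.ModuleList(anchor_large.blocks[split:])
        self.stitch = StitchingLayer(dim_small, dim_large)
        self.head = anchor_large.head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for blk in self.front:
            x = blk(x)
        x = self.stitch(x)  # map small-anchor features into the large anchor's space
        for blk in self.back:
            x = blk(x)
        return self.head(x.mean(dim=1))  # mean-pool tokens before the classifier
```

Switching the stitching position (the `split` index above) at runtime is what yields the family of accuracy-efficiency trade-offs the abstract describes.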
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
Once trained on a small base model using demonstrations, this value network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
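A minimal sketch of one plausible reading of this logit-level setup, assuming the value network outputs a logit correction that is added to a frozen pretrained model's logits; the names and signatures below are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn


def combined_logits(base_model: nn.Module, value_net: nn.Module,
                    inputs: torch.Tensor) -> torch.Tensor:
    """Add the value network's predicted logit delta to a frozen pretrained
    model's logits (illustrative reading; the value network is assumed to
    share the output vocabulary with the pretrained model)."""
    with torch.no_grad():
        base = base_model(inputs)   # frozen pretrained model's logits
    delta = value_net(inputs)       # logit adjustment learned during post-training
    return base + delta
```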
arXiv Detail & Related papers (2024-10-28T13:48:43Z) - Cross Spline Net and a Unified World [41.69175843713757]
Cross spline net (CSN) is based on a combination of spline transformation and cross-network.
CSN provides a unified modeling framework that brings a set of non-neural-network models under the same neural network formulation.
We show that CSN is comparably performant and convenient to use, while being less complicated, more interpretable, and more robust.
arXiv Detail & Related papers (2024-10-24T20:45:48Z) - Building Variable-sized Models via Learngene Pool [39.99697115082106]
Recently, Stitchable Neural Networks (SN-Net) was proposed to stitch pre-trained networks, building numerous networks with different complexity and performance trade-offs.
However, SN-Net faces challenges in building smaller models under low resource constraints.
We propose a novel method called Learngene Pool to overcome these challenges.
arXiv Detail & Related papers (2023-12-10T03:46:01Z) - Subnetwork-to-go: Elastic Neural Network with Dynamic Training and Customizable Inference [16.564868336748503]
We propose a simple way to train a large network and flexibly extract a subnetwork from it given a model size or complexity constraint.
Experimental results on a music source separation model show that our proposed method effectively improves separation performance across different subnetwork sizes and complexities.
arXiv Detail & Related papers (2023-12-06T12:40:06Z) - Efficient Stitchable Task Adaptation [47.94819192325723]
We present a novel framework, Efficient Stitchable Task Adaptation (ESTA), to efficiently produce a palette of fine-tuned models.
Specifically, we first tailor parameter-efficient fine-tuning to share low-rank updates among the stitches.
We streamline a simple yet effective one-stage deployment pipeline, which estimates the important stitches to deploy.
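The low-rank sharing idea can be sketched as a LoRA-style adapter attached to a frozen anchor layer and reused by every stitch that routes through it; this is a generic sketch under that assumption, not ESTA's exact parameterization.

```python
import torch
import torch.nn as nn


class SharedLoRA(nn.Module):
    """Frozen pretrained linear layer plus a single low-rank update
    (scale * B @ A) that all stitches passing through this layer share."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # keep the anchor weights frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())
```

Because every stitch reuses the same A/B parameters, the fine-tuning cost is amortized over the whole palette of stitched models.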
arXiv Detail & Related papers (2023-11-29T04:31:35Z) - SortedNet: A Scalable and Generalized Framework for Training Modular Deep Neural Networks [30.069353400127046]
We propose SortedNet to harness the inherent modularity of deep neural networks (DNNs).
SortedNet enables sub-models to be trained simultaneously with the main model.
It is able to train 160 sub-models at once, achieving at least 96% of the original model's performance.
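What simultaneous sub-model training can look like in practice is sketched below as a generic nested-depth training step; it is not SortedNet's exact algorithm, and the assumption that all blocks share one width (so the same head fits every truncation) is an illustrative simplification.

```python
import random

import torch
import torch.nn as nn


def train_step(blocks: nn.ModuleList, head: nn.Module,
               x: torch.Tensor, y: torch.Tensor,
               optimizer: torch.optim.Optimizer,
               criterion: nn.Module = nn.CrossEntropyLoss()) -> float:
    """One step that trains the full model and one randomly sampled
    depth-truncated sub-model on the same batch."""
    optimizer.zero_grad()
    depths = [len(blocks), random.randint(1, len(blocks) - 1)]  # full model + one sub-model
    loss = torch.zeros((), device=x.device)
    for d in depths:
        h = x
        for blk in blocks[:d]:          # run only the first d blocks
            h = blk(h)
        loss = loss + criterion(head(h), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```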
arXiv Detail & Related papers (2023-09-01T05:12:25Z) - Stitched ViTs are Flexible Vision Backbones [51.441023711924835]
We are inspired by stitchable neural networks (SN-Net) to produce a single model that covers rich subnetworks by stitching pretrained model families.
We introduce SN-Netv2, a systematically improved model stitching framework to facilitate downstream task adaptation.
SN-Netv2 demonstrates superior performance over SN-Netv1 on downstream dense predictions and shows strong ability as a flexible vision backbone.
arXiv Detail & Related papers (2023-06-30T22:05:34Z) - Vertical Layering of Quantized Neural Networks for Heterogeneous Inference [57.42762335081385]
We study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one.
We can theoretically obtain a network of any precision for on-demand service while only needing to train and maintain one model.
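One way to picture such a vertical-layered representation is bit-plane slicing of quantized weight codes: store the full-precision codes once as bit-planes, then reconstruct any lower precision by keeping only the most significant planes. The sketch below shows this generic idea, not necessarily the paper's exact encoding.

```python
import numpy as np


def to_bit_planes(q: np.ndarray, bits: int = 8) -> np.ndarray:
    """Split unsigned quantization codes (0 .. 2**bits - 1) into bit-planes,
    most significant plane first; output shape is [bits, *q.shape]."""
    return np.stack([(q >> (bits - 1 - i)) & 1 for i in range(bits)])


def from_bit_planes(planes: np.ndarray, keep_bits: int) -> np.ndarray:
    """Rebuild a lower-precision code by keeping only the top `keep_bits`
    planes (the remaining low-order bits are dropped)."""
    bits = planes.shape[0]
    q = np.zeros(planes.shape[1:], dtype=np.int64)
    for i in range(keep_bits):
        q += planes[i].astype(np.int64) << (bits - 1 - i)
    return q


# Example: 0b10110110 truncated to its 4 most significant bits is 0b10110000.
codes = np.array([0b10110110, 0b00011111], dtype=np.uint8)
planes = to_bit_planes(codes)
assert from_bit_planes(planes, 4).tolist() == [0b10110000, 0b00010000]
```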
arXiv Detail & Related papers (2022-12-10T15:57:38Z) - Simultaneous Training of Partially Masked Neural Networks [67.19481956584465]
We show that it is possible to train neural networks in such a way that a predefined 'core' subnetwork can be split off from the trained full network with remarkably good performance.
We show that training a Transformer with a low-rank core gives a low-rank model that performs better than training the low-rank model alone.
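The low-rank-core idea can be sketched as a layer whose weight is a low-rank core plus an auxiliary term: the full network trains both, and the core model is split off by dropping the auxiliary part. The parameterization below is an illustrative guess, not the paper's exact construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoreLinear(nn.Module):
    """Linear layer with weight W = U @ V (low-rank core) + W_aux.
    The full network uses both terms during training; the split-off core
    model keeps only the low-rank term."""

    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.U = nn.Parameter(torch.randn(d_out, rank) / rank ** 0.5)
        self.V = nn.Parameter(torch.randn(rank, d_in) / d_in ** 0.5)
        self.W_aux = nn.Parameter(torch.zeros(d_out, d_in))

    def forward(self, x: torch.Tensor, core_only: bool = False) -> torch.Tensor:
        w = self.U @ self.V
        if not core_only:
            w = w + self.W_aux            # full model adds the auxiliary weights
        return F.linear(x, w)
```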
arXiv Detail & Related papers (2021-06-16T15:57:51Z)