Related papers: Subnetwork-to-go: Elastic Neural Network with Dynamic Training and Customizable Inference

URL: http://arxiv.org/abs/2312.03464v1
Date: Wed, 6 Dec 2023 12:40:06 GMT
Title: Subnetwork-to-go: Elastic Neural Network with Dynamic Training and Customizable Inference
Authors: Kai Li, Yi Luo
Abstract summary: We propose a simple way to train a large network and flexibly extract a subnetwork from it given a model size or complexity constraint. Experimental results on a music source separation model show that the proposed method effectively improves separation performance across different subnetwork sizes and complexities.
Score: 16.564868336748503
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deploying neural networks to different devices or platforms is generally challenging, especially when the model is large or its complexity is high. Although methods for model pruning and distillation exist, they typically require a full round of training or finetuning to obtain a smaller model that satisfies the size or complexity constraints. Motivated by recent work on dynamic neural networks, we propose a simple way to train a large network and flexibly extract a subnetwork from it given a model size or complexity constraint during inference. We introduce a new way to train a large model with dynamic depth and width; once the large model is trained, we can select a subnetwork of arbitrary depth and width at inference time, with relatively better performance than training that subnetwork independently from scratch. Experimental results on a music source separation model show that the proposed method effectively improves separation performance across different subnetwork sizes and complexities with a single large model, and that training the large model takes significantly less time than training all the different subnetworks.
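
The abstract does not spell out how depth and width are varied, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a toy residual MLP (the hypothetical class `ElasticMLP`) in which a subnetwork uses the first `depth` blocks and the first `width` hidden units of each block, a hypothetical `sample_config` that draws a random configuration as one might per training step, and a hypothetical `select_subnetwork` that picks the largest configuration fitting a parameter budget. The training loop itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)


class ElasticMLP:
    """Toy residual MLP whose blocks and hidden units can be sliced at run time.

    Illustrative assumption only; the paper's architecture is not given here.
    """

    def __init__(self, dim=8, hidden=32, depth=4):
        self.dim, self.hidden, self.depth = dim, hidden, depth
        # One full-size weight pair per residual block; subnetworks reuse slices.
        self.w_in = [rng.normal(0, 0.1, (dim, hidden)) for _ in range(depth)]
        self.w_out = [rng.normal(0, 0.1, (hidden, dim)) for _ in range(depth)]

    def forward(self, x, depth=None, width=None):
        depth = self.depth if depth is None else depth
        width = self.hidden if width is None else width
        for w_in, w_out in zip(self.w_in[:depth], self.w_out[:depth]):
            h = np.maximum(x @ w_in[:, :width], 0.0)  # first `width` hidden units
            x = x + h @ w_out[:width, :]              # residual keeps the shape fixed
        return x

    def num_params(self, depth, width):
        # Two weight matrices of shape (dim, width) and (width, dim) per block.
        return depth * 2 * self.dim * width


def sample_config(model):
    """Draw a random (depth, width) pair, as one might do per training step."""
    depth = int(rng.integers(1, model.depth + 1))
    width = int(rng.integers(1, model.hidden + 1))
    return depth, width


def select_subnetwork(model, max_params):
    """Return the (depth, width) with the most parameters within the budget."""
    best = None
    for d in range(1, model.depth + 1):
        for w in range(1, model.hidden + 1):
            n = model.num_params(d, w)
            if n <= max_params and (best is None or n > best[0]):
                best = (n, d, w)
    if best is None:
        raise ValueError("budget too small for any subnetwork")
    return best[1], best[2]
```

In this sketch, a call such as `model.forward(x, *select_subnetwork(model, max_params=1000))` runs the sliced subnetwork without copying or retraining any weights. Choosing by parameter count alone is a simplification; a complexity (for example, operation count) constraint could be handled the same way with a different cost function.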

Related papers

Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network). After training this network on a small base model using demonstrations, it can be seamlessly integrated with other pre-trained models during inference. We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
BEND: Bagging Deep Learning Training Based on Efficient Neural Network Diffusion [56.9358325168226]
We propose a Bagging deep learning training algorithm based on Efficient Neural network Diffusion (BEND). Our approach is simple but effective: it first uses the weights and biases of multiple trained models as inputs to train an autoencoder and a latent diffusion model. Our proposed BEND algorithm can consistently outperform the mean and median accuracies of both the original trained model and the diffused model.
arXiv Detail & Related papers (2024-03-23T08:40:38Z)
SortedNet: A Scalable and Generalized Framework for Training Modular Deep Neural Networks [30.069353400127046]
We propose SortedNet to harness the inherent modularity of deep neural networks (DNNs). SortedNet enables the training of sub-models simultaneously along with the training of the main model. It is able to train 160 sub-models at once, achieving at least 96% of the original model's performance.
arXiv Detail & Related papers (2023-09-01T05:12:25Z)
Dynamic Mixed Membership Stochastic Block Model for Weighted Labeled Networks [3.5450828190071655]
A new family of Mixed Membership Stochastic Block Models (MMSBM) allows modeling static labeled networks under the assumption of mixed-membership clustering. We show that our method differs significantly from existing approaches and allows modeling more complex systems: dynamic labeled networks.
arXiv Detail & Related papers (2023-04-12T15:01:03Z)
Stitchable Neural Networks [40.8842135978138]
We present Stitchable Neural Networks (SN-Net), a novel scalable and efficient framework for model deployment. SN-Net splits the anchors across the blocks/layers and then stitches them together with simple stitching layers to map the activations from one anchor to another. Experiments on ImageNet classification demonstrate that SN-Net can obtain on-par or even better performance than many individually trained networks.
arXiv Detail & Related papers (2023-02-13T18:37:37Z)
On Optimizing the Communication of Model Parallelism [74.15423270435949]
We study a novel and important communication pattern in large-scale model-parallel deep learning (DL). In cross-mesh resharding, a sharded tensor needs to be sent from a source device mesh to a destination device mesh. We propose two contributions to address cross-mesh resharding: an efficient broadcast-based communication system, and an "overlapping-friendly" pipeline schedule.
arXiv Detail & Related papers (2022-11-10T03:56:48Z)
Simultaneous Training of Partially Masked Neural Networks [67.19481956584465]
We show that it is possible to train neural networks in such a way that a predefined 'core' subnetwork can be split off from the trained full network with remarkably good performance. We show that training a Transformer with a low-rank core gives a low-rank model that performs better than the low-rank model trained alone.
arXiv Detail & Related papers (2021-06-16T15:57:51Z)
Learnable Expansion-and-Compression Network for Few-shot Class-Incremental Learning [87.94561000910707]
We propose a learnable expansion-and-compression network (LEC-Net) to solve catastrophic forgetting and model over-fitting problems. LEC-Net enlarges the representation capacity of features, alleviating feature drift of the old network from the perspective of model regularization. Experiments on the CUB/CIFAR-100 datasets show that LEC-Net improves the baseline by 57% while outperforming the state-of-the-art by 56%.
arXiv Detail & Related papers (2021-04-06T04:34:21Z)
Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks. We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
