StitchNet: Composing Neural Networks from Pre-Trained Fragments
- URL: http://arxiv.org/abs/2301.01947v3
- Date: Sat, 23 Sep 2023 05:25:34 GMT
- Title: StitchNet: Composing Neural Networks from Pre-Trained Fragments
- Authors: Surat Teerapittayanon, Marcus Comiter, Brad McDanel, H.T. Kung
- Abstract summary: We propose StitchNet, a novel neural network creation paradigm.
It stitches together fragments from multiple pre-trained neural networks.
We show that these fragments can be stitched together to create neural networks with accuracy comparable to that of traditionally trained networks.
- Score: 3.638431342539701
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose StitchNet, a novel neural network creation paradigm that stitches
together fragments (one or more consecutive network layers) from multiple
pre-trained neural networks. StitchNet allows the creation of high-performing
neural networks without the large compute and data requirements needed under
traditional model creation processes via backpropagation training. We leverage
Centered Kernel Alignment (CKA) as a compatibility measure to efficiently guide
the selection of these fragments in composing a network for a given task
tailored to specific accuracy needs and computing resource constraints. We then
show that these fragments can be stitched together to create neural networks
with accuracy comparable to that of traditionally trained networks at a
fraction of the computing resources and data requirements. Finally, we explore a
novel on-the-fly personalized model creation and inference application enabled
by this new paradigm. The code is available at
https://github.com/steerapi/stitchnet.
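The compatibility measure named in the abstract, Centered Kernel Alignment, has a standard closed form on paired activation matrices. Below is a minimal sketch, assuming activations collected by running a small probe batch through each pre-trained fragment; the scoring function follows the usual linear CKA definition, while the greedy selection helper is only an illustration of how such scores could guide fragment choice, not the released StitchNet code.

```python
import numpy as np

def linear_cka(X, Y):
    # Linear CKA between two activation matrices computed on the same inputs.
    # X: (n_samples, d1) outputs of the upstream fragment.
    # Y: (n_samples, d2) activations at the input of a candidate downstream fragment.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    return cross / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

def pick_next_fragment(upstream_acts, candidate_acts_list):
    # Greedy illustration: choose the candidate fragment whose activations are
    # most CKA-compatible with the upstream fragment's outputs.
    scores = [linear_cka(upstream_acts, acts) for acts in candidate_acts_list]
    return int(np.argmax(scores)), scores
```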
Related papers
- NEAR: A Training-Free Pre-Estimator of Machine Learning Model Performance [0.0]
We propose a zero-cost proxy Network Expressivity by Activation Rank (NEAR) to identify the optimal neural network without training.
We demonstrate a state-of-the-art correlation between this network score and model accuracy on NAS-Bench-101 and NATS-Bench-SSS/TSS.
arXiv Detail & Related papers (2024-08-16T14:38:14Z)
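For the NEAR entry above: the proxy is built from the rank of activation matrices. As a hedged illustration only (the paper defines its own score), an entropy-based effective rank of a layer's activations on a small forward pass could serve as such a training-free expressivity signal:

```python
import numpy as np

def effective_rank(acts, eps=1e-12):
    # Entropy-based effective rank of an activation matrix (batch, units).
    # Illustrative training-free proxy only; the NEAR paper defines its own score.
    s = np.linalg.svd(acts, compute_uv=False)
    p = s / (s.sum() + eps)
    return float(np.exp(-(p * np.log(p + eps)).sum()))
```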
- Stitching for Neuroevolution: Recombining Deep Neural Networks without Breaking Them [0.0]
Traditional approaches to neuroevolution often start from scratch.
Recombining trained networks is non-trivial because architectures and feature representations typically differ.
We employ stitching, which merges the networks by introducing new layers at crossover points.
arXiv Detail & Related papers (2024-03-21T08:30:44Z)
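For the neuroevolution-stitching entry above, a minimal sketch of the general idea of a stitching layer: a small trainable adapter inserted at the crossover point so features from one parent network can feed the remainder of the other. The shapes and fitting procedure shown are illustrative assumptions, not that paper's implementation.

```python
import torch
import torch.nn as nn

class StitchLayer(nn.Module):
    # Trainable linear adapter from parent A's features to the input expected
    # by the remainder of parent B (hypothetical 512-d -> 256-d crossover).
    def __init__(self, dim_a, dim_b):
        super().__init__()
        self.proj = nn.Linear(dim_a, dim_b)

    def forward(self, feats_a):
        return self.proj(feats_a)

stitch = StitchLayer(512, 256)
feats_a = torch.randn(8, 512)   # activations of parent A at the crossover point
feats_for_b = stitch(feats_a)   # now shaped for parent B's later layers
# The adapter can be fit on paired activations of A and B for the same inputs
# (e.g., least squares or a few SGD steps) while both parent networks stay frozen.
```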
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Robust Training and Verification of Implicit Neural Networks: A Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z)
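For the implicit-network verification entry above: the embedded-network construction is specific to that paper, but what an $\ell_\infty$-norm box over-approximation of a reachable set means can be illustrated with plain interval bound propagation through a linear layer and a ReLU (a simpler, different technique, shown only for intuition):

```python
import numpy as np

def interval_linear(W, b, lower, upper):
    # Propagate an ell_infty box [lower, upper] through x -> W @ x + b.
    W_pos, W_neg = np.clip(W, 0, None), np.clip(W, None, 0)
    return W_pos @ lower + W_neg @ upper + b, W_pos @ upper + W_neg @ lower + b

def interval_relu(lower, upper):
    # ReLU is monotone, so the box passes through elementwise.
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)
```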
- Increasing Depth of Neural Networks for Life-long Learning [2.0305676256390934]
We propose a novel method for continual learning based on the increasing depth of neural networks.
This work explores whether extending neural network depth may be beneficial in a life-long learning setting.
arXiv Detail & Related papers (2022-02-22T11:21:41Z)
- Fast Adaptation with Linearized Neural Networks [35.43406281230279]
We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions.
Inspired by this finding, we propose a technique for embedding these inductive biases into Gaussian processes through a kernel designed from the Jacobian of the network.
In this setting, domain adaptation takes the form of interpretable posterior inference, with accompanying uncertainty estimation.
arXiv Detail & Related papers (2021-03-02T03:23:03Z)
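For the linearized-networks entry above, a small sketch of a kernel built from the network's parameter Jacobian, k(x, x') = <J(x), J(x')>, assuming a scalar-output model; the paper embeds such a kernel in a Gaussian process for domain adaptation, which is omitted here.

```python
import torch

def param_jacobian(model, x):
    # Flattened gradient of the scalar model output w.r.t. all parameters.
    out = model(x.unsqueeze(0)).squeeze()
    grads = torch.autograd.grad(out, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

def jacobian_kernel(model, x1, x2):
    # k(x1, x2) = <J(x1), J(x2)>: the tangent kernel of the linearized network.
    return torch.dot(param_jacobian(model, x1), param_jacobian(model, x2))
```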
- Local Critic Training for Model-Parallel Learning of Deep Neural Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that networks trained by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z)
- Provably Training Neural Network Classifiers under Fairness Constraints [70.64045590577318]
We show that overparametrized neural networks could meet the constraints.
A key ingredient in building a fair neural network classifier is establishing a no-regret analysis for neural networks.
arXiv Detail & Related papers (2020-12-30T18:46:50Z)
- Graph-Based Neural Network Models with Multiple Self-Supervised Auxiliary Tasks [79.28094304325116]
Graph Convolutional Networks are among the most promising approaches for capturing relationships among structured data points.
We propose three novel self-supervised auxiliary tasks to train graph-based neural network models in a multi-task fashion.
arXiv Detail & Related papers (2020-11-14T11:09:51Z)
- Finding trainable sparse networks through Neural Tangent Transfer [16.092248433189816]
In deep learning, trainable sparse networks that perform well on a specific task are usually constructed using label-dependent pruning criteria.
In this article, we introduce Neural Tangent Transfer, a method that instead finds trainable sparse networks in a label-free manner.
arXiv Detail & Related papers (2020-06-15T08:58:01Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.