Model Fusion of Heterogeneous Neural Networks via Cross-Layer Alignment
- URL: http://arxiv.org/abs/2110.15538v1
- Date: Fri, 29 Oct 2021 05:02:23 GMT
- Title: Model Fusion of Heterogeneous Neural Networks via Cross-Layer Alignment
- Authors: Dang Nguyen and Khai Nguyen and Dinh Phung and Hung Bui and Nhat Ho
- Abstract summary: We propose a novel model fusion framework, named CLAFusion, to fuse neural networks with a different number of layers.
Based on the cross-layer alignment, our framework balances the number of layers of neural networks before applying layer-wise model fusion.
- Score: 17.735593218773758
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Layer-wise model fusion via optimal transport, named OTFusion, applies soft
neuron association for unifying different pre-trained networks to save
computational resources. While enjoying its success, OTFusion requires the
input networks to have the same number of layers. To address this issue, we
propose a novel model fusion framework, named CLAFusion, to fuse neural
networks with a different number of layers, which we refer to as heterogeneous
neural networks, via cross-layer alignment. The cross-layer alignment problem,
which is an unbalanced assignment problem, can be solved efficiently using
dynamic programming. Based on the cross-layer alignment, our framework balances
the number of layers of neural networks before applying layer-wise model
fusion. Our synthetic experiments indicate that the fused network from
CLAFusion achieves a more favorable performance compared to the individual
networks trained on heterogeneous data without the need for any retraining.
With an extra fine-tuning process, it improves the accuracy of residual
networks on the CIFAR10 dataset. Finally, we explore its application for model
compression and knowledge distillation when applying to the teacher-student
setting.
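
The abstract frames cross-layer alignment as an unbalanced assignment problem that dynamic programming can solve efficiently. The sketch below shows one way such an order-preserving matching between the layers of a deeper and a shallower network could be computed; the dissimilarity cost matrix, the recurrence, and the function name are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def cross_layer_alignment(cost):
    """Order-preserving unbalanced assignment via dynamic programming.

    cost[i, j] is a dissimilarity between layer i of the deeper network
    (m layers) and layer j of the shallower network (n layers), m >= n.
    Each shallow layer is matched to a distinct deep layer so that matches
    increase in both indices and the total cost is minimal.
    """
    m, n = cost.shape
    assert m >= n, "first dimension must index the deeper network"
    INF = float("inf")
    # dp[i, j]: best cost of matching the first j shallow layers
    # using only the first i deep layers.
    dp = np.full((m + 1, n + 1), INF)
    dp[:, 0] = 0.0
    take = np.zeros((m + 1, n + 1), dtype=bool)  # was deep layer i matched to shallow layer j?
    for i in range(1, m + 1):
        for j in range(1, min(i, n) + 1):
            skip_cost = dp[i - 1, j]                           # leave deep layer i unmatched
            take_cost = dp[i - 1, j - 1] + cost[i - 1, j - 1]  # match deep layer i with shallow layer j
            if take_cost <= skip_cost:
                dp[i, j], take[i, j] = take_cost, True
            else:
                dp[i, j] = skip_cost
    # Backtrack to recover, for each shallow layer, its matched deep layer.
    match, i, j = [-1] * n, m, n
    while j > 0:
        if take[i, j]:
            match[j - 1] = i - 1
            i, j = i - 1, j - 1
        else:
            i -= 1
    return match, float(dp[m, n])

# Toy usage: align a 3-layer network against a 5-layer one with random costs.
rng = np.random.default_rng(0)
print(cross_layer_alignment(rng.random((5, 3))))
```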
Related papers
- Stitching for Neuroevolution: Recombining Deep Neural Networks without Breaking Them [0.0]
Traditional approaches to neuroevolution often start from scratch.
Recombining trained networks is non-trivial because architectures and feature representations typically differ.
We employ stitching, which merges the networks by introducing new layers at crossover points.
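
As a rough illustration of the stitching idea in this entry, the following sketch recombines two plain feed-forward networks, stored as lists of (W, b) pairs, by keeping a prefix of one, a suffix of the other, and inserting a new randomly initialised affine layer at the crossover point; the architectures, crossover indices, and initialisation scale are assumptions for illustration, not the paper's construction.

```python
import numpy as np

def forward(layers, x):
    """Run a stack of affine + ReLU layers given as (W, b) pairs."""
    for W, b in layers:
        x = np.maximum(W @ x + b, 0.0)
    return x

def stitch(net_a, net_b, cut_a, cut_b, rng):
    """Recombine two trained networks at a crossover point.

    Keeps net_a up to (but excluding) layer cut_a, keeps net_b from layer
    cut_b onwards, and inserts a new affine layer mapping net_a's feature
    space at the cut to the input width net_b expects there.
    """
    width_a = net_a[cut_a - 1][0].shape[0]   # output width where net_a is cut
    width_b = net_b[cut_b][0].shape[1]       # input width net_b expects at the cut
    new_layer = (0.1 * rng.normal(size=(width_b, width_a)), np.zeros(width_b))
    return net_a[:cut_a] + [new_layer] + net_b[cut_b:]

# Toy usage with two random 3-layer networks on 8-dimensional inputs.
rng = np.random.default_rng(0)
dims_a, dims_b = [8, 16, 16, 4], [8, 32, 32, 4]
net_a = [(rng.normal(size=(o, i)), np.zeros(o)) for i, o in zip(dims_a, dims_a[1:])]
net_b = [(rng.normal(size=(o, i)), np.zeros(o)) for i, o in zip(dims_b, dims_b[1:])]
child = stitch(net_a, net_b, cut_a=2, cut_b=2, rng=rng)
print(forward(child, rng.normal(size=8)).shape)  # (4,)
```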
arXiv Detail & Related papers (2024-03-21T08:30:44Z) - Towards Optimal Customized Architecture for Heterogeneous Federated
Learning with Contrastive Cloud-Edge Model Decoupling [20.593232086762665]
Federated learning, as a promising distributed learning paradigm, enables collaborative training of a global model across multiple network edge clients without the need for central data collection.
We propose FedCMD, a novel federated learning framework with model decoupling tailored to cloud-edge supported federated learning.
Our motivation is that, through a deep investigation of the performance of selecting different neural network layers as the personalized head, we found that rigidly assigning the last layer as the personalized head, as current studies do, is not always optimal.
arXiv Detail & Related papers (2024-03-04T05:10:28Z) - Layer-wise Linear Mode Connectivity [52.6945036534469]
Averaging neural network parameters is an intuitive method for combining the knowledge of two independent models.
It is most prominently used in federated learning.
We analyse the performance of the models that result from averaging single layers or groups of layers.
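
A minimal sketch of the layer-wise averaging analysed in this entry, assuming two models with identical architectures whose parameters live in dictionaries keyed by layer name; the layer names, shapes, and interpolation weight are illustrative.

```python
import numpy as np

def average_layers(params_a, params_b, layers_to_average, alpha=0.5):
    """Average selected layers of two models with identical architectures.

    params_a, params_b: dicts mapping layer names to weight arrays.
    layers_to_average: names of the layers to interpolate; all other
    layers are kept from model A. alpha weights model A's parameters.
    """
    fused = {}
    for name, w_a in params_a.items():
        w_b = params_b[name]
        fused[name] = alpha * w_a + (1.0 - alpha) * w_b if name in layers_to_average else w_a.copy()
    return fused

# Toy usage: average only the first layer of two random two-layer models.
rng = np.random.default_rng(0)
a = {"fc1": rng.normal(size=(8, 4)), "fc2": rng.normal(size=(2, 8))}
b = {"fc1": rng.normal(size=(8, 4)), "fc2": rng.normal(size=(2, 8))}
fused = average_layers(a, b, layers_to_average={"fc1"})
```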
arXiv Detail & Related papers (2023-07-13T09:39:10Z) - ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z) - Non-Gradient Manifold Neural Network [79.44066256794187]
Deep neural networks (DNNs) generally take thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z) - Lattice Fusion Networks for Image Denoising [4.010371060637209]
A novel method for feature fusion in convolutional neural networks is proposed in this paper.
Existing fusion techniques, as well as the proposed network, can be considered a type of Directed Acyclic Graph (DAG) network.
The proposed network is able to achieve better results with far fewer learnable parameters.
arXiv Detail & Related papers (2020-11-28T18:57:54Z) - The Heterogeneity Hypothesis: Finding Layer-Wise Differentiated Network
Architectures [179.66117325866585]
We investigate a design space that is usually overlooked, i.e. adjusting the channel configurations of predefined networks.
We find that this adjustment can be achieved by shrinking widened baseline networks and leads to superior performance.
Experiments are conducted on various networks and datasets for image classification, visual tracking and image restoration.
arXiv Detail & Related papers (2020-06-29T17:59:26Z) - Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient descent combined with nonconvexity renders learning susceptible to initialization problems.
We propose fusing neighboring layers of deeper networks that are trained with random initializations.
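
For intuition about fusing neighboring layers, here is a generic sketch that collapses two consecutive affine layers into a single one; the collapse is exact only when no nonlinearity sits between the layers, and it is not claimed to be the paper's MSE-optimal fusion rule.

```python
import numpy as np

def fuse_linear_pair(W1, b1, W2, b2):
    """Collapse y = W2 (W1 x + b1) + b2 into a single affine layer y = W x + b."""
    W = W2 @ W1
    b = W2 @ b1 + b2
    return W, b

# Toy check that the fused layer reproduces the stacked pair exactly.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(6, 4)), rng.normal(size=6)
W2, b2 = rng.normal(size=(3, 6)), rng.normal(size=3)
x = rng.normal(size=4)
W, b = fuse_linear_pair(W1, b1, W2, b2)
assert np.allclose(W @ x + b, W2 @ (W1 @ x + b1) + b2)
```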
arXiv Detail & Related papers (2020-01-28T18:25:15Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
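
The final entry is the OTFusion baseline that the main paper builds on: networks are fused layer-wise after associating neurons across models. The sketch below uses a hard one-to-one matching from the Hungarian algorithm as a simplified stand-in for the soft optimal-transport coupling; the squared-distance cost and the plain averaging step are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_and_average(W_a, W_b):
    """Fuse one layer of two same-width models.

    Neurons (rows) of W_b are matched to neurons of W_a by minimising the
    squared distance between their incoming-weight vectors, using a hard
    one-to-one assignment as a simplified stand-in for an optimal-transport
    coupling, and the aligned weights are then averaged.
    """
    cost = ((W_a[:, None, :] - W_b[None, :, :]) ** 2).sum(axis=-1)
    _, cols = linear_sum_assignment(cost)
    return 0.5 * (W_a + W_b[cols])

# Toy usage on a layer with 8 neurons and 4 inputs.
rng = np.random.default_rng(0)
fused = align_and_average(rng.normal(size=(8, 4)), rng.normal(size=(8, 4)))
print(fused.shape)  # (8, 4)
```

In a full fusion pipeline, the same permutation applied to a layer's neurons would also be propagated to the incoming weights of the next layer before moving up the stack.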