Model Fusion via Optimal Transport
- URL: http://arxiv.org/abs/1910.05653v6
- Date: Tue, 16 May 2023 17:57:37 GMT
- Title: Model Fusion via Optimal Transport
- Authors: Sidak Pal Singh and Martin Jaggi
- Abstract summary: We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
- Score: 64.13185244219353
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Combining different models is a widely used paradigm in machine learning
applications. While the most common approach is to form an ensemble of models
and average their individual predictions, this approach is often rendered
infeasible by given resource constraints in terms of memory and computation,
which grow linearly with the number of models. We present a layer-wise model
fusion algorithm for neural networks that utilizes optimal transport to (soft-)
align neurons across the models before averaging their associated parameters.
We show that this can successfully yield "one-shot" knowledge transfer (i.e.,
without requiring any retraining) between neural networks trained on
heterogeneous non-i.i.d. data. In both i.i.d. and non-i.i.d. settings, we
illustrate that our approach significantly outperforms vanilla averaging, as
well as how it can serve as an efficient replacement for the ensemble with
moderate fine-tuning, for standard convolutional networks (like VGG11),
residual networks (like ResNet18), and multi-layer perceptrons on CIFAR10,
CIFAR100, and MNIST. Finally, our approach also provides a principled way to
combine the parameters of neural networks with different widths, and we explore
its application for model compression. The code is available at the following
link, https://github.com/sidak/otfusion.
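For intuition, below is a minimal sketch of the layer-wise fusion idea for two equal-width multi-layer perceptrons: the neurons of one model are aligned to those of the other, layer by layer, before the parameters are averaged. This is not the authors' implementation (see the repository linked above); for simplicity it uses a hard one-to-one matching computed with the Hungarian algorithm in place of the soft optimal-transport alignment described in the paper, biases are omitted, and the function name `fuse_two_mlps` and the plain NumPy weight-matrix representation are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def fuse_two_mlps(weights_a, weights_b, alpha=0.5):
    """Fuse two same-architecture MLPs, each given as a list of weight
    matrices of shape (out_dim, in_dim). Returns the fused matrices."""
    fused = []
    # Alignment of the previous layer's outputs; identity for the input layer.
    perm_prev = np.arange(weights_a[0].shape[1])
    for layer_idx, (wa, wb) in enumerate(zip(weights_a, weights_b)):
        # Re-order model A's incoming connections to match the aligned inputs.
        wa = wa[:, perm_prev]
        if layer_idx < len(weights_a) - 1:
            # Cost of matching A-neuron i to B-neuron j: distance between
            # their incoming weight vectors.
            cost = np.linalg.norm(wa[:, None, :] - wb[None, :, :], axis=-1)
            row_ind, col_ind = linear_sum_assignment(cost)
            # perm[j] = index of the A-neuron assigned to B-neuron j.
            perm = np.empty_like(row_ind)
            perm[col_ind] = row_ind
            wa = wa[perm, :]      # re-order A's neurons into B's ordering
            perm_prev = perm      # propagate the alignment to the next layer
        # Last layer: output neurons (classes) are already in correspondence.
        fused.append(alpha * wa + (1.0 - alpha) * wb)
    return fused


# Toy usage with random weights for a small MLP (input 4, hidden 8, output 3).
rng = np.random.default_rng(0)
shapes = [(8, 4), (8, 8), (3, 8)]
model_a = [rng.normal(size=s) for s in shapes]
model_b = [rng.normal(size=s) for s in shapes]
fused = fuse_two_mlps(model_a, model_b)
```

In practice the weight matrices would be extracted from trained networks (e.g., PyTorch `Linear` layers) and the fused matrices loaded back into a model of the same architecture, optionally followed by the moderate fine-tuning mentioned in the abstract.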
Related papers
- BEND: Bagging Deep Learning Training Based on Efficient Neural Network Diffusion [56.9358325168226]
We propose a Bagging deep learning training algorithm based on Efficient Neural network Diffusion (BEND).
Our approach is simple but effective: it first uses the weights and biases of multiple trained models as inputs to train an autoencoder and a latent diffusion model.
Our BEND algorithm consistently exceeds the mean and median accuracies of both the original trained models and the diffused models.
arXiv Detail & Related papers (2024-03-23T08:40:38Z)
- Explicit Foundation Model Optimization with Self-Attentive Feed-Forward Neural Units [4.807347156077897]
Iterative approximation methods using backpropagation enable the optimization of neural networks, but they remain computationally expensive when used at scale.
This paper presents an efficient alternative for optimizing neural networks that reduces the costs of scaling neural networks and provides high-efficiency optimizations for low-resource applications.
arXiv Detail & Related papers (2023-11-13T17:55:07Z) - Diffusion-Model-Assisted Supervised Learning of Generative Models for
Density Estimation [10.793646707711442]
We present a framework for training generative models for density estimation.
We use the score-based diffusion model to generate labeled data.
Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in a supervised manner.
arXiv Detail & Related papers (2023-10-22T23:56:19Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without a large computational overhead.
We evaluate our approach on various image- and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- Layer-wise Linear Mode Connectivity [52.6945036534469]
Averaging neural network parameters is an intuitive way to combine the knowledge of two independent models.
It is most prominently used in federated learning.
We analyse the performance of the models that result from averaging single layers or groups of layers.
arXiv Detail & Related papers (2023-07-13T09:39:10Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Representing Random Utility Choice Models with Neural Networks [0.0]
Motivated by successes of deep learning, we propose a class of neural network-based discrete choice models, called RUMnets.
RUMnets formulate the agents' random utility functions using a sample average approximation.
We find that RUMnets are competitive against several choice modeling and machine learning methods in terms of accuracy on two real-world datasets.
arXiv Detail & Related papers (2022-07-26T13:12:22Z)
- An alternative approach to train neural networks using monotone variational inequality [22.320632565424745]
We propose an alternative approach to neural network training using the monotone vector field.
Our approach can be used for more efficient fine-tuning of a pre-trained neural network.
arXiv Detail & Related papers (2022-02-17T19:24:20Z)
- Fully differentiable model discovery [0.0]
We propose an approach that combines neural network-based surrogates with Sparse Bayesian Learning.
Our work expands PINNs to various types of neural network architectures, and connects neural network-based surrogates to the rich field of Bayesian parameter inference.
arXiv Detail & Related papers (2021-06-09T08:11:23Z)
- Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on the evaluated downstream tasks across four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.