Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis
- URL: http://arxiv.org/abs/2407.05385v1
- Date: Sun, 7 Jul 2024 14:21:04 GMT
- Title: Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis
- Authors: Stefan Horoi, Albert Manuel Orozco Camacho, Eugene Belilovsky, Guy Wolf
- Abstract summary: We propose CCA Merge, a model merging algorithm based on Canonical Correlation Analysis.
We show that CCA Merge works significantly better than past methods when more than 2 models are merged.
- Score: 17.989809995141044
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Combining the predictions of multiple trained models through ensembling is generally a good way to improve accuracy by leveraging the different learned features of the models; however, it comes with high computational and storage costs. Model fusion, the act of merging multiple models into one by combining their parameters, reduces these costs but doesn't work as well in practice. Indeed, neural network loss landscapes are high-dimensional and non-convex, and the minima found through learning are typically separated by high loss barriers. Numerous recent works have focused on finding permutations matching one network's features to the features of a second one, lowering the loss barrier on the linear path between them in parameter space. However, permutations are restrictive since they assume a one-to-one mapping between the different models' neurons exists. We propose a new model merging algorithm, CCA Merge, which is based on Canonical Correlation Analysis and aims to maximize the correlations between linear combinations of the model features. We show that our alignment method leads to better performance than past methods when averaging models trained on the same, or differing, data splits. We also extend this analysis into the harder setting where more than 2 models are merged, and we find that CCA Merge works significantly better than past methods. Our code is publicly available at https://github.com/shoroi/align-n-merge
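The core idea can be illustrated with a short sketch: collect activations of corresponding layers from two trained models on a shared batch, fit CCA to obtain maximally correlated linear projections, and use them to map one model's neurons into the other's basis before averaging. The function name and the merging step below are illustrative assumptions, not the authors' exact procedure; their implementation is in the linked repository.

```python
# Hedged sketch of CCA-based feature alignment, not the paper's exact
# algorithm (see https://github.com/shoroi/align-n-merge for that).
import numpy as np
from sklearn.cross_decomposition import CCA

def cca_alignment(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Return T such that acts_b @ T approximates acts_a.

    acts_a, acts_b: (n_samples, n_neurons) activations of the same layer
    in two independently trained models, on a shared input batch.
    """
    cca = CCA(n_components=min(acts_a.shape[1], acts_b.shape[1]))
    cca.fit(acts_b, acts_a)
    # x_rotations_ / y_rotations_ project each view onto the shared,
    # maximally correlated canonical directions.
    p_b, p_a = cca.x_rotations_, cca.y_rotations_
    # Map B's neuron space into A's by going through the canonical space.
    return p_b @ np.linalg.pinv(p_a)

# Hypothetical merge of one linear layer with weights of shape
# (n_neurons, in_features): transform B's weights into A's feature
# basis, then average.
#   T = cca_alignment(acts_a, acts_b)
#   w_merged = 0.5 * (w_a + T.T @ w_b)
```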
Related papers
- Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map all intermediate points along probability flow (PF) ODE trajectories to their corresponding endpoints.
We empirically find that this training paradigm limits the one-step generation performance of consistency models.
We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
arXiv Detail & Related papers (2024-10-18T22:38:08Z)
- Training-Free Model Merging for Multi-target Domain Adaptation [6.00960357022946]
We study multi-target domain adaptation of scene understanding models.
Our solution involves two components, merging model parameters and merging model buffers.
Our method is simple yet effective, achieving performance comparable to data-combination training baselines.
arXiv Detail & Related papers (2024-07-18T17:59:57Z)
- PLeaS -- Merging Models with Permutations and Least Squares [43.17620198572947]
We propose a new two-step algorithm to merge models, termed PLeaS.
PLeaS partially matches nodes in each layer by maximizing alignment.
It computes the weights of the merged model as a layer-wise Least Squares solution.
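A minimal sketch of that least-squares step, under the assumption that the merged layer should reproduce the average of the two (already matched) models' activations; the variable names are hypothetical and the matching step is omitted:

```python
# Hedged sketch of a layer-wise least-squares weight computation in the
# spirit of PLeaS; a simplification, not the paper's exact procedure.
import numpy as np

def least_squares_layer(x_merged: np.ndarray, target_acts: np.ndarray) -> np.ndarray:
    """Solve min_W ||x_merged @ W - target_acts||_F for one layer.

    x_merged:    (n_samples, in_features) inputs to the merged layer
    target_acts: (n_samples, out_features) activations the merged layer
                 should reproduce, e.g. the mean of the two matched
                 models' activations at this layer.
    """
    w, *_ = np.linalg.lstsq(x_merged, target_acts, rcond=None)
    return w  # (in_features, out_features)
```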
arXiv Detail & Related papers (2024-07-02T17:24:04Z)
- Layer-wise Linear Mode Connectivity [52.6945036534469]
Averaging neural network parameters is an intuitive method for combining the knowledge of two independent models.
It is most prominently used in federated learning.
We analyse the performance of the models that result from averaging single layers, or groups of layers.
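As a point of reference, the baseline being analysed is plain parameter averaging; a minimal sketch, assuming a hypothetical list of PyTorch models with identical architectures:

```python
# Minimal sketch of layer-wise parameter averaging (the baseline this
# paper analyses); `models` is a hypothetical list of torch.nn.Modules
# sharing one architecture.
import torch

def average_state_dicts(models):
    avg = {}
    for key in models[0].state_dict():
        # Average each parameter tensor across all models.
        avg[key] = torch.stack(
            [m.state_dict()[key].float() for m in models]
        ).mean(dim=0)
    return avg
```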
arXiv Detail & Related papers (2023-07-13T09:39:10Z)
- TIES-Merging: Resolving Interference When Merging Models [95.59265307318752]
Transfer learning can confer significant advantages, including improved downstream performance, faster convergence, and better sample efficiency.
Model merging has emerged as a solution to combine multiple task-specific models into a single model without performing additional training.
Existing merging methods often ignore the interference between parameters of different models, resulting in large performance drops when merging multiple models.
We propose TIES-Merging, which introduces three novel steps when merging models: resetting parameters that only changed a small amount during fine-tuning, resolving sign conflicts, and merging only the parameters that are in alignment with the final agreed-upon sign.
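These three steps translate directly into code on the models' task vectors (fine-tuning deltas from a shared pre-trained checkpoint); the following is a hedged simplification with assumed tensor shapes, not the authors' reference implementation:

```python
# Hedged sketch of the trim / elect-sign / disjoint-merge steps of
# TIES-Merging; a simplification, not the authors' reference code.
import torch

def ties_merge(deltas: torch.Tensor, keep_frac: float = 0.2) -> torch.Tensor:
    """deltas: (n_models, n_params) fine-tuning deltas from a shared
    pre-trained checkpoint. Returns a merged delta to add back to it."""
    # 1. Trim: reset parameters that only changed a small amount.
    magnitudes = deltas.abs()
    k = max(1, int(keep_frac * deltas.shape[1]))
    thresh = magnitudes.topk(k, dim=1).values[:, -1:]
    trimmed = torch.where(magnitudes >= thresh, deltas, torch.zeros_like(deltas))
    # 2. Elect sign: per parameter, keep the sign with the larger total
    #    magnitude across models.
    elected = torch.sign(trimmed.sum(dim=0))
    # 3. Disjoint merge: average only entries agreeing with that sign.
    agree = (torch.sign(trimmed) == elected) & (trimmed != 0)
    return (trimmed * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1)
```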
arXiv Detail & Related papers (2023-06-02T17:31:32Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- On the Versatile Uses of Partial Distance Correlation in Deep Learning [47.11577420740119]
This paper revisits a (less widely known) notion from statistics, called distance correlation (and its partial variant), designed to evaluate correlation between feature spaces of different dimensions.
We describe the steps necessary to carry out its deployment for large scale models.
This opens the door to a surprising array of applications, ranging from conditioning one deep model on another and learning disentangled representations, to optimizing diverse models that are directly more robust to adversarial attacks.
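For intuition, the plain (non-partial) sample distance correlation can be computed in a few lines; this is the standard biased estimator, with the partial variant and the large-scale deployment tricks left to the paper:

```python
# Hedged sketch of sample distance correlation between two feature
# matrices of possibly different dimensions (standard biased estimator).
import numpy as np
from scipy.spatial.distance import cdist

def distance_correlation(x: np.ndarray, y: np.ndarray) -> float:
    """x: (n, d1), y: (n, d2) paired samples; returns dCor in [0, 1]."""
    def centered(z):
        d = cdist(z, z)  # pairwise Euclidean distances
        # Double-centre: remove row/column means, add back grand mean.
        return d - d.mean(0, keepdims=True) - d.mean(1, keepdims=True) + d.mean()
    a, b = centered(x), centered(y)
    dcov2_xy, dcov2_xx, dcov2_yy = (a * b).mean(), (a * a).mean(), (b * b).mean()
    return float(np.sqrt(dcov2_xy / np.sqrt(dcov2_xx * dcov2_yy + 1e-12)))
```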
arXiv Detail & Related papers (2022-07-20T06:36:11Z)
- Dynamically-Scaled Deep Canonical Correlation Analysis [77.34726150561087]
Canonical Correlation Analysis (CCA) is a method for extracting features from two views by finding maximally correlated linear projections of them.
We introduce a novel dynamic scaling method for training an input-dependent canonical correlation model.
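For reference, the classical CCA objective underlying both this paper and CCA Merge above can be written as follows (standard textbook formulation, not taken from either abstract):

```latex
% Classical CCA: find projections a, b maximising the correlation of
% the projected views Xa and Yb, with \Sigma the (cross-)covariances.
\max_{a,\,b}\;
\rho(a, b) \;=\;
\frac{a^\top \Sigma_{XY}\, b}
     {\sqrt{a^\top \Sigma_{XX}\, a}\;\sqrt{b^\top \Sigma_{YY}\, b}}
```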
arXiv Detail & Related papers (2022-03-23T12:52:49Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
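One way to picture the neuron-matching step: with uniform marginals and equally sized layers, the optimal transport plan reduces to a permutation, so a linear assignment solver suffices for a sketch (the paper's algorithm handles the general OT setting; the names below are illustrative):

```python
# Hedged sketch of neuron matching for layer-wise fusion; uses linear
# assignment, the special case of optimal transport with uniform
# marginals and equal layer widths.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_neurons(w_a: np.ndarray, w_b: np.ndarray) -> np.ndarray:
    """w_a, w_b: (n_neurons, in_features) incoming weights of one layer.
    Returns indices such that w_b[indices] aligns with w_a."""
    # Cost: Euclidean distance between the neurons' weight vectors.
    cost = np.linalg.norm(w_a[:, None, :] - w_b[None, :, :], axis=-1)
    _, cols = linear_sum_assignment(cost)
    return cols

# Fusion of the layer after alignment:
#   w_fused = 0.5 * (w_a + w_b[match_neurons(w_a, w_b)])
```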
arXiv Detail & Related papers (2019-10-12T22:07:15Z)