Related papers: Wolf2Pack: The AutoFusion Framework for Dynamic Parameter Fusion

Wolf2Pack: The AutoFusion Framework for Dynamic Parameter Fusion

URL: http://arxiv.org/abs/2410.05746v1
Date: Tue, 8 Oct 2024 07:21:24 GMT
Title: Wolf2Pack: The AutoFusion Framework for Dynamic Parameter Fusion
Authors: Bowen Tian, Songning Lai, Yutao Yue,
Abstract summary: We introduce AutoFusion, a framework that fuses distinct model parameters for multi-task learning without pre-trained checkpoints. We validate AutoFusion's effectiveness through experiments on commonly used benchmark datasets. Our framework offers a scalable and flexible solution for model integration, positioning it as a powerful tool for future research and practical applications.
Score: 4.164728134421114
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the rapidly evolving field of deep learning, specialized models have driven significant advancements in tasks such as computer vision and natural language processing. However, this specialization leads to a fragmented ecosystem where models lack the adaptability for broader applications. To overcome this, we introduce AutoFusion, an innovative framework that fuses distinct model parameters(with the same architecture) for multi-task learning without pre-trained checkpoints. Using an unsupervised, end-to-end approach, AutoFusion dynamically permutes model parameters at each layer, optimizing the combination through a loss-minimization process that does not require labeled data. We validate AutoFusion's effectiveness through experiments on commonly used benchmark datasets, demonstrating superior performance over established methods like Weight Interpolation, Git Re-Basin, and ZipIt. Our framework offers a scalable and flexible solution for model integration, positioning it as a powerful tool for future research and practical applications.

Related papers

Neural Parameter Search for Slimmer Fine-Tuned Models and Better Transfer [17.463052541838504]
Fine-tuned models often struggle outside their specific domains and exhibit considerable redundancy.<n>Recent studies suggest that combining a pruned fine-tuned model with the original pre-trained model can mitigate interference when merging model parameters across tasks.<n>We introduce a novel method called Neural Pruning (NPS-Pruning) for slimming down fine-tuned models.
arXiv Detail & Related papers (2025-05-24T14:27:20Z)
Dynamic Fisher-weighted Model Merging via Bayesian Optimization [37.02810891820468]
Existing merging approaches typically involve scaling the parameters model-wise or integrating parameter importance parameter-wise. We unify these strategies into a more general merging framework, and introduce Dynamic Fisher-weighted Merging (DF-Merge) We show that DF-Merge outperforms strong baselines across models of different sizes and a variety of tasks.
arXiv Detail & Related papers (2025-04-26T18:31:14Z)
Optimize Incompatible Parameters through Compatibility-aware Knowledge Integration [104.52015641099828]
Existing research excels in removing such parameters or merging the outputs of multiple different pretrained models. We propose Compatibility-aware Knowledge Integration (CKI), which consists of Deep Assessment and Deep Splicing. The integrated model can be used directly for inference or for further fine-tuning.
arXiv Detail & Related papers (2025-01-10T01:42:43Z)
HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models [28.993221775758702]
Model merging is a technique that combines multiple large pretrained models into a single model with enhanced performance and broader task adaptability. This paper marks a significant advance toward more flexible and comprehensive model merging techniques. We train policy and value networks using offline sampling of weight vectors, which are then employed for the online optimization of merging strategies.
arXiv Detail & Related papers (2024-09-27T16:31:31Z)
FusionBench: A Comprehensive Benchmark of Deep Model Fusion [78.80920533793595]
Deep model fusion is a technique that unifies the predictions or parameters of several deep neural networks into a single model. FusionBench is the first comprehensive benchmark dedicated to deep model fusion.
arXiv Detail & Related papers (2024-06-05T13:54:28Z)
FissionFusion: Fast Geometric Generation and Hierarchical Souping for Medical Image Analysis [0.7751705157998379]
The scarcity of well-annotated medical datasets requires leveraging transfer learning from broader datasets like ImageNet or pre-trained models like CLIP. Model soups averages multiple fine-tuned models aiming to improve performance on In-Domain (ID) tasks and enhance robustness against Out-of-Distribution (OOD) datasets. We propose a hierarchical merging approach that involves local and global aggregation of models at various levels.
arXiv Detail & Related papers (2024-03-20T06:48:48Z)
Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. This creates a barrier to fusing knowledge across individual models to yield a better single model. We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
Efficient Data-specific Model Search for Collaborative Filtering [56.60519991956558]
Collaborative filtering (CF) is a fundamental approach for recommender systems. In this paper, motivated by the recent advances in automated machine learning (AutoML), we propose to design a data-specific CF model. Key here is a new framework that unifies state-of-the-art (SOTA) CF methods and splits them into disjoint stages of input encoding, embedding function, interaction and prediction function.
arXiv Detail & Related papers (2021-06-14T14:30:32Z)
Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration [130.89746032163106]
We propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data. We show that the energy function and sampler can be trained efficiently via a new variational form of power iteration. We present an energy model guided fuzzer for software testing that achieves comparable performance to well engineered fuzzing engines like libfuzzer.
arXiv Detail & Related papers (2020-11-10T19:31:29Z)
Ensemble Distillation for Robust Model Fusion in Federated Learning [72.61259487233214]
Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model. In most of the current training schemes the central model is refined by averaging the parameters of the server model and the updated parameters from the client side. We propose ensemble distillation for model fusion, i.e. training the central classifier through unlabeled data on the outputs of the models from the clients.
arXiv Detail & Related papers (2020-06-12T14:49:47Z)
Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks. We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.