Efficient Multi-Source Knowledge Transfer by Model Merging
- URL: http://arxiv.org/abs/2508.19353v1
- Date: Tue, 26 Aug 2025 18:31:38 GMT
- Title: Efficient Multi-Source Knowledge Transfer by Model Merging
- Authors: Marcin Osial, Bartosz Wójcik, Bartosz Zieliński, Sebastian Cygert
- Abstract summary: Multi-source transfer learning is a promising path to boost adaptability and cut re-training costs. Existing approaches are inherently coarse-grained, lacking the necessary precision for granular knowledge extraction. We address these limitations by leveraging Singular Value Decomposition (SVD) to first decompose each source model into its elementary, rank-one components. A subsequent aggregation stage then selects only the most salient components from all sources, thereby overcoming the previous efficiency and precision limitations.
- Score: 6.472612871493117
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While transfer learning is an advantageous strategy, it overlooks the opportunity to leverage knowledge from numerous available models online. Addressing this multi-source transfer learning problem is a promising path to boost adaptability and cut re-training costs. However, existing approaches are inherently coarse-grained, lacking the necessary precision for granular knowledge extraction and the aggregation efficiency required to fuse knowledge from either a large number of source models or those with high parameter counts. We address these limitations by leveraging Singular Value Decomposition (SVD) to first decompose each source model into its elementary, rank-one components. A subsequent aggregation stage then selects only the most salient components from all sources, thereby overcoming the previous efficiency and precision limitations. To best preserve and leverage the synthesized knowledge base, our method adapts to the target task by fine-tuning only the principal singular values of the merged matrix. In essence, this process only recalibrates the importance of top SVD components. The proposed framework allows for efficient transfer learning, is robust to perturbations both at the input level and in the parameter space (e.g., noisy or pruned sources), and scales well computationally.
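As a rough illustration of the decompose-select-recalibrate recipe the abstract describes, the sketch below applies it to a single linear layer in PyTorch. This is a minimal sketch under stated assumptions, not the paper's implementation: the sources are assumed to share one architecture, "most salient" is approximated by singular-value magnitude, merging acts on raw weights rather than task-vector deltas, and all names (`merge_sources`, `MergedLinear`, the `rank` budget) are illustrative.

```python
# Minimal sketch of SVD-based multi-source merging for one weight matrix.
# Assumptions (not from the paper): sources share one architecture,
# saliency is approximated by singular-value magnitude, and merging
# operates on raw weights rather than task-vector deltas.
import torch

def rank_one_components(weight: torch.Tensor):
    """Decompose a weight matrix into its rank-one SVD components."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    return [(S[i].item(), U[:, i], Vh[i, :]) for i in range(S.numel())]

def merge_sources(source_weights, rank: int):
    """Pool rank-one components from every source and keep the `rank`
    components with the largest singular values."""
    pool = []
    for W in source_weights:
        pool.extend(rank_one_components(W))
    pool.sort(key=lambda c: c[0], reverse=True)
    top = pool[:rank]
    u = torch.stack([c[1] for c in top], dim=1)   # (out_features, rank)
    s = torch.tensor([c[0] for c in top])         # (rank,)
    v = torch.stack([c[2] for c in top], dim=0)   # (rank, in_features)
    return u, s, v

class MergedLinear(torch.nn.Module):
    """Linear layer built from merged components; only the singular
    values are trainable, so target fine-tuning merely recalibrates
    the importance of the retained components."""
    def __init__(self, u, s, v):
        super().__init__()
        self.register_buffer("u", u)              # frozen left factors
        self.register_buffer("v", v)              # frozen right factors
        self.s = torch.nn.Parameter(s.clone())    # trainable importances

    def forward(self, x):
        weight = (self.u * self.s) @ self.v       # rebuild U diag(s) V
        return torch.nn.functional.linear(x, weight)

# Example: merge three 256x128 source layers into a rank-64 layer.
sources = [torch.randn(256, 128) for _ in range(3)]
layer = MergedLinear(*merge_sources(sources, rank=64))
out = layer(torch.randn(8, 128))                  # -> shape (8, 256)
```

In a full implementation this step would run per layer, with `u` and `v` frozen and only `s` trained on the target task; training a handful of singular values per layer is what keeps the adaptation cheap relative to full fine-tuning.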
Related papers
- Unlocking Prototype Potential: An Efficient Tuning Framework for Few-Shot Class-Incremental Learning [69.28860905525057]
Few-shot class-incremental learning (FSCIL) seeks to continuously learn new classes from very limited samples. We introduce an efficient prototype fine-tuning framework that evolves static centroids into dynamic, learnable components.
arXiv Detail & Related papers (2026-02-05T03:50:53Z)
- Enabling Flexible Multi-LLM Integration for Scalable Knowledge Aggregation [45.72492804683268]
Large language models (LLMs) have shown remarkable promise but remain challenging to continually improve through traditional fine-tuning. We propose a framework that adaptively selects and aggregates knowledge from diverse LLMs to build a single, stronger model.
arXiv Detail & Related papers (2025-05-28T16:24:50Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning [101.81127587760831]
Current fine-tuning methods build adapters largely agnostic of the context of the downstream task to learn, or of the important knowledge to maintain. We propose CorDA, a Context-oriented Decomposition Adaptation method that builds learnable task-aware adapters. Our method enables two options: knowledge-preserved adaptation and instruction-previewed adaptation.
arXiv Detail & Related papers (2024-06-07T19:10:35Z)
- H-ensemble: An Information Theoretic Approach to Reliable Few-Shot Multi-Source-Free Transfer [4.328706834250445]
We propose a framework named H-ensemble, which learns the optimal linear combination of source models for the target task.
Compared to previous works, H-ensemble is characterized by: 1) its adaptability to a novel MSF setting for few-shot target tasks, 2) theoretical reliability, 3) a lightweight structure easy to interpret and adapt.
We show that the H-ensemble can successfully learn the optimal task ensemble, as well as outperform prior arts.
arXiv Detail & Related papers (2023-12-19T17:39:34Z)
- Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and robustness to task order.
arXiv Detail & Related papers (2023-06-21T01:43:25Z)
- Distilling from Similar Tasks for Transfer Learning on a Budget [38.998980344852846]
Transfer learning is an effective solution for training with few labels, but it often comes at the expense of computationally costly fine-tuning of large base models.
We propose to mitigate this unpleasant trade-off between compute and accuracy via semi-supervised cross-domain distillation.
Our methods need no access to source data, and merely need features and pseudo-labels of the source models.
arXiv Detail & Related papers (2023-04-24T17:59:01Z)
- Partial Network Cloning [58.83278629019384]
PNC conducts partial parametric "cloning" from a source network and then injects the cloned module into the target.
Our method yields a significant improvement of 5% in accuracy and 50% in locality when compared with parameter-tuning based methods.
arXiv Detail & Related papers (2023-03-19T08:20:31Z)
- Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm that introduces the idea of Target-awareness REpresentation Disentanglement (TRED).
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer when fine-tuning the target model.
Experiments on various real-world datasets show that our method stably improves standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)