Extrapolation Merging: Keep Improving With Extrapolation and Merging
- URL: http://arxiv.org/abs/2503.04834v1
- Date: Wed, 05 Mar 2025 14:28:22 GMT
- Title: Extrapolation Merging: Keep Improving With Extrapolation and Merging
- Authors: Yiguan Lin, Bin Xu, Yinghao Li, Yang Gao
- Abstract summary: Large Language Models (LLMs) require instruction fine-tuning to perform different downstream tasks. Model merging aims to enhance performance by combining the parameters of different models. We propose Extrapolation Merging, a paradigm that can continue improving model performance without requiring extra computational resources or data.
- Score: 14.786100203787194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) require instruction fine-tuning to perform different downstream tasks. However, the instruction fine-tuning phase still demands significant computational resources and labeled data, and the field lacks a paradigm that can improve model performance without additional compute and data. Model merging aims to enhance performance by combining the parameters of different models, but without a clear optimization direction, the merging process does not always guarantee improved performance. In this paper, we attempt to provide a clear optimization direction for model merging. We first validate the effectiveness of the model extrapolation method during the instruction fine-tuning phase. Then, we propose Extrapolation Merging, a paradigm that can keep improving model performance without requiring extra computational resources or data. Using the extrapolation method, we give model merging a clear direction, enabling a local optimization search and consequently enhancing the merged model's performance. We conduct experiments on seven different tasks, and the results show that our method consistently improves the model's performance after fine-tuning.
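The recipe the abstract describes is compact enough to sketch: extrapolate beyond the fine-tuned checkpoint along the fine-tuning direction, then merge the fine-tuned and extrapolated models, letting that direction steer a local coefficient search. Below is a minimal pure-Python sketch under those assumptions; the extrapolation formula theta_extra = theta_sft + alpha * (theta_sft - theta_base), the grid search, and all function names are illustrative stand-ins, not code from the paper.

```python
# Sketch of the two-step Extrapolation Merging idea (the alpha formula,
# the grid search, and all names are assumptions, not the paper's code).

def extrapolate(base, sft, alpha):
    """theta_extra = theta_sft + alpha * (theta_sft - theta_base), alpha > 0."""
    return {k: sft[k] + alpha * (sft[k] - base[k]) for k in sft}

def merge(a, b, lam):
    """Linear interpolation of two checkpoints with coefficient lam in [0, 1]."""
    return {k: (1 - lam) * a[k] + lam * b[k] for k in a}

def extrapolation_merge(base, sft, alphas, lambdas, score):
    """Grid-search the coefficients against a validation metric `score`
    (higher is better); a stand-in for the paper's local optimization search."""
    best, best_score = sft, score(sft)
    for alpha in alphas:
        extra = extrapolate(base, sft, alpha)
        for lam in lambdas:
            candidate = merge(sft, extra, lam)
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
    return best

# Toy usage with scalar "weights"; real use would loop over model tensors.
base, sft = {"w": 0.0}, {"w": 1.0}
best = extrapolation_merge(base, sft, alphas=[0.5, 1.0],
                           lambdas=[0.25, 0.5, 0.75],
                           score=lambda m: -abs(m["w"] - 1.5))
print(best)  # {'w': 1.5} for this toy objective
```

The key departure from plain interpolation-based merging is that the second endpoint is the extrapolated model, so the search direction is fixed by the fine-tuning update rather than chosen blindly.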
Related papers
- Dynamic Fisher-weighted Model Merging via Bayesian Optimization [37.02810891820468]
Existing merging approaches typically either scale parameters model-wise or weight them parameter-wise by importance.
We unify these strategies into a more general merging framework and introduce Dynamic Fisher-weighted Merging (DF-Merge); a generic weighted-merging sketch appears after this list.
We show that DF-Merge outperforms strong baselines across models of different sizes and a variety of tasks.
arXiv Detail & Related papers (2025-04-26T18:31:14Z)
- ToolACE-R: Tool Learning with Adaptive Self-Refinement [84.69651852838794]
Tool learning allows Large Language Models to leverage external tools for solving complex user tasks.
We propose ToolACE-R, a novel method that introduces adaptive self-refinement for tool invocations.
Our results demonstrate the effectiveness of the proposed method, which is compatible with base models of various sizes.
arXiv Detail & Related papers (2025-04-02T06:38:56Z)
- Entropy-Based Adaptive Weighting for Self-Training [15.089334734753677]
We propose Entropy-Based Adaptive Weighting for Self-Training (EAST).
EAST is an adaptive weighting strategy designed to prioritize uncertain data during self-training.
We evaluate our approach on GSM8K and MATH benchmarks.
arXiv Detail & Related papers (2025-03-31T10:04:35Z)
- Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation [17.39117429338763]
We propose CoPA-Merging, a training-free parameter-efficient merging method with complementary parameter adaptation.
We establish a benchmark of diverse multimodal tasks, on which experiments demonstrate the strong performance and generalizability of our method.
arXiv Detail & Related papers (2025-02-24T13:52:05Z)
- Scalable Model Merging with Progressive Layer-wise Distillation [17.521794641817642]
We introduce a novel few-shot merging algorithm, ProDistill (Progressive Layer-wise Distillation).
We show that ProDistill achieves state-of-the-art performance, with up to 6.14% and 6.61% improvements in vision and NLU tasks.
arXiv Detail & Related papers (2025-02-18T10:15:18Z)
- Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data.
We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance.
Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z)
- Non-Uniform Parameter-Wise Model Merging [17.989809995141044]
We introduce a novel approach, Non-uniform Parameter-wise Model Merging, or NP Merge, which merges models by learning the contribution of each parameter to the final model using gradient-based optimization.
We empirically demonstrate the effectiveness of our method for merging models of various architectures in multiple settings, outperforming past methods.
arXiv Detail & Related papers (2024-12-20T00:05:14Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- How to Fine-tune the Model: Unified Model Shift and Model Bias Policy Optimization [13.440645736306267]
This paper develops an algorithm for model-based reinforcement learning.
It unifies model shift and model bias and then formulates a fine-tuning process.
It achieves state-of-the-art performance on several challenging benchmark tasks.
arXiv Detail & Related papers (2023-09-22T07:27:32Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods while using far fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme that provides a non-decreasing performance guarantee for model-based RL (MBRL).
Our derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
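Several entries above (DF-Merge, NP Merge, AdaMerging, and the dataless knowledge-fusion method) refine the same core operation: a weighted combination of checkpoints in parameter space. The sketch below shows that shared pattern, with model-wise coefficients and an optional per-parameter importance weight standing in for, e.g., a diagonal Fisher estimate; it is a generic illustration, and none of the names or weighting choices are taken from any one of those papers.

```python
# Generic weighted model merging in parameter space: a sketch of the
# shared pattern behind DF-Merge, NP Merge, AdaMerging, etc.; the exact
# weighting schemes in those papers differ from this illustration.

def weighted_merge(models, coeffs, importances=None):
    """Merge checkpoints (dicts mapping parameter name -> float).

    models:      list of state dicts with identical keys.
    coeffs:      one scalar per model (model-wise scaling).
    importances: optional list of per-parameter weight dicts
                 (e.g. diagonal Fisher estimates); uniform if omitted.
    """
    if importances is None:
        importances = [{k: 1.0 for k in m} for m in models]
    merged = {}
    for name in models[0]:
        num = sum(c * imp[name] * m[name]
                  for m, c, imp in zip(models, coeffs, importances))
        den = sum(c * imp[name]
                  for c, imp in zip(coeffs, importances))
        merged[name] = num / den  # importance-weighted average
    return merged

# Toy usage: two "models" with a single scalar parameter each.
m1, m2 = {"w": 1.0}, {"w": 3.0}
print(weighted_merge([m1, m2], coeffs=[0.5, 0.5]))           # {'w': 2.0}
print(weighted_merge([m1, m2], coeffs=[0.5, 0.5],
                     importances=[{"w": 3.0}, {"w": 1.0}]))  # {'w': 1.5}
```

Setting all importances to 1.0 reduces this to plain coefficient-weighted averaging; the methods listed above differ chiefly in how the coefficients and importances are learned or estimated.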