Related papers: DPI: Exploiting Parameter Heterogeneity for Interference-Free Fine-Tuning

URL: http://arxiv.org/abs/2601.17777v1
Date: Sun, 25 Jan 2026 10:30:45 GMT
Title: DPI: Exploiting Parameter Heterogeneity for Interference-Free Fine-Tuning
Authors: Xiaoyu Liu, Xiaoyu Guan, Di Liang, Xianjie Wu,
Abstract summary: Supervised fine-tuning (SFT) is a crucial step for adapting large language models (LLMs) to downstream tasks. We propose a principled approach to disentangle and isolate task-specific parameter regions.
Score: 11.751530422766836
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Supervised fine-tuning (SFT) is a crucial step for adapting large language models (LLMs) to downstream tasks. However, conflicting objectives across heterogeneous SFT tasks often induce the "seesaw effect": optimizing for one task may degrade performance on others, particularly when model parameters are updated indiscriminately. In this paper, we propose a principled approach to disentangle and isolate task-specific parameter regions, motivated by the hypothesis that parameter heterogeneity underlies cross-task interference. Specifically, we first independently fine-tune LLMs on diverse SFT tasks and identify each task's core parameter region as the subset of parameters exhibiting the largest updates. Tasks with highly overlapping core parameter regions are merged for joint training, while disjoint tasks are organized into different stages. During multi-stage SFT, core parameters acquired in prior tasks are frozen, thereby preventing overwriting by subsequent tasks. To verify the effectiveness of our method, we conducted intensive experiments on multiple public datasets. The results showed that our dynamic parameter isolation strategy consistently reduced data conflicts and achieved consistent performance improvements compared to multi-stage and multi-task tuning baselines.
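The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: parameters are flat lists of floats, the core region is taken as the top-k parameters by absolute update, overlap is measured with Jaccard similarity, and tasks are grouped greedily against a threshold. The overlap metric, the threshold, the greedy grouping, and all function names (`core_region`, `jaccard`, `plan_stages`, `masked_update`) are assumptions made for illustration; the abstract does not specify them.

```python
from typing import Dict, List, Set


def core_region(base: List[float], tuned: List[float], k: int) -> Set[int]:
    """Indices of the k parameters with the largest absolute update."""
    deltas = [abs(t - b) for b, t in zip(base, tuned)]
    ranked = sorted(range(len(deltas)), key=lambda i: deltas[i], reverse=True)
    return set(ranked[:k])


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Overlap between two index sets (assumed metric, not from the paper)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def plan_stages(regions: Dict[str, Set[int]], threshold: float) -> List[List[str]]:
    """Greedy grouping (assumed): a task joins the first stage in which it
    overlaps every member by at least `threshold`; otherwise it opens a new stage."""
    stages: List[List[str]] = []
    for task, region in regions.items():
        for stage in stages:
            if all(jaccard(region, regions[other]) >= threshold for other in stage):
                stage.append(task)
                break
        else:
            stages.append([task])
    return stages


def masked_update(params: List[float], grads: List[float],
                  frozen: Set[int], lr: float) -> List[float]:
    """One SGD step that leaves frozen (previously acquired core) parameters untouched."""
    return [p if i in frozen else p - lr * g
            for i, (p, g) in enumerate(zip(params, grads))]


# Toy example: three hypothetical tasks fine-tuned independently from one base model.
base = [0.0] * 6
tuned = {
    "math": [0.9, 0.8, 0.0, 0.0, 0.0, 0.1],
    "code": [0.7, 0.9, 0.1, 0.0, 0.0, 0.0],
    "chat": [0.0, 0.0, 0.0, 0.1, 0.8, 0.9],
}
regions = {task: core_region(base, weights, k=2) for task, weights in tuned.items()}
stages = plan_stages(regions, threshold=0.5)

# Core parameters learned in the first stage are frozen before the next stage trains.
frozen = set().union(*(regions[task] for task in stages[0]))
```

In this toy setup the two tasks whose largest updates fall on the same parameters end up in one joint-training stage, the third task gets its own stage, and `masked_update` shows how a later stage could skip the frozen indices.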

Related papers

Model Merging in the Essential Subspace [78.5390284258307]
Model merging aims to integrate multiple task-specific fine-tuned models into a single multi-task model without additional training. Despite extensive research, task interference remains a major obstacle that often undermines the performance of merged models. We propose ESM (Essential Subspace Merging), a robust framework for effective model merging.
arXiv Detail & Related papers (2026-02-23T00:33:38Z)
Parameter Aware Mamba Model for Multi-task Dense Prediction [69.94454603308196]
We introduce a novel decoder-based framework, Parameter Aware Mamba Model (PAMM), specifically designed for dense prediction in the multi-task learning setting. It features dual state space parameter experts that integrate and set task-specific parameter priors, capturing the intrinsic properties of each task. We employ the Multi-Directional Hilbert Scanning method to construct multi-angle feature sequences, thereby enhancing the sequence model's perceptual capabilities for 2D data.
arXiv Detail & Related papers (2025-11-18T13:48:00Z)
Not All Parameters Are Created Equal: Smart Isolation Boosts Fine-Tuning Performance [13.636389424786854]
Core parameters from each task are transplanted into a unified backbone. Non-core parameters from different tasks are smoothly integrated via Spherical Linear Interpolation. Experiments on multiple public benchmarks demonstrate that our approach significantly alleviates task interference and forgetting.
arXiv Detail & Related papers (2025-08-29T16:07:33Z)
RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness [28.437105789298244]
RobustMerge is a training-free parameter-efficient merging method with complementary parameter adaptation to maintain direction robustness. We establish a benchmark consisting of diverse multimodal tasks, on which we conduct experiments to certify the outstanding performance and generalizability of our method.
arXiv Detail & Related papers (2025-02-24T13:52:05Z)
Beyond Task Vectors: Selective Task Arithmetic Based on Importance Metrics [0.0]
This paper introduces Selective Task Arithmetic (STA), a training-free framework designed to enhance multi-task performance through task-specific parameter fusion. Experimental results demonstrate that STA achieves superior multi-task performance across benchmarks and excellent performance in task forgetting.
arXiv Detail & Related papers (2024-11-25T06:59:16Z)
Task-Aware Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning [70.96345405979179]
The purpose of offline multi-task reinforcement learning (MTRL) is to develop a unified policy applicable to diverse tasks without the need for online environmental interaction. Variations in task content and complexity pose significant challenges in policy formulation. We introduce the Harmony Multi-Task Decision Transformer (HarmoDT), a novel solution designed to identify an optimal harmony subspace of parameters for each task.
arXiv Detail & Related papers (2024-11-02T05:49:14Z)
Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences [49.14535254003683]
We introduce PaLoRA, a novel parameter-efficient method that addresses multi-task trade-offs in machine learning. Our experiments show that PaLoRA outperforms state-of-the-art MTL and PFL baselines across various datasets.
arXiv Detail & Related papers (2024-07-10T21:25:51Z)
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts [64.94129594112557]
Merging Transformer-based models trained on different tasks into a single unified model can execute all the tasks concurrently. Previous methods, exemplified by task arithmetic, have been proven to be both effective and scalable. We propose to merge most of the parameters while upscaling the Transformer layers to a weight-ensembling mixture of experts (MoE) module.
arXiv Detail & Related papers (2024-02-01T08:58:57Z)
Task Difficulty Aware Parameter Allocation & Regularization for Lifelong Learning [20.177260510548535]
We propose Parameter Allocation & Regularization (PAR), which adaptively selects an appropriate strategy for each task from parameter allocation and regularization based on its learning difficulty. Our method is scalable and significantly reduces the model's redundancy while improving the model's performance.
arXiv Detail & Related papers (2023-04-11T15:38:21Z)
Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models [50.33956216274694]
In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution. We propose Pareto Manifold Learning, an ensembling method in weight space.
arXiv Detail & Related papers (2022-10-18T11:20:54Z)
Attentional Mixtures of Soft Prompt Tuning for Parameter-efficient Multi-task Knowledge Sharing [53.399742232323895]
ATTEMPT is a new modular, multi-task, and parameter-efficient language model (LM) tuning approach. It combines knowledge transferred across different tasks via a mixture of soft prompts while keeping the original LM unchanged. It is parameter-efficient (e.g., updates 1,600 times fewer parameters than fine-tuning) and enables multi-task learning and flexible extensions.
arXiv Detail & Related papers (2022-05-24T10:48:33Z)
Maximum Roaming Multi-Task Learning [18.69970611732082]
We present a novel way to partition the parameter space without weakening the inductive bias. Specifically, we propose Maximum Roaming, a method inspired by dropout that randomly varies the parameter partitioning. Experimental results suggest that the regularization brought by roaming has more impact on performance than usual partitioning optimization strategies.
arXiv Detail & Related papers (2020-06-17T10:25:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.