Stay Unique, Stay Efficient: Preserving Model Personality in Multi-Task Merging
- URL: http://arxiv.org/abs/2512.01461v1
- Date: Mon, 01 Dec 2025 09:47:17 GMT
- Title: Stay Unique, Stay Efficient: Preserving Model Personality in Multi-Task Merging
- Authors: Kuangpu Guo, Yuhe Ding, Jian Liang, Zilei Wang, Ran He
- Abstract summary: Decomposition, Thresholding, and Scaling (DTS) is an approximation-based personalized merging framework. DTS preserves task-specific information with minimal storage overhead. We extend DTS with a variant that fuses task-specific information in a data-free manner based on the semantic similarity of task characteristics.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model merging has emerged as a promising paradigm for enabling multi-task capabilities without additional training. However, existing methods often experience substantial performance degradation compared with individually fine-tuned models, even on similar tasks, underscoring the need to preserve task-specific information. This paper proposes Decomposition, Thresholding, and Scaling (DTS), an approximation-based personalized merging framework that preserves task-specific information with minimal storage overhead. DTS first applies singular value decomposition to the task-specific information and retains only a small subset of singular values and vectors. It then introduces a novel thresholding strategy that partitions singular vector elements into groups and assigns a scaling factor to each group. To enable generalization to unseen tasks, we further extend DTS with a variant that fuses task-specific information in a data-free manner based on the semantic similarity of task characteristics. Extensive experiments demonstrate that DTS consistently outperforms state-of-the-art baselines while requiring only 1% additional storage per task. Furthermore, experiments on unseen tasks show that the DTS variant achieves significantly better generalization performance. Our code is available at https://github.com/krumpguo/DTS.
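To make the three steps in the abstract concrete, here is a minimal NumPy sketch applied to a single weight matrix; it is not the authors' released code (see the linked repository for that). The rank k, the magnitude thresholds, and the per-group scaling factors are placeholder values chosen for illustration, and the data-free variant for unseen tasks is not shown.

```python
# Illustrative sketch of the DTS recipe (Decomposition, Thresholding, Scaling).
# k, thresholds, and scales are hypothetical placeholders, not the paper's values.
import numpy as np

def dts_compress(task_delta: np.ndarray, k: int = 8,
                 thresholds=(0.01, 0.1), scales=(0.5, 1.0, 2.0)):
    """Compress one task-specific weight delta into a low-rank summary."""
    # 1) Decomposition: low-rank SVD of the task-specific information.
    U, S, Vt = np.linalg.svd(task_delta, full_matrices=False)
    U, S, Vt = U[:, :k], S[:k], Vt[:k, :]

    # 2) Thresholding: partition singular-vector elements into groups by
    #    magnitude, then 3) Scaling: apply one factor per group.
    def group_scale(M):
        mag = np.abs(M)
        out = np.empty_like(M)
        lo, hi = thresholds
        out[mag < lo] = M[mag < lo] * scales[0]
        mid = (mag >= lo) & (mag < hi)
        out[mid] = M[mid] * scales[1]
        out[mag >= hi] = M[mag >= hi] * scales[2]
        return out

    return group_scale(U), S, group_scale(Vt)

def dts_reconstruct(U, S, Vt):
    # Approximate task-specific delta, re-added on top of the merged model.
    return (U * S) @ Vt

# Toy usage: a random "fine-tuned minus merged" weight matrix.
delta = np.random.randn(64, 64) * 0.01
U, S, Vt = dts_compress(delta)
print(dts_reconstruct(U, S, Vt).shape)  # (64, 64)
```

Storing only the k retained singular triplets per layer is what keeps the per-task overhead small (the paper reports about 1% additional storage per task).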
Related papers
- TADS: Task-Aware Data Selection for Multi-Task Multimodal Pre-Training [29.962039479618543]
We introduce TADS (Task-Aware Data Selection), a novel framework for multi-task multimodal pre-training. TADS integrates Intrinsic Quality, Task Relevance, and Distributional Diversity into a learnable value function. A feedback-driven meta-learning mechanism adaptively refines the selection strategy based on proxy model performance.
arXiv Detail & Related papers (2026-02-05T03:08:45Z)
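Based only on the TADS summary above, here is a toy sketch of what a learnable value function over the three scores could look like; the score inputs, the softmax weighting, and the top-half selection rule are assumptions for illustration, not the paper's method.

```python
# Hypothetical sketch of a TADS-style value function: three per-sample scores
# combined by learnable weights. The paper's meta-learning loop would update
# w from proxy-model feedback; here w is fixed.
import numpy as np

def tads_value(quality, relevance, diversity, w=np.array([1.0, 1.0, 1.0])):
    """Combine Intrinsic Quality, Task Relevance, and Distributional
    Diversity into one selection score per sample."""
    feats = np.stack([quality, relevance, diversity], axis=1)  # (N, 3)
    weights = np.exp(w) / np.exp(w).sum()  # positive weights summing to 1
    return feats @ weights                 # (N,) value per sample

# Toy usage: keep the top 50% of samples by value.
N = 100
scores = tads_value(np.random.rand(N), np.random.rand(N), np.random.rand(N))
keep = np.argsort(scores)[-N // 2:]
```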
- StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets [14.867396697566257]
We extend the partial learning setup to a zero-shot setting, training a multi-task model on multiple datasets, each labeled for only a subset of tasks. Our method, StableMTL, repurposes image generators for latent regression. Instead of per-task losses requiring careful balancing, a unified latent loss is adopted, enabling seamless scaling to more tasks.
arXiv Detail & Related papers (2025-06-09T17:59:59Z)
- LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging [80.17238673443127]
LiNeS is a post-training editing technique designed to preserve pre-trained generalization while enhancing fine-tuned task performance. LiNeS demonstrates significant improvements in both single-task and multi-task settings across various benchmarks in vision and natural language processing.
arXiv Detail & Related papers (2024-10-22T16:26:05Z)
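A minimal sketch of the layer-scaling idea behind LiNeS, assuming the scaling factor grows linearly with depth so that shallow layers stay close to the pre-trained weights; alpha and beta are hypothetical hyperparameters, not values from the paper.

```python
# Sketch of post-training layer scaling in the spirit of LiNeS: residual
# updates (fine-tuned minus pre-trained) are rescaled by a factor that grows
# linearly with layer depth.
import numpy as np

def lines_scale(pretrained, finetuned, alpha=0.1, beta=1.0):
    """pretrained/finetuned: lists of per-layer weight arrays, shallow first."""
    L = len(pretrained)
    edited = []
    for l, (w0, w1) in enumerate(zip(pretrained, finetuned)):
        # Linear schedule: alpha at the first layer, alpha + beta at the last.
        lam = alpha + beta * (l / max(L - 1, 1))
        edited.append(w0 + lam * (w1 - w0))
    return edited

# Toy usage with three random "layers".
pre = [np.random.randn(4, 4) for _ in range(3)]
fin = [w + 0.05 * np.random.randn(4, 4) for w in pre]
out = lines_scale(pre, fin)
```

Damping the shallow-layer updates is what preserves the pre-trained general features, while the full-strength deep-layer updates retain task performance.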
- Adapt-$\infty$: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection [89.42023974249122]
Adapt-$\infty$ is a new multi-way and adaptive data selection approach for lifelong instruction tuning. We construct pseudo-skill clusters by grouping gradient-based sample vectors. We select the best-performing data selector for each skill cluster from a pool of selector experts. This data selector samples a subset of the most important samples from each skill cluster for training.
arXiv Detail & Related papers (2024-10-14T15:48:09Z)
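A hedged sketch of the clustering-then-selection pattern the Adapt-$\infty$ summary describes, with k-means standing in for the paper's pseudo-skill grouping and a simple gradient-norm ranking standing in for its pool of selector experts; both substitutions are assumptions.

```python
# Illustrative sketch: group gradient-based sample features into pseudo-skill
# clusters, then pick a per-cluster subset. The feature extraction and the
# per-cluster "selector" are mocked; only the grouping + per-cluster selection
# pattern follows the summary.
import numpy as np
from sklearn.cluster import KMeans

def pseudo_skill_select(sample_grads: np.ndarray, n_clusters: int = 4,
                        keep_per_cluster: int = 10):
    """sample_grads: (N, D) gradient-based vector per training sample."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(sample_grads)
    selected = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        # Stand-in selector: rank by gradient norm; the paper instead chooses
        # the best-performing selector per cluster from a pool of experts.
        ranked = idx[np.argsort(-np.linalg.norm(sample_grads[idx], axis=1))]
        selected.extend(ranked[:keep_per_cluster].tolist())
    return selected

subset = pseudo_skill_select(np.random.randn(200, 32))
```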
- UniTS: A Unified Multi-Task Time Series Model [31.675845788410246]
UniTS is a unified multi-task time series model that integrates predictive and generative tasks into a single framework.
UniTS is tested on 38 datasets across human activity sensors, healthcare, engineering, and finance.
arXiv Detail & Related papers (2024-02-29T21:25:58Z)
- Task Residual for Tuning Vision-Language Models [69.22958802711017]
We propose a new efficient tuning approach for vision-language models (VLMs) named Task Residual Tuning (TaskRes).
TaskRes explicitly decouples the prior knowledge of the pre-trained models from new knowledge regarding a target task.
TaskRes is simple yet effective, significantly outperforming previous methods on 11 benchmark datasets.
arXiv Detail & Related papers (2022-11-18T15:09:03Z)
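A minimal sketch of the decoupling idea TaskRes summarizes: the pre-trained classifier weights stay frozen as prior knowledge, and a small additive residual carries the task-specific knowledge. The shapes and the scale alpha are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a task-residual head: frozen base weights plus a learnable,
# scaled residual. Only the residual would be updated during tuning.
import numpy as np

class TaskResidualHead:
    def __init__(self, base_weights: np.ndarray, alpha: float = 0.5):
        self.base = base_weights                      # frozen prior (C, D)
        self.residual = np.zeros_like(base_weights)   # learnable new knowledge
        self.alpha = alpha

    def classifier(self):
        # Decoupled combination: prior knowledge + scaled task residual.
        return self.base + self.alpha * self.residual

    def logits(self, image_features: np.ndarray):
        return image_features @ self.classifier().T  # (N, C)

head = TaskResidualHead(np.random.randn(10, 512))
print(head.logits(np.random.randn(4, 512)).shape)  # (4, 10)
```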
- Disentangling Task Relations for Few-shot Text Classification via Self-Supervised Hierarchical Task Clustering [12.37413812344515]
Few-shot text classification mimics how humans learn a new text classifier efficiently from only a few examples.
Most prior works assume that all tasks are sampled from a single data source, which does not hold in real-world scenarios where tasks are heterogeneous and follow different distributions.
We propose a self-supervised hierarchical task clustering (SS-HTC) method to address the task heterogeneity.
arXiv Detail & Related papers (2022-11-16T00:19:53Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
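A toy sketch of the adaptive layer selection the TAPS summary describes: a per-layer gate decides which layers receive a task-specific delta while the rest stay shared with the base model. In the paper the gates and deltas are learned jointly; here the gate scores and threshold are given as fixed, hypothetical values.

```python
# Illustrative sketch of TAPS-style adaptive layer selection.
import numpy as np

def taps_merge(base_layers, deltas, gate_scores, threshold=0.5):
    """base_layers/deltas: per-layer weight arrays; gate_scores: per-layer scalars."""
    adapted = []
    for w, d, s in zip(base_layers, deltas, gate_scores):
        if s > threshold:          # task-specific layer: apply the delta
            adapted.append(w + d)
        else:                      # shared layer: reuse the base weights
            adapted.append(w)
    return adapted

# Toy usage: only layers 0 and 2 become task-specific.
base = [np.random.randn(4, 4) for _ in range(3)]
deltas = [0.05 * np.random.randn(4, 4) for _ in range(3)]
model = taps_merge(base, deltas, gate_scores=[0.9, 0.2, 0.7])
```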
- Low Resource Multi-Task Sequence Tagging -- Revisiting Dynamic Conditional Random Fields [67.51177964010967]
We compare different models for low resource multi-task sequence tagging that leverage dependencies between label sequences for different tasks.
We find that explicit modeling of inter-dependencies between task predictions outperforms single-task as well as standard multi-task models.
arXiv Detail & Related papers (2020-05-01T07:11:34Z)