Data-driven Clustering and Merging of Adapters for On-device Large Language Models
- URL: http://arxiv.org/abs/2601.17441v1
- Date: Sat, 24 Jan 2026 12:25:46 GMT
- Title: Data-driven Clustering and Merging of Adapters for On-device Large Language Models
- Authors: Ondrej Bohdal, Taha Ceritli, Mete Ozay, Jijoong Moon, Kyeng-Hun Lee, Hyeonmok Ko, Umberto Michieli
- Abstract summary: On-device large language models commonly employ task-specific adapters (e.g., LoRAs) to deliver strong performance on downstream tasks. This raises a critical challenge: how to select representative adapters that generalize well across multiple tasks. We propose a novel method D2C for adapter clustering that leverages minimal task-specific examples and employs an iterative optimization process to refine cluster assignments.
- Score: 34.58968471192321
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: On-device large language models commonly employ task-specific adapters (e.g., LoRAs) to deliver strong performance on downstream tasks. While storing all available adapters is impractical due to memory constraints, mobile devices typically have sufficient capacity to store a limited number of these parameters. This raises a critical challenge: how to select representative adapters that generalize well across multiple tasks - a problem that remains unexplored in existing literature. We propose a novel method D2C for adapter clustering that leverages minimal task-specific examples (e.g., 10 per task) and employs an iterative optimization process to refine cluster assignments. The adapters within each cluster are merged, creating multi-task adapters deployable on resource-constrained devices. Experimental results demonstrate that our method effectively boosts performance for the considered storage budgets.
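The abstract gives only the high-level recipe (a few examples per task, iterative refinement of cluster assignments, merging within clusters), so the following is a minimal sketch of that loop rather than the paper's exact algorithm. The uniform weight averaging, the random initialization, and the `evaluate` callback (standing in for running the base model with a merged adapter on a task's ~10 examples) are all assumptions.

```python
import numpy as np

def merge_adapters(adapters):
    """Uniformly average the LoRA matrices of the adapters in one cluster.
    Plain averaging is an assumption; the paper may use a different merging rule."""
    return {name: np.mean([a[name] for a in adapters], axis=0) for name in adapters[0]}

def d2c_like_clustering(adapters, task_examples, evaluate, num_clusters, iters=10, seed=0):
    """Iteratively refine cluster assignments so each task ends up in the cluster
    whose merged adapter does best on that task's handful of examples.

    adapters:      list of {param_name: np.ndarray} LoRA weights, one per task
    task_examples: list of small example sets (e.g. ~10 per task)
    evaluate:      callable(merged_adapter, examples) -> loss (lower is better);
                   in practice this would run the base model with the merged adapter
    """
    rng = np.random.default_rng(seed)
    assign = rng.integers(num_clusters, size=len(adapters))  # random initial assignment

    def cluster_merges(assign):
        # Empty clusters fall back to merging everything (an arbitrary choice for the sketch).
        return [merge_adapters([a for a, c in zip(adapters, assign) if c == k] or adapters)
                for k in range(num_clusters)]

    for _ in range(iters):
        merged = cluster_merges(assign)
        new_assign = np.array([
            min(range(num_clusters), key=lambda k: evaluate(merged[k], task_examples[t]))
            for t in range(len(adapters))
        ])
        if np.array_equal(new_assign, assign):  # assignments stable: converged
            break
        assign = new_assign

    return assign, cluster_merges(assign)  # one merged multi-task adapter per cluster
```

Under this reading, only the `num_clusters` merged adapters returned at the end would be stored on the device, which is what lets the method fit a given storage budget.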
Related papers
- Effective LoRA Adapter Routing using Task Representations [3.0111172730438565]
Low-rank adaptation (LoRA) enables parameter efficient specialization of large language models (LLMs) through modular adapters. We introduce LORAUTER, a novel routing framework that selects and composes LoRA adapters using task representations rather than adapter characteristics.
arXiv Detail & Related papers (2026-01-29T14:41:24Z)
- On-device System of Compositional Multi-tasking in Large Language Models [29.561801948704822]
We propose a novel approach tailored specifically for compositional multi-tasking scenarios involving summarization and translation. Our technique involves adding a learnable projection layer on top of the combined summarization and translation adapters. We demonstrate the practical viability of our method within an on-device environment by developing an Android app capable of executing compositional tasks seamlessly.
arXiv Detail & Related papers (2025-10-11T19:49:22Z)
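The compositional multi-tasking entry above only states that a learnable projection layer sits on top of the combined summarization and translation adapters. The sketch below illustrates one way that could look; concatenating the two frozen LoRA outputs before the projection is an assumption for illustration, not the paper's stated design.

```python
import torch
import torch.nn as nn

class ComposedAdapterLayer(nn.Module):
    """Sketch: a trainable projection over two frozen task-specific LoRA adapters."""
    def __init__(self, hidden: int, rank: int):
        super().__init__()
        # Frozen, pre-trained task adapters (LoRA down/up projections).
        self.sum_down = nn.Linear(hidden, rank, bias=False)
        self.sum_up = nn.Linear(rank, hidden, bias=False)
        self.tr_down = nn.Linear(hidden, rank, bias=False)
        self.tr_up = nn.Linear(rank, hidden, bias=False)
        for p in self.parameters():
            p.requires_grad = False
        # The only trainable part: a projection over the concatenated adapter outputs.
        self.proj = nn.Linear(2 * hidden, hidden)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        delta_sum = self.sum_up(self.sum_down(h))  # summarization adapter update
        delta_tr = self.tr_up(self.tr_down(h))     # translation adapter update
        return h + self.proj(torch.cat([delta_sum, delta_tr], dim=-1))
```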
- Integrating Task-Specific and Universal Adapters for Pre-Trained Model-based Class-Incremental Learning [33.57130798344366]
We propose integrating Task-Specific and Universal Adapters (TUNA) in this paper. Specifically, we train task-specific adapters to capture the most crucial features relevant to their respective tasks. We leverage an adapter fusion strategy to construct a universal adapter, which encodes the most discriminative features shared across tasks.
arXiv Detail & Related papers (2025-08-11T16:41:04Z)
- MergeRepair: An Exploratory Study on Merging Task-Specific Adapters in Code LLMs for Automated Program Repair [5.006064616335817]
Large Language Models (LLMs) have shown high capabilities in several software development-related tasks. Adapters offer a more efficient way to customize LLMs for particular needs. Model (and adapter) merging has emerged as a technique to develop one model capable of multiple tasks.
arXiv Detail & Related papers (2024-08-18T18:45:48Z)
- Towards Modular LLMs by Building and Reusing a Library of LoRAs [64.43376695346538]
We study how to best build a library of adapters given multi-task data.
We introduce model-based clustering, MBC, a method that groups tasks based on the similarity of their adapter parameters.
To re-use the library, we present a novel zero-shot routing mechanism, Arrow, which enables dynamic selection of the most relevant adapters.
arXiv Detail & Related papers (2024-05-18T03:02:23Z)
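For contrast with the data-driven assignment sketched earlier, here is a minimal sketch of parameter-space grouping in the spirit of MBC: adapters are clustered by the similarity of their flattened LoRA weights, with no task examples involved. The flattening, the normalization, and the use of k-means are assumptions about the exact procedure; Arrow's zero-shot routing is not sketched here.

```python
import numpy as np
from sklearn.cluster import KMeans

def mbc_like_clusters(adapters, num_clusters, seed=0):
    """Group tasks purely by adapter-parameter similarity (no task data).

    adapters: list of {param_name: np.ndarray} LoRA weights, one per task.
    Returns an integer cluster label per adapter.
    """
    vecs = np.stack([np.concatenate([w.ravel() for w in a.values()]) for a in adapters])
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-8  # cosine-style geometry
    return KMeans(n_clusters=num_clusters, random_state=seed, n_init=10).fit_predict(vecs)
```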
- Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models [12.230087530720652]
We introduce an adapter module that is more efficient in large-scale multi-task adaptation scenarios.
The adapter consists of a single shared controller network and multiple task-level adapter heads.
arXiv Detail & Related papers (2024-03-25T17:21:56Z)
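A minimal sketch of the shared-controller-plus-task-heads structure described in the entry above. The bottleneck design, the ReLU, and the residual connection are assumptions, and the recurrent reuse of the module across layers is not shown.

```python
import torch
import torch.nn as nn

class ControllerWithTaskHeads(nn.Module):
    """Sketch: one controller network shared by all tasks, plus per-task adapter heads."""
    def __init__(self, hidden: int, bottleneck: int, num_tasks: int):
        super().__init__()
        self.controller = nn.Sequential(nn.Linear(hidden, bottleneck), nn.ReLU())  # shared
        self.heads = nn.ModuleList([nn.Linear(bottleneck, hidden) for _ in range(num_tasks)])

    def forward(self, h: torch.Tensor, task_id: int) -> torch.Tensor:
        return h + self.heads[task_id](self.controller(h))  # task-specific residual update
```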
- Vision Transformer Adapters for Generalizable Multitask Learning [61.79647180647685]
We introduce the first multitasking vision transformer adapters that learn generalizable task affinities.
Our adapters can simultaneously solve multiple dense vision tasks in a parameter-efficient manner.
In contrast to concurrent methods, we do not require retraining or fine-tuning whenever a new task or domain is added.
arXiv Detail & Related papers (2023-08-23T18:40:48Z)
- Cross-Lingual Transfer with Target Language-Ready Task Adapters [66.5336029324059]
BAD-X, an extension of the MAD-X framework, achieves improved transfer at the cost of MAD-X's modularity.
We aim to take the best of both worlds by fine-tuning task adapters adapted to the target language.
arXiv Detail & Related papers (2023-06-05T10:46:33Z)
- Multi-Head Adapter Routing for Cross-Task Generalization [56.75667096355806]
Polytropon learns an inventory of adapters and a routing function that selects a subset of adapters for each task during both pre-training and few-shot adaptation.
We find that routing is most beneficial during multi-task pre-training rather than during few-shot adaptation.
arXiv Detail & Related papers (2022-11-07T19:35:55Z)
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models for downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism to improve adapter capacity without increasing parameters or computational cost by two key techniques.
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
- AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
arXiv Detail & Related papers (2022-04-30T16:49:41Z)
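A minimal sketch of a token-dependent representation shift in the spirit of the AdapterBias entry above: a single task-specific vector is added to every token's representation, scaled per token by a tiny linear layer. The exact placement inside the transformer block and the initialization are assumptions.

```python
import torch
import torch.nn as nn

class AdapterBiasLike(nn.Module):
    """Sketch: one learnable shift vector per task, with a token-dependent scale."""
    def __init__(self, hidden: int):
        super().__init__()
        self.shift = nn.Parameter(torch.zeros(hidden))  # task-specific shift vector
        self.alpha = nn.Linear(hidden, 1)               # token-dependent scalar weight

    def forward(self, h: torch.Tensor) -> torch.Tensor:  # h: (batch, seq, hidden)
        return h + self.alpha(h) * self.shift             # broadcast scale over the vector
```

Because only a vector and a one-output linear layer are trained per task, the added parameter count stays tiny relative to the backbone, which is the point the abstract makes about parameter efficiency.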