Data-driven Clustering and Merging of Adapters for On-device Large Language Models
- URL: http://arxiv.org/abs/2601.17441v1
- Date: Sat, 24 Jan 2026 12:25:46 GMT
- Title: Data-driven Clustering and Merging of Adapters for On-device Large Language Models
- Authors: Ondrej Bohdal, Taha Ceritli, Mete Ozay, Jijoong Moon, Kyeng-Hun Lee, Hyeonmok Ko, Umberto Michieli
- Abstract summary: On-device large language models commonly employ task-specific adapters (e.g., LoRAs) to deliver strong performance on downstream tasks. This raises a critical challenge: how to select representative adapters that generalize well across multiple tasks. We propose a novel method D2C for adapter clustering that leverages minimal task-specific examples and employs an iterative optimization process to refine cluster assignments.
- Score: 34.58968471192321
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: On-device large language models commonly employ task-specific adapters (e.g., LoRAs) to deliver strong performance on downstream tasks. While storing all available adapters is impractical due to memory constraints, mobile devices typically have sufficient capacity to store a limited number of these parameters. This raises a critical challenge: how to select representative adapters that generalize well across multiple tasks - a problem that remains unexplored in existing literature. We propose a novel method D2C for adapter clustering that leverages minimal task-specific examples (e.g., 10 per task) and employs an iterative optimization process to refine cluster assignments. The adapters within each cluster are merged, creating multi-task adapters deployable on resource-constrained devices. Experimental results demonstrate that our method effectively boosts performance for the considered storage budgets.
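The abstract gives only the high-level recipe (a few examples per task, iterative refinement of cluster assignments, merging within clusters), so the following is a minimal sketch of that loop rather than the paper's exact algorithm. The uniform weight averaging, the random initialization, and the `evaluate` callback (standing in for running the base model with a merged adapter on a task's ~10 examples) are all assumptions.

```python
import numpy as np

def merge_adapters(adapters):
    """Uniformly average the LoRA matrices of the adapters in one cluster.
    Plain averaging is an assumption; the paper may use a different merging rule."""
    return {name: np.mean([a[name] for a in adapters], axis=0) for name in adapters[0]}

def d2c_like_clustering(adapters, task_examples, evaluate, num_clusters, iters=10, seed=0):
    """Iteratively refine cluster assignments so each task ends up in the cluster
    whose merged adapter does best on that task's handful of examples.

    adapters:      list of {param_name: np.ndarray} LoRA weights, one per task
    task_examples: list of small example sets (e.g. ~10 per task)
    evaluate:      callable(merged_adapter, examples) -> loss (lower is better);
                   in practice this would run the base model with the merged adapter
    """
    rng = np.random.default_rng(seed)
    assign = rng.integers(num_clusters, size=len(adapters))  # random initial assignment

    def cluster_merges(assign):
        # Empty clusters fall back to merging everything (an arbitrary choice for the sketch).
        return [merge_adapters([a for a, c in zip(adapters, assign) if c == k] or adapters)
                for k in range(num_clusters)]

    for _ in range(iters):
        merged = cluster_merges(assign)
        new_assign = np.array([
            min(range(num_clusters), key=lambda k: evaluate(merged[k], task_examples[t]))
            for t in range(len(adapters))
        ])
        if np.array_equal(new_assign, assign):  # assignments stable: converged
            break
        assign = new_assign

    return assign, cluster_merges(assign)  # one merged multi-task adapter per cluster
```

Under this reading, only the `num_clusters` merged adapters returned at the end would be stored on the device, which is what lets the method fit a given storage budget.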
Related papers
- Effective LoRA Adapter Routing using Task Representations [3.0111172730438565]
Low-rank adaptation (LoRA) enables parameter efficient specialization of large language models (LLMs) through modular adapters. We introduce LORAUTER, a novel routing framework that selects and composes LoRA adapters using task representations rather than adapter characteristics.
arXiv Detail & Related papers (2026-01-29T14:41:24Z)
- On-device System of Compositional Multi-tasking in Large Language Models [29.561801948704822]
We propose a novel approach tailored specifically for compositional multi-tasking scenarios involving summarization and translation. Our technique involves adding a learnable projection layer on top of the combined summarization and translation adapters. We demonstrate the practical viability of our method within an on-device environment by developing an Android app capable of executing compositional tasks seamlessly.
arXiv Detail & Related papers (2025-10-11T19:49:22Z)
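The compositional multi-tasking entry above only states that a learnable projection layer sits on top of the combined summarization and translation adapters. The sketch below illustrates one way that could look; concatenating the two frozen LoRA outputs before the projection is an assumption for illustration, not the paper's stated design.

```python
import torch
import torch.nn as nn

class ComposedAdapterLayer(nn.Module):
    """Sketch: a trainable projection over two frozen task-specific LoRA adapters."""
    def __init__(self, hidden: int, rank: int):
        super().__init__()
        # Frozen, pre-trained task adapters (LoRA down/up projections).
        self.sum_down = nn.Linear(hidden, rank, bias=False)
        self.sum_up = nn.Linear(rank, hidden, bias=False)
        self.tr_down = nn.Linear(hidden, rank, bias=False)
        self.tr_up = nn.Linear(rank, hidden, bias=False)
        for p in self.parameters():
            p.requires_grad = False
        # The only trainable part: a projection over the concatenated adapter outputs.
        self.proj = nn.Linear(2 * hidden, hidden)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        delta_sum = self.sum_up(self.sum_down(h))  # summarization adapter update
        delta_tr = self.tr_up(self.tr_down(h))     # translation adapter update
        return h + self.proj(torch.cat([delta_sum, delta_tr], dim=-1))
```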
- Integrating Task-Specific and Universal Adapters for Pre-Trained Model-based Class-Incremental Learning [33.57130798344366]
We propose integrating Task-Specific and Universal Adapters (TUNA) in this paper. Specifically, we train task-specific adapters to capture the most crucial features relevant to their respective tasks. We leverage an adapter fusion strategy to construct a universal adapter, which encodes the most discriminative features shared across tasks.
arXiv Detail & Related papers (2025-08-11T16:41:04Z)
- MergeRepair: An Exploratory Study on Merging Task-Specific Adapters in Code LLMs for Automated Program Repair [5.006064616335817]
Large Language Models (LLMs) have shown high capabilities in several software development-related tasks. Adapters offer a more efficient way to customize LLMs for particular needs. Model (and adapter) merging has emerged as a technique to develop one model capable of multiple tasks.
arXiv Detail & Related papers (2024-08-18T18:45:48Z)
- Towards Modular LLMs by Building and Reusing a Library of LoRAs [64.43376695346538]
We study how to best build a library of adapters given multi-task data.
We introduce model-based clustering, MBC, a method that groups tasks based on the similarity of their adapter parameters.
To re-use the library, we present a novel zero-shot routing mechanism, Arrow, which enables dynamic selection of the most relevant adapters.
arXiv Detail & Related papers (2024-05-18T03:02:23Z)
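For contrast with the data-driven assignment sketched earlier, here is a minimal sketch of parameter-space grouping in the spirit of MBC: adapters are clustered by the similarity of their flattened LoRA weights, with no task examples involved. The flattening, the normalization, and the use of k-means are assumptions about the exact procedure; Arrow's zero-shot routing is not sketched here.

```python
import numpy as np
from sklearn.cluster import KMeans

def mbc_like_clusters(adapters, num_clusters, seed=0):
    """Group tasks purely by adapter-parameter similarity (no task data).

    adapters: list of {param_name: np.ndarray} LoRA weights, one per task.
    Returns an integer cluster label per adapter.
    """
    vecs = np.stack([np.concatenate([w.ravel() for w in a.values()]) for a in adapters])
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-8  # cosine-style geometry
    return KMeans(n_clusters=num_clusters, random_state=seed, n_init=10).fit_predict(vecs)
```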
- Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models [12.230087530720652]
We introduce an adapter module that is more efficient in large-scale multi-task adaptation scenarios.
The adapter consists of a single shared controller network and multiple task-level adapter heads.
arXiv Detail & Related papers (2024-03-25T17:21:56Z)
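A minimal sketch of the shared-controller-plus-task-heads structure described in the entry above. The bottleneck design, the ReLU, and the residual connection are assumptions, and the recurrent reuse of the module across layers is not shown.

```python
import torch
import torch.nn as nn

class ControllerWithTaskHeads(nn.Module):
    """Sketch: one controller network shared by all tasks, plus per-task adapter heads."""
    def __init__(self, hidden: int, bottleneck: int, num_tasks: int):
        super().__init__()
        self.controller = nn.Sequential(nn.Linear(hidden, bottleneck), nn.ReLU())  # shared
        self.heads = nn.ModuleList([nn.Linear(bottleneck, hidden) for _ in range(num_tasks)])

    def forward(self, h: torch.Tensor, task_id: int) -> torch.Tensor:
        return h + self.heads[task_id](self.controller(h))  # task-specific residual update
```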
- Vision Transformer Adapters for Generalizable Multitask Learning [61.79647180647685]
We introduce the first multitasking vision transformer adapters that learn generalizable task affinities.
Our adapters can simultaneously solve multiple dense vision tasks in a parameter-efficient manner.
In contrast to concurrent methods, we do not require retraining or fine-tuning whenever a new task or domain is added.
arXiv Detail & Related papers (2023-08-23T18:40:48Z)
- Cross-Lingual Transfer with Target Language-Ready Task Adapters [66.5336029324059]
BAD-X, an extension of the MAD-X framework, achieves improved transfer at the cost of MAD-X's modularity.
We aim to take the best of both worlds by fine-tuning task adapters adapted to the target language.
arXiv Detail & Related papers (2023-06-05T10:46:33Z)
- Multi-Head Adapter Routing for Cross-Task Generalization [56.75667096355806]
Polytropon learns an inventory of adapters and a routing function that selects a subset of adapters for each task during both pre-training and few-shot adaptation.
We find that routing is most beneficial during multi-task pre-training rather than during few-shot adaptation.
arXiv Detail & Related papers (2022-11-07T19:35:55Z)
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models for downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism to improve adapter capacity without increasing parameters or computational cost by two key techniques.
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
- AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
arXiv Detail & Related papers (2022-04-30T16:49:41Z)
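A minimal sketch of a token-dependent representation shift in the spirit of the AdapterBias entry above: a single task-specific vector is added to every token's representation, scaled per token by a tiny linear layer. The exact placement inside the transformer block and the initialization are assumptions.

```python
import torch
import torch.nn as nn

class AdapterBiasLike(nn.Module):
    """Sketch: one learnable shift vector per task, with a token-dependent scale."""
    def __init__(self, hidden: int):
        super().__init__()
        self.shift = nn.Parameter(torch.zeros(hidden))  # task-specific shift vector
        self.alpha = nn.Linear(hidden, 1)               # token-dependent scalar weight

    def forward(self, h: torch.Tensor) -> torch.Tensor:  # h: (batch, seq, hidden)
        return h + self.alpha(h) * self.shift             # broadcast scale over the vector
```

Because only a vector and a one-output linear layer are trained per task, the added parameter count stays tiny relative to the backbone, which is the point the abstract makes about parameter efficiency.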