PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches
- URL: http://arxiv.org/abs/2410.10870v3
- Date: Sat, 29 Mar 2025 03:32:53 GMT
- Title: PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches
- Authors: Rana Muhammad Shahroz Khan, Pingzhi Li, Sukwon Yun, Zhenyu Wang, Shahriar Nirjon, Chau-Wai Wong, Tianlong Chen
- Abstract summary: PortLLM is a training-free framework that creates an initial lightweight model update patch to capture domain-specific knowledge. PortLLM achieves comparable performance to LoRA fine-tuning with reductions of up to 12.2x in GPU memory usage.
- Score: 34.65386386598757
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large language models (LLMs) increasingly shape the AI landscape, fine-tuning pretrained models has become more popular than in the pre-LLM era for achieving optimal performance in domain-specific tasks. However, pretrained LLMs such as ChatGPT are periodically evolved (i.e., model parameters are frequently updated), making it challenging for downstream users with limited resources to keep up with fine-tuning the newest LLMs for their domain application. Even though fine-tuning costs have nowadays been reduced thanks to the innovations of parameter-efficient fine-tuning such as LoRA, not all downstream users have adequate computing for frequent personalization. Moreover, access to fine-tuning datasets, particularly in sensitive domains such as healthcare, could be time-restrictive, making it crucial to retain the knowledge encoded in earlier fine-tuned rounds for future adaptation. In this paper, we present PortLLM, a training-free framework that (i) creates an initial lightweight model update patch to capture domain-specific knowledge, and (ii) allows this patch to be seamlessly plugged into the evolved LLM for continual personalization at minimal cost. Our extensive experiments cover seven representative datasets, from easier question-answering tasks {BoolQ, SST2} to harder reasoning tasks {WinoGrande, GSM8K}, and models including {Mistral-7B, Llama2, Llama3.1, and Gemma2}, validating the portability of our designed model patches and showcasing the effectiveness of our proposed framework. For instance, PortLLM achieves comparable performance to LoRA fine-tuning with reductions of up to 12.2x in GPU memory usage. Finally, we provide theoretical justifications to understand the portability of our model update patches, which offers new insights into the theoretical dimension of LLMs' personalization.
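The abstract does not spell out how the model patch is constructed or applied, but a minimal sketch of the underlying idea, assuming the patch is a LoRA-style low-rank delta fine-tuned on the older base model and simply added to the evolved model's weights, could look like the following (the function and variable names are illustrative, not the authors' code):

```python
# Minimal sketch (not the authors' implementation) of a "portable model patch":
# a LoRA delta trained on an earlier base model is re-applied to an evolved
# base model without any further training.
import torch

def apply_portable_patch(new_base_weight: torch.Tensor,
                         lora_A: torch.Tensor,
                         lora_B: torch.Tensor,
                         scaling: float = 1.0) -> torch.Tensor:
    """Add a low-rank update learned on an older model to an evolved model's weight.

    new_base_weight: (d_out, d_in) weight from the updated pretrained model.
    lora_A: (r, d_in) and lora_B: (d_out, r), fine-tuned on the older model.
    """
    delta = scaling * (lora_B @ lora_A)   # rank-r domain-specific update
    return new_base_weight + delta        # training-free "plug-in"

# Toy usage: a 16x32 layer patched with a rank-4 update.
W_new = torch.randn(16, 32)
A, B = torch.randn(4, 32), torch.randn(16, 4)
W_patched = apply_portable_patch(W_new, A, B, scaling=0.5)
print(W_patched.shape)  # torch.Size([16, 32])
```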
Related papers
- Don't Start Over: A Cost-Effective Framework for Migrating Personalized Prompts Between LLMs [51.79252689855809]
Personalization in Large Language Models (LLMs) often relies on user-specific soft prompts. We propose the Prompt-level User Migration Adapter (PUMA), a framework to efficiently migrate personalized prompts across incompatible models. Experiments on three large-scale datasets show our method matches or even surpasses the performance of retraining from scratch, reducing computational cost by up to 98%.
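As a rough illustration only, one plausible reading of a prompt-level migration adapter is a small learned map from the source model's soft-prompt embedding space into the target model's; the class name, dimensions, and training setup below are assumptions, not PUMA's actual design.

```python
import torch
import torch.nn as nn

class PromptMigrationAdapter(nn.Module):
    """Hypothetical adapter that maps old-model soft prompts into a new model's space."""
    def __init__(self, src_dim: int, tgt_dim: int, hidden: int = 256):
        super().__init__()
        self.map = nn.Sequential(
            nn.Linear(src_dim, hidden), nn.GELU(), nn.Linear(hidden, tgt_dim)
        )

    def forward(self, src_prompt: torch.Tensor) -> torch.Tensor:
        # src_prompt: (num_virtual_tokens, src_dim) -> (num_virtual_tokens, tgt_dim)
        return self.map(src_prompt)

src_prompt = torch.randn(20, 768)            # soft prompt tuned for the old model
adapter = PromptMigrationAdapter(768, 4096)  # target hidden size assumed to be 4096
print(adapter(src_prompt).shape)             # torch.Size([20, 4096])
```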
arXiv Detail & Related papers (2026-01-17T12:30:31Z) - CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling [60.55856973678002]
Large Reasoning Models (LRMs) have demonstrated strong capabilities in complex multi-step reasoning. Existing domain adaptation methods, originally designed for earlier instruction-tuned models, often fail to exploit the advanced reasoning patterns of modern LRMs. We propose CALM, a framework that progressively refines LRMs within their native reasoning modes for optimization modeling tasks.
arXiv Detail & Related papers (2025-10-05T13:38:31Z) - PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing [48.30406812516552]
We introduce PLM, a Peripheral Language Model, developed through a co-design process that jointly optimizes model architecture and edge system constraints.
PLM employs a Multi-head Latent Attention mechanism and the squared ReLU activation function to encourage sparsity, thereby reducing peak memory footprint.
Evaluation results demonstrate that PLM outperforms existing small language models trained on publicly available data.
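The squared ReLU activation mentioned in this summary is simple to state: f(x) = max(x, 0)^2. A tiny sketch with assumed example values:

```python
import torch
import torch.nn.functional as F

def squared_relu(x: torch.Tensor) -> torch.Tensor:
    # Zero for non-positive inputs (like ReLU), with a smoother, squared positive branch.
    return F.relu(x) ** 2

x = torch.tensor([-1.0, 0.5, 2.0])
print(squared_relu(x))  # tensor([0.0000, 0.2500, 4.0000])
```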
arXiv Detail & Related papers (2025-03-15T15:11:17Z) - EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models [25.058673320372677]
Large language models (LLMs), driven by scaling laws, achieve emergent intelligence at large model sizes.
This work proposes pruning-aware pretraining, which focuses on retaining the performance of much larger optimized models.
We reveal that it achieves top-quality edge language models, termed EfficientLLM, by scaling up LLM compression and extending its boundary.
arXiv Detail & Related papers (2025-02-10T16:51:03Z) - How to Alleviate Catastrophic Forgetting in LLMs Finetuning? Hierarchical Layer-Wise and Element-Wise Regularization [15.434072331989878]
Large Language Models (LLMs) exhibit strong general language capabilities.
Fine-tuning these models on domain-specific tasks often leads to catastrophic forgetting, where the model overwrites or loses essential knowledge acquired during pretraining.
We propose a novel approach to compute the element-wise importance of model parameters crucial for preserving general knowledge during fine-tuning.
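The summary does not give the exact formulation; one common instance of an element-wise importance penalty is an EWC-style quadratic term, sketched below as an illustration only (the paper's hierarchical layer-wise component is omitted, and the importance estimate is assumed to be precomputed).

```python
import torch
import torch.nn as nn

def importance_penalty(model, ref_params, importance, lam: float = 0.1):
    """lam * sum_i I_i * (theta_i - theta_ref_i)^2 over all named parameters.

    ref_params / importance: dicts from parameter name to tensors with the same
    shape as the parameter (e.g. a copy of the pretrained weights and a
    Fisher-style importance estimate).
    """
    loss = 0.0
    for name, p in model.named_parameters():
        loss = loss + (importance[name] * (p - ref_params[name]) ** 2).sum()
    return lam * loss

net = nn.Linear(4, 2)
ref = {n: p.detach().clone() for n, p in net.named_parameters()}
imp = {n: torch.ones_like(p) for n, p in net.named_parameters()}
reg = importance_penalty(net, ref, imp)  # added to the task loss during fine-tuning
```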
arXiv Detail & Related papers (2025-01-23T13:54:53Z) - Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs [76.40876036912537]
Large Language Models (LLMs) demonstrate strong few-shot adaptability without requiring fine-tuning.
Current Visual Foundation Models (VFMs) require explicit fine-tuning with sufficient tuning data.
We propose a framework, LoRA Recycle, that distills a meta-LoRA from diverse pre-tuned LoRAs with a meta-learning objective.
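LoRA Recycle's meta-learning distillation is not detailed in this summary; as a much simpler stand-in, the sketch below merely averages the low-rank deltas of several pre-tuned LoRAs into one shared delta, which illustrates the recycling idea but is not the paper's objective.

```python
import torch

def average_lora_deltas(loras):
    """loras: list of (A, B) pairs with A: (r, d_in) and B: (d_out, r)."""
    deltas = [B @ A for A, B in loras]
    return torch.stack(deltas).mean(dim=0)  # crude shared update, not a meta-LoRA

loras = [(torch.randn(4, 32), torch.randn(16, 4)) for _ in range(3)]
print(average_lora_deltas(loras).shape)      # torch.Size([16, 32])
```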
arXiv Detail & Related papers (2024-12-03T07:25:30Z) - Learning on LoRAs: GL-Equivariant Processing of Low-Rank Weight Spaces for Large Finetuned Models [38.197552424549514]
Low-rank adaptations (LoRAs) have revolutionized the finetuning of large foundation models.
LoRAs present opportunities for applying machine learning techniques that take these low-rank weights themselves as inputs.
In this paper, we investigate the potential of Learning on LoRAs (LoL), a paradigm where LoRA weights serve as input to machine learning models.
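One way to make the LoL setting concrete: featurize a LoRA by the product B @ A (or its singular values), which is unchanged under the reparameterization (A, B) -> (gA, B g^{-1}) that motivates the GL-equivariance discussion. The feature choice below is an assumption for illustration, not the paper's architecture.

```python
import torch

def lora_features(A: torch.Tensor, B: torch.Tensor, k: int = 8) -> torch.Tensor:
    delta = B @ A                           # invariant to the GL reparameterization
    return torch.linalg.svdvals(delta)[:k]  # top singular values as input features

A, B = torch.randn(4, 64), torch.randn(64, 4)
feats = lora_features(A, B)
print(feats.shape)  # torch.Size([8]); only the first 4 are (numerically) nonzero
```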
arXiv Detail & Related papers (2024-10-05T15:52:47Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently the two mainstream methods for adapting LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
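RTD's exact decoding rule is not given in this summary; the sketch below shows one training-free, reference-guided scheme (a kNN-LM-style interpolation of the model's next-token distribution with retrieved reference tokens), with all names and hyperparameters assumed.

```python
import torch
import torch.nn.functional as F

def reference_guided_probs(logits, hidden, ref_keys, ref_token_ids, vocab_size,
                           k=8, temperature=1.0, alpha=0.3):
    """logits: (vocab,), hidden: (d,), ref_keys: (N, d), ref_token_ids: (N,) int64."""
    dists = torch.cdist(hidden[None], ref_keys)[0]          # distance to references
    knn = torch.topk(-dists, k).indices                     # k nearest references
    weights = F.softmax(-dists[knn] / temperature, dim=0)   # similarity weights
    ref_probs = torch.zeros(vocab_size)
    ref_probs.scatter_add_(0, ref_token_ids[knn], weights)  # reference distribution
    model_probs = F.softmax(logits, dim=0)
    return (1 - alpha) * model_probs + alpha * ref_probs    # mixed distribution

vocab = 100
probs = reference_guided_probs(torch.randn(vocab), torch.randn(32),
                               torch.randn(50, 32), torch.randint(0, vocab, (50,)),
                               vocab_size=vocab)
print(probs.sum())  # ~1.0: still a valid next-token distribution
```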
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - PAT: Pruning-Aware Tuning for Large Language Models [19.622152991641045]
Large language models excel in language tasks, especially with supervised fine-tuning after pre-training.
Traditional post-hoc pruning often leads to significant performance loss.
We propose the Pruning-Aware Tuning (PAT) paradigm to eliminate model redundancy.
arXiv Detail & Related papers (2024-08-27T01:04:14Z) - Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches [64.42735183056062]
Large language models (LLMs) have transitioned from specialized models to versatile foundation models.
LLMs exhibit impressive zero-shot ability; however, they require fine-tuning on local datasets and significant resources for deployment.
arXiv Detail & Related papers (2024-08-20T09:42:17Z) - Lifelong Personalized Low-Rank Adaptation of Large Language Models for Recommendation [50.837277466987345]
We focus on the field of large language models (LLMs) for recommendation.
We propose RecLoRA, which incorporates a Personalized LoRA module that maintains independent LoRAs for different users.
We also design a Few2Many Learning Strategy, using a conventional recommendation model as a lens to magnify small training spaces to full spaces.
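A minimal sketch of keeping independent LoRA factors per user, as the Personalized LoRA module suggests; the storage and initialization details below are assumptions.

```python
import torch

class PerUserLoRA:
    """Hypothetical per-user store of LoRA factors for one weight matrix."""
    def __init__(self, d_out: int, d_in: int, rank: int = 4):
        self.d_out, self.d_in, self.rank = d_out, d_in, rank
        self.store = {}  # user_id -> (A, B)

    def get(self, user_id: str):
        if user_id not in self.store:
            A = torch.randn(self.rank, self.d_in) * 0.01  # standard LoRA-style init
            B = torch.zeros(self.d_out, self.rank)        # delta starts at zero
            self.store[user_id] = (A, B)
        return self.store[user_id]

    def delta(self, user_id: str) -> torch.Tensor:
        A, B = self.get(user_id)
        return B @ A  # this user's personalized low-rank update

bank = PerUserLoRA(d_out=16, d_in=32)
print(bank.delta("user_42").shape)  # torch.Size([16, 32]); zero until trained
```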
arXiv Detail & Related papers (2024-08-07T04:20:28Z) - Pruning Large Language Models with Semi-Structural Adaptive Sparse Training [17.381160429641316]
Adaptive Sparse Trainer (AST) is a novel and efficient retraining framework tailored for semi-structured sparse models.
AST reduces the perplexity and zero-shot accuracy gap between dense and 2:4 semi-structured sparse models to 0.6 and 1.16%, respectively.
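The 2:4 semi-structured pattern targeted by AST keeps the two largest-magnitude weights in every group of four consecutive weights; a short sketch:

```python
import torch

def two_four_mask(weight: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude entries in each group of 4 (requires numel % 4 == 0)."""
    w = weight.reshape(-1, 4)
    idx = w.abs().topk(2, dim=1).indices
    mask = torch.zeros_like(w).scatter_(1, idx, 1.0)
    return mask.reshape(weight.shape)

W = torch.randn(8, 16)
W_sparse = W * two_four_mask(W)  # exactly 50% zeros in the hardware-friendly 2:4 pattern
```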
arXiv Detail & Related papers (2024-07-30T06:33:44Z) - ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling [15.673219028826173]
We introduce a semi-automated data synthesis framework designed for optimization modeling issues, named OR-Instruct.
We train various open-source LLMs with a capacity of 7 billion parameters (dubbed ORLMs).
The resulting model demonstrates significantly enhanced optimization modeling capabilities, achieving state-of-the-art performance across the NL4OPT, MAMO, and IndustryOR benchmarks.
arXiv Detail & Related papers (2024-05-28T01:55:35Z) - MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT [87.4910758026772]
"Bigger the better" has been the predominant trend in recent Large Language Models (LLMs) development.
This paper explores the "less is more" paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource constrained devices.
arXiv Detail & Related papers (2024-02-26T18:59:03Z) - LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957]
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities.
We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English.
When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
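A minimal sketch of composing two frozen models through a learned cross-attention bridge, in the spirit of CALM; the dimensions, projection, and residual placement are assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class CrossModelBridge(nn.Module):
    """Anchor-model states attend over (projected) augmenting-model states."""
    def __init__(self, anchor_dim: int, aug_dim: int, n_heads: int = 8):
        super().__init__()
        self.proj = nn.Linear(aug_dim, anchor_dim)
        self.attn = nn.MultiheadAttention(anchor_dim, n_heads, batch_first=True)

    def forward(self, anchor_h, aug_h):
        # anchor_h: (B, T, anchor_dim) queries; aug_h: (B, S, aug_dim) keys/values
        kv = self.proj(aug_h)
        fused, _ = self.attn(anchor_h, kv, kv)
        return anchor_h + fused  # residual composition keeps the anchor model intact

bridge = CrossModelBridge(anchor_dim=1024, aug_dim=512)
out = bridge(torch.randn(2, 16, 1024), torch.randn(2, 16, 512))
print(out.shape)  # torch.Size([2, 16, 1024])
```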
arXiv Detail & Related papers (2024-01-04T18:53:01Z) - QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources [37.265708531464746]
Large Language Models (LLMs) have showcased remarkable impacts across a wide spectrum of natural language processing tasks.
Fine-tuning these pre-trained models on downstream datasets provides further significant performance gains, but this process has been challenging due to its extraordinary resource requirements.
We propose QFT, a novel Quantized Full-parameter Tuning framework for LLMs that enables memory-efficient fine-tuning without harming performance.
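A hedged sketch of the quantized full-parameter tuning idea: store weights in int8 with a per-tensor scale, dequantize for the update, and re-quantize afterwards; QFT's actual quantizer and optimizer design are not reproduced here.

```python
import torch

def quantize(w: torch.Tensor):
    scale = w.abs().max() / 127.0                       # per-tensor symmetric scale
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

q, s = quantize(torch.randn(16, 16))
w = dequantize(q, s)                  # full-precision view for the update
w = w - 1e-3 * torch.randn_like(w)    # stand-in for a gradient step
q, s = quantize(w)                    # stored back in 8-bit form between steps
```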
arXiv Detail & Related papers (2023-10-11T02:47:40Z) - Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning [52.29522018586365]
We study structured pruning as an effective means to develop smaller LLMs from pre-trained, larger models.
Our approach employs two key techniques: (1) targeted structured pruning, which prunes a larger model to a specified target shape by removing layers, heads, and intermediate and hidden dimensions in an end-to-end manner, and (2) dynamic batch loading, which dynamically updates the composition of sampled data in each training batch based on varying losses across different domains.
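Dynamic batch loading can be sketched as re-weighting domain sampling proportions by how far each domain's current loss exceeds its reference value; the update rule below is a simplified stand-in for the paper's.

```python
import torch

def update_domain_weights(current_loss, reference_loss, prev_weights, step=1.0):
    """All tensors have shape (num_domains,); returns new sampling proportions."""
    excess = torch.clamp(current_loss - reference_loss, min=0.0)
    logits = torch.log(prev_weights) + step * excess  # larger gap -> more samples
    return torch.softmax(logits, dim=0)

w = torch.full((4,), 0.25)
w = update_domain_weights(torch.tensor([2.1, 1.8, 2.5, 1.9]),
                          torch.tensor([2.0, 1.9, 2.0, 2.0]), w)
print(w)  # domains with larger excess loss get higher sampling probability
```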
arXiv Detail & Related papers (2023-10-10T15:13:30Z)