Related papers: LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs

LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs

URL: http://arxiv.org/abs/2507.01806v1
Date: Wed, 02 Jul 2025 15:24:47 GMT
Title: LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs
Authors: Reza Arabpour, Haitz Sáez de Ocáriz Borde, Anastasis Kratsios,
Abstract summary: Low-Rank Adapters (LoRAs) have transformed the fine-tuning of Large Language Models (LLMs) by enabling parameter-efficient updates.<n>We propose a theoretically grounded approach to LoRA fine-tuning designed specifically for users with limited computational resources.
Score: 8.397730500554047
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Low-Rank Adapters (LoRAs) have transformed the fine-tuning of Large Language Models (LLMs) by enabling parameter-efficient updates. However, their widespread adoption remains limited by the reliance on GPU-based training. In this work, we propose a theoretically grounded approach to LoRA fine-tuning designed specifically for users with limited computational resources, particularly those restricted to standard laptop CPUs. Our method learns a meta-operator that maps any input dataset, represented as a probability distribution, to a set of LoRA weights by leveraging a large bank of pre-trained adapters for the Mistral-7B-Instruct-v0.2 model. Instead of performing new gradient-based updates, our pipeline constructs adapters via lightweight combinations of existing LoRAs directly on CPU. While the resulting adapters do not match the performance of GPU-trained counterparts, they consistently outperform the base Mistral model on downstream tasks, offering a practical and accessible alternative to traditional GPU-based fine-tuning.

Related papers

Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts [72.22148263683037]
We study the properties of sparse adapters, which train only a subset of weights in the base neural network, as potential building blocks of modular architectures.<n>First, we propose a simple method for training highly effective sparse adapters, which is conceptually simpler than existing methods in the literature.<n>Next, we investigate the merging properties of these sparse adapters by merging adapters for up to 20 natural language processing tasks.
arXiv Detail & Related papers (2025-07-09T03:25:45Z)
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights [75.83625828306839]
textbfDrag-and-Drop LLMs (textitDnD) eliminates per-task training by mapping a handful of unlabeled task prompts directly to LoRA weight updates.<n>A lightweight text encoder distills each prompt batch into condition embeddings, which are then transformed by a cascaded hyper-convolutional decoder into the full set of LoRA matrices.
arXiv Detail & Related papers (2025-06-19T15:38:21Z)
Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs [75.11449420928139]
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks. Low-Rank Adaptation (LoRA) has emerged as a promising solution, but there exists a gap between the practical performance of low-rank adaptations and its theoretical optimum. We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
arXiv Detail & Related papers (2024-10-25T17:07:13Z)
MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter [40.616849959987555]
We introduce a novel mechanism that fine-tunes Large Language Models (LLMs) with adapters of larger size yet memory-efficient. This is achieved by leveraging the inherent activation sparsity in the Feed-Forward Networks (FFNs) of LLMs. We employ a Mixture of Experts (MoE)-like architecture to mitigate unnecessary CPU computations and reduce the communication volume between the GPU and CPU.
arXiv Detail & Related papers (2024-06-07T14:49:22Z)
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts [3.6301530893494127]
MixLoRA is an approach to construct a resource-efficient sparse MoE model based on LoRA. Our evaluations show that MixLoRA improves about 9% accuracy compared to state-of-the-art PEFT methods in multi-task learning scenarios.
arXiv Detail & Related papers (2024-04-22T02:15:52Z)
Training Neural Networks from Scratch with Parallel Low-Rank Adapters [46.764982726136054]
We introduce LoRA-the-Explorer (LTE), a novel bi-level optimization algorithm designed to enable parallel training of multiple low-rank heads across computing nodes. Our approach includes extensive experimentation on vision transformers using various vision datasets, demonstrating that LTE is competitive with standard pre-training.
arXiv Detail & Related papers (2024-02-26T18:55:13Z)
Run LoRA Run: Faster and Lighter LoRA Implementations [50.347242693025336]
LoRA is a technique that reduces the number of trainable parameters in a neural network by introducing low-rank adapters to linear layers. This paper presents the RunLoRA framework for efficient implementations of LoRA. Experiments show up to 28% speedup on language modeling networks.
arXiv Detail & Related papers (2023-12-06T10:54:34Z)
mLoRA: Fine-Tuning LoRA Adapters via Highly-Efficient Pipeline Parallelism in Multiple GPUs [5.735411578779657]
Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is commonly used to adapt a base LLM to multiple downstream tasks. LoRA platforms enable developers to fine-tune multiple models and develop various domain-specific applications simultaneously. Existing model parallelism schemes suffer from high communication overhead and inefficient GPU utilization when training multiple LoRA tasks.
arXiv Detail & Related papers (2023-12-05T05:38:38Z)
S-LoRA: Serving Thousands of Concurrent LoRA Adapters [59.490751234925206]
Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to adapt a base model to a multitude of tasks. We present S-LoRA, a system designed for the scalable serving of many LoRA adapters.
arXiv Detail & Related papers (2023-11-06T17:26:17Z)
FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system unlocking the potential vast untapped consumer-level GPU. This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, the variability of peer and device heterogeneity.
arXiv Detail & Related papers (2023-09-03T13:27:56Z)
LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning [56.88751562302793]
Low-rank adaption (LoRA) has emerged to fine-tune large language models (LLMs) LoRAPrune is a new framework that delivers an accurate structured pruned model in a highly memory-efficient manner. LoRAPrune achieves a reduction in perplexity by 4.81 on WikiText2 and 3.46 on PTB, while also decreasing memory usage by 52.6%.
arXiv Detail & Related papers (2023-05-28T15:15:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.