Related papers: OP-LoRA: The Blessing of Dimensionality

URL: http://arxiv.org/abs/2412.10362v1
Date: Fri, 13 Dec 2024 18:55:19 GMT
Title: OP-LoRA: The Blessing of Dimensionality
Authors: Piotr Teterwak, Kate Saenko, Bryan A. Plummer, Ser-Nam Lim
Abstract summary: Low-rank adapters enable fine-tuning of large models with only a small number of parameters. They often pose optimization challenges, with poor convergence. We introduce an over-parameterized approach that accelerates training without increasing inference costs. We achieve improvements in vision-language tasks and especially notable increases in image generation.
Score: 93.08208871549557
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Low-rank adapters enable fine-tuning of large models with only a small number of parameters, thus reducing storage costs and minimizing the risk of catastrophic forgetting. However, they often pose optimization challenges, with poor convergence. To overcome these challenges, we introduce an over-parameterized approach that accelerates training without increasing inference costs. This method reparameterizes low-rank adaptation by employing a separate MLP and learned embedding for each layer. The learned embedding is input to the MLP, which generates the adapter parameters. Such overparameterization has been shown to implicitly function as an adaptive learning rate and momentum, accelerating optimization. At inference time, the MLP can be discarded, leaving behind a standard low-rank adapter. To study the effect of MLP overparameterization on a small yet difficult proxy task, we implement it for matrix factorization, and find it achieves faster convergence and lower final loss. Extending this approach to larger-scale tasks, we observe consistent performance gains across domains. We achieve improvements in vision-language tasks and especially notable increases in image generation, with CMMD scores improving by up to 15 points.
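Illustrative sketch (not the authors' implementation): the reparameterization described in the abstract can be pictured as a per-layer generator that maps a learned embedding through a small MLP to the flattened adapter matrices, which are then exported so the generator can be dropped. The class name OverparamLoRA, the layer sizes, the single ReLU hidden layer, and the random initialization are all assumptions made for illustration; training is omitted.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def relu(v):
    return [max(0.0, x) for x in v]


class OverparamLoRA:
    """Hypothetical per-layer generator: embedding -> MLP -> flattened (A, B)."""

    def __init__(self, d_in, d_out, rank, embed_dim=4, hidden=8, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        n_adapter = rank * d_in + d_out * rank  # size of a standard LoRA pair
        # Trainable during fine-tuning: the embedding and both MLP weight matrices.
        self.z = [rng.gauss(0, 1) for _ in range(embed_dim)]
        self.W1 = [[rng.gauss(0, 0.5) for _ in range(embed_dim)] for _ in range(hidden)]
        self.W2 = [[rng.gauss(0, 0.5) for _ in range(hidden)] for _ in range(n_adapter)]

    def generate(self):
        """Run the MLP on the embedding and reshape its output into A and B."""
        flat = matvec(self.W2, relu(matvec(self.W1, self.z)))
        split = self.rank * self.d_in
        A = [flat[i * self.d_in:(i + 1) * self.d_in] for i in range(self.rank)]
        B = [flat[split + i * self.rank: split + (i + 1) * self.rank]
             for i in range(self.d_out)]
        return A, B


def lora_delta(A, B, x):
    """Low-rank update applied to an input vector: B @ (A @ x)."""
    return matvec(B, matvec(A, x))


gen = OverparamLoRA(d_in=6, d_out=5, rank=2)
x = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5]

y_train_time = lora_delta(*gen.generate(), x)  # adapter produced by the MLP
A, B = gen.generate()                          # export the adapter once
del gen                                        # the MLP is no longer needed
y_inference = lora_delta(A, B, x)              # plain low-rank adapter
```

In this toy setup the exported A and B hold rank * (d_in + d_out) = 22 values, the same as an ordinary rank-2 adapter for a 6-to-5 layer, while the generator holds more trainable values during training. Inference touches only A and B.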

Related papers

Train Less, Infer Faster: Efficient Model Finetuning and Compression via Structured Sparsity [21.090365337326414]
Finetuning foundation language models (LMs) with billions of parameters is often impractical due to high computational costs, memory requirements, and the risk of overfitting. We propose a scheme for effective finetuning via sparsification using training gates, which requires minimal trainable parameters. Empirical results show it outperforms recent finetuning baselines in efficiency and performance.
arXiv Detail & Related papers (2026-02-09T20:20:29Z)
Layer-Wise High-Impact Parameter Ratio Optimization in Post-Training Quantization for Large Language Models [29.984541536225123]
Post-training quantization (PTQ) has emerged as a promising approach to mitigate these challenges with minimal overhead. Existing PTQ methods experience substantial accuracy loss at extremely low bit-widths. We propose a quadratic optimization framework that determines layer-specific ratios of high-impact parameters.
arXiv Detail & Related papers (2025-11-21T21:47:39Z)
Boosting Parameter Efficiency in LLM-Based Recommendation through Sophisticated Pruning [44.747749293948864]
This work explores pruning to improve efficiency while maintaining recommendation quality. We propose a more fine-grained pruning approach that integrates both intra-layer and layer-wise pruning. Our approach achieves an average of 88% of the original model's performance while pruning more than 95% of the non-embedding parameters.
arXiv Detail & Related papers (2025-07-09T17:26:10Z)
ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts. Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
Zeroth-Order Fine-Tuning of LLMs in Random Subspaces [66.27334633749734]
As language models grow in size, memory demands for backpropagation increase. Zeroth-order (ZO) optimization methods offer a memory-efficient alternative. We show that SubZero enhances fine-tuning and achieves faster results compared to standard ZO approaches.
arXiv Detail & Related papers (2024-10-11T17:01:43Z)
LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low Rank Adaptation (LoRA) is a popular Parameter-Efficient Fine-Tuning (PEFT) method. We propose a higher-order Candecomp/Parafac (CP) decomposition, enabling a more compact and flexible representation. Our method can achieve a reduction in the number of parameters while maintaining comparable performance.
arXiv Detail & Related papers (2024-10-05T06:59:50Z)
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios. In the early route, intermediate outputs are consolidated via an anti-redundancy operation. In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
AdaZeta: Adaptive Zeroth-Order Tensor-Train Adaption for Memory-Efficient Large Language Models Fine-Tuning [22.950914612765494]
Fine-tuning large language models (LLMs) has achieved remarkable performance across various natural language processing tasks. Memory-efficient Zeroth-order (MeZO) methods attempt to fine-tune LLMs using only forward passes, thereby avoiding the need for a backpropagation graph. We propose the Adaptive Zeroth-order Tensor-Train Adaption (AdaZeta) framework, specifically designed to improve the performance and convergence of the ZO methods.
arXiv Detail & Related papers (2024-06-26T04:33:13Z)
LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation [7.788139145984213]
Low-rank adaptation (LoRA) has become the default approach to fine-tune large language models (LLMs). We introduce large model fine-tuning via spectrally decomposed low-dimensional adaptation (LaMDA). LaMDA achieves significant reductions in trainable parameters and peak GPU memory footprint.
arXiv Detail & Related papers (2024-06-18T17:52:59Z)
Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing [8.88477151877883]
High-capacity pre-trained models have revolutionized problem-solving in computer vision. We propose a novel Adapter Re-Composing (ARC) strategy that addresses efficient pre-trained model adaptation. Our approach considers the reusability of adaptation parameters and introduces a parameter-sharing scheme.
arXiv Detail & Related papers (2023-10-10T01:04:15Z)
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models [79.34513906324727]
In this paper, we aim at parameter and computation efficient transfer learning (PCETL) for vision-language pre-trained models. We propose a novel dynamic architecture skipping (DAS) approach towards effective PCETL.
arXiv Detail & Related papers (2023-09-04T09:34:33Z)
LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning [56.88751562302793]
Low-rank adaption (LoRA) has emerged to fine-tune large language models (LLMs). LoRAPrune is a new framework that delivers an accurate structured pruned model in a highly memory-efficient manner. LoRAPrune achieves a reduction in perplexity by 4.81 on WikiText2 and 3.46 on PTB, while also decreasing memory usage by 52.6%.
arXiv Detail & Related papers (2023-05-28T15:15:48Z)
Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose. We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.