Compressing Sine-Activated Low-Rank Adapters through Post-Training Quantization
- URL: http://arxiv.org/abs/2505.21895v1
- Date: Wed, 28 May 2025 02:15:15 GMT
- Title: Compressing Sine-Activated Low-Rank Adapters through Post-Training Quantization
- Authors: Cameron Gordon, Yiping Ji, Hemanth Saratchandran, Paul Albert, Simon Lucey
- Abstract summary: Low-Rank Adaptation (LoRA) has become a standard approach for parameter-efficient fine-tuning. We extend the sinusoidal transformation framework to quantized LoRA adapters.
- Score: 25.441086332799348
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Low-Rank Adaptation (LoRA) has become a standard approach for parameter-efficient fine-tuning, offering substantial reductions in trainable parameters by modeling updates as the product of two low-rank matrices. While effective, the low-rank constraint inherently limits representational capacity, often resulting in reduced performance compared to full-rank fine-tuning. Recent work by Ji et al. (2025) has addressed this limitation by applying a fixed-frequency sinusoidal transformation to low-rank adapters, increasing their stable rank without introducing additional parameters. This raises a crucial question: can the same sine-activated technique be successfully applied within the context of Post-Training Quantization to retain benefits even after model compression? In this paper, we investigate this question by extending the sinusoidal transformation framework to quantized LoRA adapters. We develop a theoretical analysis showing that the stable rank of a quantized adapter is tightly linked to that of its full-precision counterpart, motivating the use of such rank-enhancing functions even under quantization. Our results demonstrate that the expressivity gains from a sinusoidal non-linearity persist after quantization, yielding highly compressed adapters with negligible loss in performance. We validate our approach across a range of fine-tuning tasks for language, vision, and text-to-image generation, achieving significant memory savings while maintaining competitive accuracy.
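The construction described in the abstract is easy to prototype. The NumPy sketch below (not the authors' code) builds a standard low-rank update, its sine-activated variant, and a sine-activated update formed from post-training-quantized factors, then compares their stable ranks; the frequency, the absmax quantizer, and all dimensions are illustrative assumptions, and the printout only demonstrates the qualitative effect the abstract claims.

```python
import numpy as np

def stable_rank(M):
    """Stable rank: squared Frobenius norm divided by squared spectral norm."""
    return np.linalg.norm(M, "fro") ** 2 / np.linalg.norm(M, 2) ** 2

def quantize_absmax(M, bits=4):
    """Simulated symmetric uniform post-training quantization (quantize, then dequantize)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(M).max() / qmax
    return np.clip(np.round(M / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
d, r, omega = 512, 4, 200.0                      # hypothetical sizes and sine frequency
B = rng.normal(size=(d, r)) / np.sqrt(d)
A = rng.normal(size=(r, d)) / np.sqrt(d)

plain    = B @ A                                  # standard LoRA update, rank <= r
sine     = np.sin(omega * (B @ A))                # sine-activated update, higher stable rank
sine_ptq = np.sin(omega * (quantize_absmax(B) @ quantize_absmax(A)))  # PTQ applied to the factors

for name, M in [("plain", plain), ("sine", sine), ("sine + PTQ", sine_ptq)]:
    print(f"{name:10s} stable rank = {stable_rank(M):.1f}")
```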
Related papers
- Enhancing Post-Training Quantization via Future Activation Awareness [84.76726857601753]
Post-training quantization (PTQ) is a widely used method to compress large language models (LLMs) without fine-tuning. We propose Future-Aware Quantization (FAQ), which leverages future-layer activations to guide quantization. FAQ consistently outperforms prior methods with negligible extra cost, requiring no backward passes, data reconstruction, or tuning.
arXiv Detail & Related papers (2026-01-28T12:03:30Z) - When Does Adaptation Win? Scaling Laws for Meta-Learning in Quantum Control [0.41998444721319217]
Quantum hardware suffers from intrinsic device heterogeneity and environmental drift. We derive a scaling law lower bound for meta-learning showing that the adaptation gain saturates exponentially with gradient steps. Further validation on classical linear-quadratic control confirms these laws emerge from general optimization geometry rather than quantum-specific physics.
arXiv Detail & Related papers (2026-01-26T21:16:11Z) - QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification [67.15451442018258]
Diffusion transformers exhibit remarkable video generation capability, yet their prohibitive computational and memory costs hinder practical deployment. Model quantization and attention sparsification are two promising directions for compression, but each alone suffers severe performance degradation under aggressive compression. We propose QuantSparse, a unified framework that integrates model quantization with attention sparsification.
arXiv Detail & Related papers (2025-09-28T06:49:44Z) - QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models [14.492535012602625]
We propose a method that integrates FT-based adapters into quantized models by employing the Walsh-Hadamard Transform (WHT) as the transform kernel. We demonstrate that QWHA effectively mitigates quantization errors while facilitating fine-tuning, and that its design substantially reduces computational cost.
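The key mechanism named here, expanding a small set of trainable coefficients through a fast transform into a dense weight update, can be sketched generically. The snippet below is a FourierFT-style sparse-spectrum parameterization with a Walsh-Hadamard kernel, not QWHA's actual design; the coefficient count, positions, and normalization are illustrative assumptions.

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform along the last axis (length must be a power of two)."""
    x = x.astype(float).copy()
    n = x.shape[-1]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[..., i:i + h].copy()
            b = x[..., i + h:i + 2 * h].copy()
            x[..., i:i + h] = a + b
            x[..., i + h:i + 2 * h] = a - b
        h *= 2
    return x

rng = np.random.default_rng(0)
d_out, d_in, k = 256, 256, 64                 # hypothetical layer size and number of learned coefficients

# Learn only k spectral coefficients at fixed positions; the WHT kernel expands them into a dense update
rows = rng.integers(0, d_out, size=k)
cols = rng.integers(0, d_in, size=k)
coeffs = rng.normal(size=k) * 0.01            # these would be the trainable parameters

spectrum = np.zeros((d_out, d_in))
spectrum[rows, cols] = coeffs
delta_w = fwht(fwht(spectrum).T).T / (d_out * d_in) ** 0.5   # 2-D WHT expansion of the sparse spectrum
print(delta_w.shape)                          # dense (d_out, d_in) weight update from k parameters
```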
arXiv Detail & Related papers (2025-09-22T07:21:41Z) - Enhancing Performance and Calibration in Quantile Hyperparameter Optimization [0.0]
Conformalized quantile regression can address these estimation weaknesses. This study builds upon early work in this area. The proposed algorithms are rigorously benchmarked.
arXiv Detail & Related papers (2025-09-21T12:17:06Z) - MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation [74.34220141721231]
We present MPQ-DMv2, an improved Mixed Precision Quantization framework for extremely low-bit Diffusion Models.
arXiv Detail & Related papers (2025-07-06T08:16:50Z) - OP-LoRA: The Blessing of Dimensionality [93.08208871549557]
Low-rank adapters enable fine-tuning of large models with only a small number of parameters. However, they often pose optimization challenges, with poor convergence. We introduce an over-parameterized approach that accelerates training without increasing inference costs. We achieve improvements in vision-language tasks and especially notable increases in image generation.
arXiv Detail & Related papers (2024-12-13T18:55:19Z) - ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers [7.155242379236052]
Quantization of Vision Transformers (ViTs) has emerged as a promising solution to mitigate these challenges.
Existing methods still suffer from significant accuracy loss at low bit-widths.
ADFQ-ViT provides significant improvements over various baselines in image classification, object detection, and instance segmentation tasks at 4-bit.
arXiv Detail & Related papers (2024-07-03T02:41:59Z) - Efficient Learning With Sine-Activated Low-rank Matrices [25.12262017296922]
We propose a novel theoretical framework that integrates a sinusoidal function within the low-rank decomposition process. Our method proves to be a plug-in enhancement for existing low-rank models, as evidenced by its successful application in Vision Transformers (ViT), Large Language Models (LLMs), Neural Radiance Fields (NeRF) and 3D shape modelling.
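Since this is the prior work the main paper builds on, the "plug-in" aspect can be illustrated as a thin wrapper around an existing linear layer. The PyTorch module below is a hypothetical sketch, not the paper's code; the frequency, scaling, and initialization are assumptions, and the sine is applied elementwise to the low-rank product as in the NumPy sketch earlier on this page.

```python
import torch
import torch.nn as nn

class SineLoRALinear(nn.Module):
    """Frozen linear layer with a sine-activated low-rank adapter added on the side."""
    def __init__(self, base: nn.Linear, rank: int = 4, omega: float = 200.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                     # train only the adapter
        self.omega = omega
        self.A = nn.Parameter(torch.randn(rank, base.in_features) / base.in_features ** 0.5)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: adapter starts as identity

    def forward(self, x):
        # Elementwise sine on the low-rank product raises its stable rank without extra parameters
        delta_w = torch.sin(self.omega * (self.B @ self.A))
        return self.base(x) + nn.functional.linear(x, delta_w)

layer = SineLoRALinear(nn.Linear(768, 768), rank=4)
print(layer(torch.randn(2, 768)).shape)                 # torch.Size([2, 768])
```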
arXiv Detail & Related papers (2024-03-28T08:58:20Z) - Low-Rank Tensor Completion via Novel Sparsity-Inducing Regularizers [30.920908325825668]
To alleviate the limitations of the l1-norm in the low-rank tensor completion problem, alternative surrogates/regularizers have been suggested.
These regularizers are applied to low-rank restoration, and efficient algorithms based on the method of multipliers are proposed.
arXiv Detail & Related papers (2023-10-10T01:00:13Z) - Randomized semi-quantum matrix processing [0.0]
We present a hybrid quantum-classical framework for simulating generic matrix functions.
The method is based on randomization over the Chebyshev approximation of the target function.
We prove advantages in average depth, including quadratic speed-ups in costly parameters.
arXiv Detail & Related papers (2023-07-21T18:00:28Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), each have notable drawbacks: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z) - Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers [17.445202457319517]
Quantization-aware training (QAT) is a promising method to lower the implementation cost and energy consumption.
This work proposes a proactive knowledge distillation method called Teacher Intervention (TI) for fast-converging QAT of ultra-low precision pre-trained Transformers.
arXiv Detail & Related papers (2023-02-23T06:48:24Z) - HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression [69.36555801766762]
We propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions.
We experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss.
arXiv Detail & Related papers (2022-11-30T05:31:45Z) - NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers [53.85087932591237]
NoisyQuant is a quantizer-agnostic enhancement for the post-training activation quantization performance of vision transformers.
Building on this theoretical insight, NoisyQuant achieves the first success in actively altering the heavy-tailed activation distribution.
NoisyQuant largely improves the post-training quantization performance of vision transformers with minimal computation overhead.
arXiv Detail & Related papers (2022-11-29T10:02:09Z) - Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks [82.18396309806577]
We propose a novel activation quantizer, referred to as Dynamic Dual Trainable Bounds (DDTB).
Our DDTB exhibits significant performance improvements in ultra-low precision.
For example, our DDTB achieves a 0.70 dB PSNR increase on the Urban100 benchmark when quantizing EDSR to 2-bit and upscaling output images to x4.
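The underlying idea of trainable clipping bounds can be sketched as a uniform quantizer with learnable lower and upper limits trained through a straight-through estimator. The module below is a simplified, generic sketch of that idea; DDTB's dynamic, input-adaptive adjustment of the bounds is omitted, and the bit-width and initial bounds are assumptions.

```python
import torch
import torch.nn as nn

class TrainableBoundsQuant(nn.Module):
    """Uniform activation quantizer with trainable lower/upper clipping bounds (STE for gradients)."""
    def __init__(self, bits: int = 2, init_lb: float = 0.0, init_ub: float = 6.0):
        super().__init__()
        self.levels = 2 ** bits - 1
        self.lb = nn.Parameter(torch.tensor(init_lb))
        self.ub = nn.Parameter(torch.tensor(init_ub))

    def forward(self, x):
        lb = self.lb
        ub = torch.maximum(self.ub, lb + 1e-4)                   # keep the interval non-empty
        x_clip = lb + torch.relu(x - lb) - torch.relu(x - ub)    # differentiable clip to [lb, ub]
        step = (ub - lb) / self.levels
        x_quant = lb + torch.round((x_clip - lb) / step) * step  # snap to the uniform grid
        return x_clip + (x_quant - x_clip).detach()              # straight-through estimator

quant = TrainableBoundsQuant(bits=2)
y = quant(torch.randn(4, 8) * 3)
print(y.unique())                                                # at most 2**bits distinct levels
```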
arXiv Detail & Related papers (2022-03-08T04:26:18Z)