Compressing Sine-Activated Low-Rank Adapters through Post-Training Quantization
- URL: http://arxiv.org/abs/2505.21895v1
- Date: Wed, 28 May 2025 02:15:15 GMT
- Title: Compressing Sine-Activated Low-Rank Adapters through Post-Training Quantization
- Authors: Cameron Gordon, Yiping Ji, Hemanth Saratchandran, Paul Albert, Simon Lucey
- Abstract summary: Low-Rank Adaptation (LoRA) has become a standard approach for parameter-efficient fine-tuning. We extend the sinusoidal transformation framework to quantized LoRA adapters.
- Score: 25.441086332799348
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Low-Rank Adaptation (LoRA) has become a standard approach for parameter-efficient fine-tuning, offering substantial reductions in trainable parameters by modeling updates as the product of two low-rank matrices. While effective, the low-rank constraint inherently limits representational capacity, often resulting in reduced performance compared to full-rank fine-tuning. Recent work by Ji et al. (2025) has addressed this limitation by applying a fixed-frequency sinusoidal transformation to low-rank adapters, increasing their stable rank without introducing additional parameters. This raises a crucial question: can the same sine-activated technique be successfully applied within the context of Post-Training Quantization to retain benefits even after model compression? In this paper, we investigate this question by extending the sinusoidal transformation framework to quantized LoRA adapters. We develop a theoretical analysis showing that the stable rank of a quantized adapter is tightly linked to that of its full-precision counterpart, motivating the use of such rank-enhancing functions even under quantization. Our results demonstrate that the expressivity gains from a sinusoidal non-linearity persist after quantization, yielding highly compressed adapters with negligible loss in performance. We validate our approach across a range of fine-tuning tasks for language, vision, and text-to-image generation, achieving significant memory savings while maintaining competitive accuracy.
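The construction described in the abstract is easy to prototype. The NumPy sketch below (not the authors' code) builds a standard low-rank update, its sine-activated variant, and a sine-activated update formed from post-training-quantized factors, then compares their stable ranks; the frequency, the absmax quantizer, and all dimensions are illustrative assumptions, and the printout only demonstrates the qualitative effect the abstract claims.

```python
import numpy as np

def stable_rank(M):
    """Stable rank: squared Frobenius norm divided by squared spectral norm."""
    return np.linalg.norm(M, "fro") ** 2 / np.linalg.norm(M, 2) ** 2

def quantize_absmax(M, bits=4):
    """Simulated symmetric uniform post-training quantization (quantize, then dequantize)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(M).max() / qmax
    return np.clip(np.round(M / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
d, r, omega = 512, 4, 200.0                      # hypothetical sizes and sine frequency
B = rng.normal(size=(d, r)) / np.sqrt(d)
A = rng.normal(size=(r, d)) / np.sqrt(d)

plain    = B @ A                                  # standard LoRA update, rank <= r
sine     = np.sin(omega * (B @ A))                # sine-activated update, higher stable rank
sine_ptq = np.sin(omega * (quantize_absmax(B) @ quantize_absmax(A)))  # PTQ applied to the factors

for name, M in [("plain", plain), ("sine", sine), ("sine + PTQ", sine_ptq)]:
    print(f"{name:10s} stable rank = {stable_rank(M):.1f}")
```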
Related papers
- Enhancing Post-Training Quantization via Future Activation Awareness [84.76726857601753]
Post-training quantization (PTQ) is a widely used method to compress large language models (LLMs) without fine-tuning. We propose Future-Aware Quantization (FAQ), which leverages future-layer activations to guide quantization. FAQ consistently outperforms prior methods with negligible extra cost, requiring no backward passes, data reconstruction, or tuning.
arXiv Detail & Related papers (2026-01-28T12:03:30Z) - When Does Adaptation Win? Scaling Laws for Meta-Learning in Quantum Control [0.41998444721319217]
Quantum hardware suffers from intrinsic device heterogeneity and environmental drift. We derive a scaling law lower bound for meta-learning showing that the adaptation gain saturates exponentially with gradient steps. Further validation on classical linear-quadratic control confirms these laws emerge from general optimization geometry rather than quantum-specific physics.
arXiv Detail & Related papers (2026-01-26T21:16:11Z) - QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification [67.15451442018258]
Diffusion transformers exhibit remarkable video generation capability, yet their prohibitive computational and memory costs hinder practical deployment. Model quantization and attention sparsification are two promising directions for compression, but each alone suffers severe performance degradation under aggressive compression. We propose QuantSparse, a unified framework that integrates model quantization with attention sparsification.
arXiv Detail & Related papers (2025-09-28T06:49:44Z) - QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models [14.492535012602625]
We propose a method that integrates FT-based adapters into quantized models by employing the Walsh-Hadamard Transform (WHT) as the transform kernel. We demonstrate that QWHA effectively mitigates quantization errors while facilitating fine-tuning, and that its design substantially reduces computational cost.
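The key mechanism named here, expanding a small set of trainable coefficients through a fast transform into a dense weight update, can be sketched generically. The snippet below is a FourierFT-style sparse-spectrum parameterization with a Walsh-Hadamard kernel, not QWHA's actual design; the coefficient count, positions, and normalization are illustrative assumptions.

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform along the last axis (length must be a power of two)."""
    x = x.astype(float).copy()
    n = x.shape[-1]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[..., i:i + h].copy()
            b = x[..., i + h:i + 2 * h].copy()
            x[..., i:i + h] = a + b
            x[..., i + h:i + 2 * h] = a - b
        h *= 2
    return x

rng = np.random.default_rng(0)
d_out, d_in, k = 256, 256, 64                 # hypothetical layer size and number of learned coefficients

# Learn only k spectral coefficients at fixed positions; the WHT kernel expands them into a dense update
rows = rng.integers(0, d_out, size=k)
cols = rng.integers(0, d_in, size=k)
coeffs = rng.normal(size=k) * 0.01            # these would be the trainable parameters

spectrum = np.zeros((d_out, d_in))
spectrum[rows, cols] = coeffs
delta_w = fwht(fwht(spectrum).T).T / (d_out * d_in) ** 0.5   # 2-D WHT expansion of the sparse spectrum
print(delta_w.shape)                          # dense (d_out, d_in) weight update from k parameters
```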
arXiv Detail & Related papers (2025-09-22T07:21:41Z) - Enhancing Performance and Calibration in Quantile Hyperparameter Optimization [0.0]
Conformalized quantile regression can address these estimation weaknesses. This study builds upon early work in this area. The proposed algorithms are rigorously benchmarked.
arXiv Detail & Related papers (2025-09-21T12:17:06Z) - MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation [74.34220141721231]
We present MPQ-DMv2, an improved Mixed Precision Quantization framework for extremely low-bit Diffusion Models.
arXiv Detail & Related papers (2025-07-06T08:16:50Z) - OP-LoRA: The Blessing of Dimensionality [93.08208871549557]
Low-rank adapters enable fine-tuning of large models with only a small number of parameters. However, they often pose optimization challenges, with poor convergence. We introduce an over-parameterized approach that accelerates training without increasing inference costs. We achieve improvements in vision-language tasks and especially notable increases in image generation.
arXiv Detail & Related papers (2024-12-13T18:55:19Z) - ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers [7.155242379236052]
Quantization of Vision Transformers (ViTs) has emerged as a promising solution to mitigate these challenges.
Existing methods still suffer from significant accuracy loss at low bit-widths.
ADFQ-ViT provides significant improvements over various baselines in image classification, object detection, and instance segmentation tasks at 4-bit.
arXiv Detail & Related papers (2024-07-03T02:41:59Z) - Efficient Learning With Sine-Activated Low-rank Matrices [25.12262017296922]
We propose a novel theoretical framework that integrates a sinusoidal function within the low-rank decomposition process. Our method proves to be a plug-in enhancement for existing low-rank models, as evidenced by its successful application in Vision Transformers (ViT), Large Language Models (LLMs), Neural Radiance Fields (NeRF) and 3D shape modelling.
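Since this is the prior work the main paper builds on, the "plug-in" aspect can be illustrated as a thin wrapper around an existing linear layer. The PyTorch module below is a hypothetical sketch, not the paper's code; the frequency, scaling, and initialization are assumptions, and the sine is applied elementwise to the low-rank product as in the NumPy sketch earlier on this page.

```python
import torch
import torch.nn as nn

class SineLoRALinear(nn.Module):
    """Frozen linear layer with a sine-activated low-rank adapter added on the side."""
    def __init__(self, base: nn.Linear, rank: int = 4, omega: float = 200.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                     # train only the adapter
        self.omega = omega
        self.A = nn.Parameter(torch.randn(rank, base.in_features) / base.in_features ** 0.5)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: adapter starts as identity

    def forward(self, x):
        # Elementwise sine on the low-rank product raises its stable rank without extra parameters
        delta_w = torch.sin(self.omega * (self.B @ self.A))
        return self.base(x) + nn.functional.linear(x, delta_w)

layer = SineLoRALinear(nn.Linear(768, 768), rank=4)
print(layer(torch.randn(2, 768)).shape)                 # torch.Size([2, 768])
```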
arXiv Detail & Related papers (2024-03-28T08:58:20Z) - Low-Rank Tensor Completion via Novel Sparsity-Inducing Regularizers [30.920908325825668]
To alleviate the limitations of the l1-norm in the low-rank tensor completion problem, alternative surrogates/regularizers have been suggested.
These regularizers are applied to low-rank restoration, and efficient algorithms based on the method of multipliers are proposed.
arXiv Detail & Related papers (2023-10-10T01:00:13Z) - Randomized semi-quantum matrix processing [0.0]
We present a hybrid quantum-classical framework for simulating generic matrix functions.
The method is based on randomization over the Chebyshev approximation of the target function.
We prove advantages in average depth, including quadratic speed-ups in costly parameters.
arXiv Detail & Related papers (2023-07-21T18:00:28Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), each have notable drawbacks: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z) - Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers [17.445202457319517]
Quantization-aware training (QAT) is a promising method to lower the implementation cost and energy consumption.
This work proposes a proactive knowledge distillation method called Teacher Intervention (TI) for fast-converging QAT of ultra-low precision pre-trained Transformers.
arXiv Detail & Related papers (2023-02-23T06:48:24Z) - HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression [69.36555801766762]
We propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions.
We experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss.
arXiv Detail & Related papers (2022-11-30T05:31:45Z) - NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers [53.85087932591237]
NoisyQuant is a quantizer-agnostic enhancement for the post-training activation quantization performance of vision transformers.
Building on this theoretical insight, NoisyQuant achieves the first success in actively altering the heavy-tailed activation distribution.
NoisyQuant largely improves the post-training quantization performance of vision transformers with minimal computation overhead.
arXiv Detail & Related papers (2022-11-29T10:02:09Z) - Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks [82.18396309806577]
We propose a novel activation quantizer, referred to as Dynamic Dual Trainable Bounds (DDTB).
Our DDTB exhibits significant performance improvements in ultra-low precision.
For example, our DDTB achieves a 0.70 dB PSNR increase on the Urban100 benchmark when quantizing EDSR to 2-bit and upscaling output images to x4.
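The underlying idea of trainable clipping bounds can be sketched as a uniform quantizer with learnable lower and upper limits trained through a straight-through estimator. The module below is a simplified, generic sketch of that idea; DDTB's dynamic, input-adaptive adjustment of the bounds is omitted, and the bit-width and initial bounds are assumptions.

```python
import torch
import torch.nn as nn

class TrainableBoundsQuant(nn.Module):
    """Uniform activation quantizer with trainable lower/upper clipping bounds (STE for gradients)."""
    def __init__(self, bits: int = 2, init_lb: float = 0.0, init_ub: float = 6.0):
        super().__init__()
        self.levels = 2 ** bits - 1
        self.lb = nn.Parameter(torch.tensor(init_lb))
        self.ub = nn.Parameter(torch.tensor(init_ub))

    def forward(self, x):
        lb = self.lb
        ub = torch.maximum(self.ub, lb + 1e-4)                   # keep the interval non-empty
        x_clip = lb + torch.relu(x - lb) - torch.relu(x - ub)    # differentiable clip to [lb, ub]
        step = (ub - lb) / self.levels
        x_quant = lb + torch.round((x_clip - lb) / step) * step  # snap to the uniform grid
        return x_clip + (x_quant - x_clip).detach()              # straight-through estimator

quant = TrainableBoundsQuant(bits=2)
y = quant(torch.randn(4, 8) * 3)
print(y.unique())                                                # at most 2**bits distinct levels
```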
arXiv Detail & Related papers (2022-03-08T04:26:18Z)