3BASiL: An Algorithmic Framework for Sparse plus Low-Rank Compression of LLMs
- URL: http://arxiv.org/abs/2603.01376v1
- Date: Mon, 02 Mar 2026 02:16:46 GMT
- Title: 3BASiL: An Algorithmic Framework for Sparse plus Low-Rank Compression of LLMs
- Authors: Mehdi Makni, Xiang Meng, Rahul Mazumder
- Abstract summary: We introduce 3BASiL-TM, an efficient one-shot post-training method for $(\mathbf{S} + \mathbf{LR})$ decomposition of Large Language Models. Our experiments show that 3BASiL-TM reduces the WikiText2 perplexity gap relative to the dense LLaMA-8B model by over 30% under a (2:4 Sparse + 64 LR) configuration. Our method achieves over 2.5x faster compression runtime on an A100 GPU compared to the SOTA $(\mathbf{S} + \mathbf{LR})$ method.
- Score: 20.28912929805946
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sparse plus Low-Rank $(\mathbf{S} + \mathbf{LR})$ decomposition of Large Language Models (LLMs) has emerged as a promising direction in model compression, aiming to decompose pre-trained model weights into a sum of sparse and low-rank matrices $(\mathbf{W} \approx \mathbf{S} + \mathbf{LR})$. Despite recent progress, existing methods often suffer from substantial performance degradation compared to dense models. In this work, we introduce 3BASiL-TM, an efficient one-shot post-training method for $(\mathbf{S} + \mathbf{LR})$ decomposition of LLMs that addresses this gap. Our approach first introduces a novel 3-Block Alternating Direction Method of Multipliers (ADMM) method, termed 3BASiL, to minimize the layer-wise reconstruction error with convergence guarantees. We then design an efficient transformer-matching (TM) refinement step that jointly optimizes the sparse and low-rank components across transformer layers. This step minimizes a novel memory-efficient loss that aligns outputs at the transformer level. Notably, the TM procedure is universal as it can enhance any $(\mathbf{S} + \mathbf{LR})$ decomposition, including pure sparsity. Our numerical experiments show that 3BASiL-TM reduces the WikiText2 perplexity gap relative to dense LLaMA-8B model by over 30% under a (2:4 Sparse + 64 LR) configuration, compared to prior methods. Moreover, our method achieves over 2.5x faster compression runtime on an A100 GPU compared to SOTA $(\mathbf{S} + \mathbf{LR})$ method. Our code is available at https://github.com/mazumder-lab/3BASiL.
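The layer-wise objective $\mathbf{W} \approx \mathbf{S} + \mathbf{LR}$ can be illustrated with a minimal alternating scheme: prune the residual to a 2:4 pattern for $\mathbf{S}$, then take a truncated SVD of what remains for $\mathbf{LR}$. This is a hypothetical sketch of the decomposition target only, not the paper's 3-block ADMM (3BASiL) or its transformer-matching step:

```python
import numpy as np

def two_four_prune(M):
    """Keep the 2 largest-magnitude entries in every group of 4 along each row (2:4 sparsity)."""
    m, n = M.shape
    assert n % 4 == 0
    groups = M.reshape(m, n // 4, 4)
    top2 = np.argsort(-np.abs(groups), axis=-1)[..., :2]
    mask = np.zeros_like(groups, dtype=bool)
    np.put_along_axis(mask, top2, True, axis=-1)
    return np.where(mask, groups, 0.0).reshape(m, n)

def sparse_plus_lowrank(W, rank=8, iters=10):
    """Alternate: S = 2:4-pruned residual, L = best rank-k approximation of W - S."""
    L = np.zeros_like(W)
    for _ in range(iters):
        S = two_four_prune(W - L)                      # S-step: optimal 2:4 fit to W - L
        U, s, Vt = np.linalg.svd(W - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]       # L-step: truncated SVD of W - S
    return S, L
```

Each step minimizes the Frobenius reconstruction error over one block with the other fixed, so the combined $\mathbf{S} + \mathbf{LR}$ error is never worse than pure 2:4 pruning; the paper's contribution is a three-block ADMM with convergence guarantees plus cross-layer refinement, which this toy loop does not capture.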
Related papers
- Evolution Strategies at the Hyperscale [57.75314521465674]
We introduce EGGROLL, an evolution strategies (ES) algorithm designed to scale backprop-free optimization to large population sizes. ES is a set of powerful blackbox optimisation methods that can handle non-differentiable or noisy objectives. EGGROLL overcomes these bottlenecks by generating random matrices $A \in \mathbb{R}^{m \times r}$, $B \in \mathbb{R}^{n \times r}$ with $r \ll \min(m,n)$ to form a low-rank matrix perturbation $AB^\top$.
arXiv Detail & Related papers (2025-11-20T18:56:05Z) - StelLA: Subspace Learning in Low-rank Adaptation using Stiefel Manifold [51.93627542334909]
Low-rank adaptation (LoRA) has been widely adopted as a parameter-efficient technique for fine-tuning large-scale pre-trained models. We propose a geometry-aware extension of LoRA that uses a three-factor decomposition $USV^\top$.
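A minimal sketch of the three-factor form: the adapter is $\Delta W = U\,\mathrm{diag}(s)\,V^\top$ with $U$ and $V$ kept column-orthonormal (points on the Stiefel manifold). The QR-based projection below is a common stand-in retraction, assumed here for illustration; the paper's actual manifold optimizer differs:

```python
import numpy as np

def qr_retract(M):
    """Map a matrix to one with orthonormal columns (a point on the Stiefel manifold)."""
    Q, R = np.linalg.qr(M)
    return Q * np.sign(np.diagonal(R))  # fix the column-wise sign ambiguity of QR

d_out, d_in, r = 12, 10, 3
rng = np.random.default_rng(1)

# Three-factor low-rank adapter: delta_W = U @ diag(s) @ V.T
U = qr_retract(rng.standard_normal((d_out, r)))
V = qr_retract(rng.standard_normal((d_in, r)))
s = np.zeros(r)  # scales start at zero so fine-tuning begins exactly at the pre-trained weights

delta_W = U @ np.diag(s) @ V.T
```

Separating the subspaces ($U$, $V$) from the scales ($s$) mirrors an SVD, so the orthogonality constraints keep the two factors from collapsing into an unbalanced $BA$ product as in vanilla LoRA.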
arXiv Detail & Related papers (2025-10-02T11:59:13Z) - A3 : an Analytical Low-Rank Approximation Framework for Attention [14.649496050074735]
We propose $\rm A^3$, a post-training low-rank approximation framework.<n>We show that $\rm A^3$ maintains superior performance compared to state-of-the-art methods. We also demonstrate the versatility of $\rm A^3$, including KV cache compression, quantization, and mixed-rank assignments for enhanced performance.
arXiv Detail & Related papers (2025-05-19T10:29:32Z) - Compressing Large Language Models using Low Rank and Low Precision Decomposition [46.30918750022739]
This work introduces $\rm CALDERA$ -- a new post-training LLM compression algorithm.
It harnesses the inherent low-rank structure of a weight matrix $\mathbf{W}$ by approximating it via a low-rank, low-precision decomposition.
Results show that compressing LLaMA-2 7B/13B/70B and LLaMA-3 8B models using $\rm CALDERA$ outperforms existing post-training compression techniques.
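The low-rank plus low-precision idea can be sketched by alternating a quantization step with a truncated SVD of the quantization residual. The uniform symmetric quantizer below is a hypothetical stand-in; CALDERA's actual quantization and optimization scheme is more sophisticated:

```python
import numpy as np

def quantize(M, bits=4):
    """Illustrative uniform symmetric quantizer with a per-matrix scale."""
    levels = 2 ** (bits - 1) - 1          # e.g. 4 bits -> integer codes in [-7, 7]
    scale = np.max(np.abs(M)) / levels
    return np.round(M / scale) * scale

def lowrank_lowprec(W, rank=4, bits=4, iters=5):
    """Alternate: Q = low-precision fit to W - L, L = best rank-k fit to W - Q."""
    L = np.zeros_like(W)
    for _ in range(iters):
        Q = quantize(W - L, bits)                       # low-precision backbone
        U, s, Vt = np.linalg.svd(W - Q, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]        # low-rank correction
    return Q, L
```

The low-rank factors absorb the directions where quantization error is largest, which is why $Q + L$ reconstructs $W$ better than quantization alone at the same bit budget.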
arXiv Detail & Related papers (2024-05-29T08:42:30Z) - Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Oblivious Stochastic Composite Optimization [47.48197617884748]
We show that our algorithms converge without any prior knowledge of the parameters of the problem. All three algorithms work without prior knowledge of the diameter of the feasible set, the Lipschitz constant, or the smoothness of the objective function. We extend our framework to relative scale and demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes [62.90204655228324]
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight.
We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.
arXiv Detail & Related papers (2022-10-20T21:32:01Z) - Overcoming the Long Horizon Barrier for Sample-Efficient Reinforcement Learning with Latent Low-Rank Structure [9.759209713196718]
We consider a class of MDPs for which the associated optimal $Q^*$ function is low rank, where the latent features are unknown.
We show that under stronger low-rank structural assumptions, given access to a generative model, Low Rank Monte Carlo Policy Iteration (LR-MCPI) and Low Rank Empirical Value Iteration (LR-EVI) achieve the desired sample complexity of $\tilde{O}\left((|S|+|A|)\,\mathrm{poly}(d,H)/\epsilon^2\right)$ for a rank
arXiv Detail & Related papers (2022-06-07T20:39:51Z) - On Submodular Contextual Bandits [92.45432756301231]
We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function.
We show that our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy.
arXiv Detail & Related papers (2021-12-03T21:42:33Z) - Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.