Related papers: SALAAD: Sparse And Low-Rank Adaptation via ADMM for Large Language Model Inference

URL: http://arxiv.org/abs/2602.00942v2
Date: Fri, 06 Feb 2026 12:13:56 GMT
Title: SALAAD: Sparse And Low-Rank Adaptation via ADMM for Large Language Model Inference
Authors: Hao Ma, Melis Ilayda Bal, Liang Zhang, Bingcong Li, Niao He, Melanie Zeilinger, Michael Muehlebach
Abstract summary: We propose SALAAD, a plug-and-play framework that induces sparse and low-rank structures during training. Experiments across model scales show that SALAAD substantially reduces memory consumption during deployment. A single training run yields a continuous spectrum of model capacities, enabling smooth and elastic deployment across diverse memory budgets.
Score: 38.037874715181964
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern large language models are increasingly deployed under compute and memory constraints, making flexible control of model capacity a central challenge. While sparse and low-rank structures naturally trade off capacity and performance, existing approaches often rely on heuristic designs that ignore layer and matrix heterogeneity or require model-specific architectural modifications. We propose SALAAD, a plug-and-play framework applicable to different model architectures that induces sparse and low-rank structures during training. By formulating structured weight learning under an augmented Lagrangian framework and introducing an adaptive controller that dynamically balances the training loss and structural constraints, SALAAD preserves the stability of standard training dynamics while enabling explicit control over the evolution of effective model capacity during training. Experiments across model scales show that SALAAD substantially reduces memory consumption during deployment while achieving performance comparable to ad-hoc methods. Moreover, a single training run yields a continuous spectrum of model capacities, enabling smooth and elastic deployment across diverse memory budgets without the need for retraining.

Related papers

Modular Memory is the Key to Continual Learning Agents [100.09688599754465]
We argue that combining the strengths of In-Weight Learning (IWL) and the newly emerged capabilities of In-Context Learning (ICL) through the design of modular memory is the missing piece for continual adaptation at scale. We outline a conceptual framework for modular memory-centric architectures that leverage ICL for rapid adaptation and knowledge accumulation, and IWL for stable updates to model capabilities.
arXiv Detail & Related papers (2026-03-02T11:40:05Z)
Beyond Parameter Arithmetic: Sparse Complementary Fusion for Distribution-Aware Model Merging [20.429700094073684]
We propose Sparse Complementary Fusion with reverse KL (SCF-RKL), a novel model merging framework that explicitly controls functional interference through sparse, distribution-aware updates. We evaluate SCF-RKL across a wide range of model scales and architectures, covering both reasoning-focused and instruction-tuned models.
arXiv Detail & Related papers (2026-02-12T08:45:42Z)
An Integrated Fusion Framework for Ensemble Learning Leveraging Gradient Boosting and Fuzzy Rule-Based Models [59.13182819190547]
Fuzzy rule-based models excel in interpretability and have seen widespread application across diverse fields. They face challenges such as complex design specifications and scalability issues with large datasets. This paper proposes an Integrated Fusion Framework that merges the strengths of both paradigms to enhance model performance and interpretability.
arXiv Detail & Related papers (2025-11-11T10:28:23Z)
Rethinking the Role of Dynamic Sparse Training for Scalable Deep Reinforcement Learning [58.533203990515034]
Scaling neural networks has driven breakthrough advances in machine learning, yet this paradigm fails in deep reinforcement learning (DRL). We show that dynamic sparse training strategies provide module-specific benefits that complement the primary scalability foundation established by architectural improvements. We finally distill these insights into Module-Specific Training (MST), a practical framework that exploits the benefits of architectural improvements and demonstrates substantial scalability gains across diverse RL algorithms without algorithmic modifications.
arXiv Detail & Related papers (2025-10-14T03:03:08Z)
The Curious Case of In-Training Compression of State Space Models [49.819321766705514]
State Space Models (SSMs) tackle long sequence modeling tasks efficiently, offering both parallelizable training and fast inference. A key design challenge is striking the right balance between maximizing expressivity and limiting this computational burden. Our approach, CompreSSM, applies to Linear Time-Invariant SSMs such as Linear Recurrent Units, but is also extendable to selective models.
arXiv Detail & Related papers (2025-10-03T09:02:33Z)
Modular Delta Merging with Orthogonal Constraints: A Scalable Framework for Continual and Reversible Model Composition [0.0]
Existing approaches to model merging and continual learning often suffer from task interference, catastrophic forgetting, or lack of reversibility. We propose Modular Delta Merging with Orthogonal Constraints (MDM-OC), a novel framework that enables scalable, interference-free, and reversible composition of fine-tuned models.
arXiv Detail & Related papers (2025-07-28T17:08:49Z)
Model Hemorrhage and the Robustness Limits of Large Language Models [119.46442117681147]
Large language models (LLMs) demonstrate strong performance across natural language processing tasks, yet undergo significant performance degradation when modified for deployment. We define this phenomenon as model hemorrhage: performance decline caused by parameter alterations and architectural changes.
arXiv Detail & Related papers (2025-03-31T10:16:03Z)
Reinforcement Learning for Machine Learning Model Deployment: Evaluating Multi-Armed Bandits in ML Ops Environments [0.0]
We investigate whether reinforcement learning (RL)-based model management can manage deployment decisions more effectively. Our approach enables more adaptive production environments by continuously evaluating deployed models and rolling back underperforming ones in real-time. Our findings suggest that RL-based model management can improve automation, reduce reliance on manual interventions, and mitigate risks associated with post-deployment model failures.
arXiv Detail & Related papers (2025-03-28T16:42:21Z)
Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning [115.79349923044663]
Few-shot class-incremental learning (FSCIL) aims to incrementally learn novel classes from limited examples. Existing methods face a critical dilemma: static architectures rely on a fixed parameter space to learn from data that arrive sequentially, making them prone to overfitting to the current session. In this study, we explore the potential of Selective State Space Models (SSMs) for FSCIL.
arXiv Detail & Related papers (2024-07-08T17:09:39Z)
Learning a model is paramount for sample efficiency in reinforcement learning control of PDEs [5.488334211013093]
We show that learning an actuated model in parallel to training the RL agent significantly reduces the total amount of required data sampled from the real system. We also show that iteratively updating the model is of major importance to avoid biases in the RL training.
arXiv Detail & Related papers (2023-02-14T16:14:39Z)
State-driven Implicit Modeling for Sparsity and Robustness in Neural Networks [3.604879434384177]
We present a new approach to training implicit models, called State-driven Implicit Modeling (SIM). SIM constrains the internal states and outputs to match those of a baseline model, circumventing costly backward computations. We demonstrate how the SIM approach can be applied to significantly improve sparsity and robustness of baseline models trained on datasets.
arXiv Detail & Related papers (2022-09-19T23:58:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.