Large Language Model Compression with Global Rank and Sparsity Optimization
- URL: http://arxiv.org/abs/2505.03801v1
- Date: Fri, 02 May 2025 08:00:48 GMT
- Title: Large Language Model Compression with Global Rank and Sparsity Optimization
- Authors: Changhai Zhou, Qian Qiao, Weizhong Zhang, Cheng Jin
- Abstract summary: Low-rank and sparse composite approximation is a natural idea to compress Large Language Models (LLMs). We propose a novel two-stage compression method with the capability of global rank and sparsity optimization. Our method significantly surpasses state-of-the-art techniques for sparsification and composite approximation.
- Score: 12.078838412963083
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Low-rank and sparse composite approximation is a natural idea to compress Large Language Models (LLMs). However, such an idea faces two primary challenges that adversely affect the performance of existing methods. The first challenge relates to the interaction and cooperation between low-rank and sparse matrices, while the second involves determining weight allocation across different layers, as redundancy varies considerably among them. To address these challenges, we propose a novel two-stage LLM compression method with the capability of global rank and sparsity optimization. It is noteworthy that the overall optimization space is vast, making comprehensive optimization computationally prohibitive. Therefore, to reduce the optimization space, our first stage utilizes robust principal component analysis to decompose the weight matrices of LLMs into low-rank and sparse components, which span the low dimensional and sparse spaces containing the resultant low-rank and sparse matrices, respectively. In the second stage, we propose a probabilistic global optimization technique to jointly identify the low-rank and sparse structures within the above two spaces. The appealing feature of our approach is its ability to automatically detect the redundancy across different layers and to manage the interaction between the sparse and low-rank components. Extensive experimental results indicate that our method significantly surpasses state-of-the-art techniques for sparsification and composite approximation.
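To make the first stage concrete, here is a minimal NumPy sketch of robust principal component analysis via Principal Component Pursuit, which splits a weight matrix W into a low-rank part L and a sparse part S. The solver (inexact augmented Lagrangian with singular-value and elementwise soft-thresholding) and the default lambda/mu settings are standard textbook choices, not the authors' exact algorithm or hyperparameters.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise soft-thresholding (proximal operator of the l1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svd_threshold(x, tau):
    """Singular-value soft-thresholding (proximal operator of the nuclear norm)."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt

def rpca_pcp(w, lam=None, mu=None, n_iter=200, tol=1e-7):
    """Split W into low-rank L plus sparse S by Principal Component Pursuit,
    solved with an inexact augmented Lagrangian (standard textbook recipe)."""
    m, n = w.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else 0.25 * m * n / np.abs(w).sum()
    s_mat = np.zeros_like(w)
    y = np.zeros_like(w)                                      # dual variable
    for _ in range(n_iter):
        l_mat = svd_threshold(w - s_mat + y / mu, 1.0 / mu)   # low-rank update
        s_mat = soft_threshold(w - l_mat + y / mu, lam / mu)  # sparse update
        residual = w - l_mat - s_mat
        y = y + mu * residual                                 # dual ascent step
        if np.linalg.norm(residual) <= tol * np.linalg.norm(w):
            break
    return l_mat, s_mat

# Toy usage: a low-rank matrix corrupted by a few large-magnitude entries.
rng = np.random.default_rng(0)
w = rng.standard_normal((128, 64)) @ rng.standard_normal((64, 256)) / 64
w[rng.random(w.shape) < 0.01] += 5.0
l_mat, s_mat = rpca_pcp(w)
print(np.linalg.matrix_rank(l_mat, tol=1e-3), int((np.abs(s_mat) > 1e-6).sum()))
```

The two components returned by such a decomposition define the low-dimensional and sparse search spaces over which the paper's second-stage probabilistic optimization operates.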
Related papers
- Globally optimized SVD compression of LLMs via Fermi-function-based rank selection and gauge fixing [0.0]
Large Language Models (LLMs) are very demanding in terms of their computational resources, motivating low-rank decompositions. We present two physics-inspired improvements to SVD compression: FermiGrad, a gradient-descent algorithm that determines globally optimal layer-wise ranks, and PivGa, an additional lossless compression of the low-rank factors.
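The summary does not spell out FermiGrad's exact parameterization; the sketch below only illustrates the general idea of a Fermi-function (soft) rank cutoff applied to the singular values of a weight matrix, with the cutoff position serving as a differentiable surrogate for the layer's rank. The function names and the temperature parameter are illustrative assumptions.

```python
import numpy as np

def fermi_mask(num_sv, cutoff, temperature):
    """Fermi-Dirac occupation over singular-value indices: ~1 below the cutoff,
    ~0 above it, with a smooth transition of width `temperature`."""
    idx = np.arange(num_sv)
    return 1.0 / (1.0 + np.exp((idx - cutoff) / temperature))

def soft_rank_truncation(w, cutoff, temperature=2.0):
    """Soft SVD truncation: scale singular values by a Fermi mask instead of a hard cut."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    mask = fermi_mask(len(s), cutoff, temperature)
    return (u * (s * mask)) @ vt

w = np.random.default_rng(1).standard_normal((256, 128))
approx = soft_rank_truncation(w, cutoff=32.0)
print(np.linalg.norm(w - approx) / np.linalg.norm(w))
```

In a gradient-based scheme, the per-layer cutoffs would be trained jointly under a global parameter budget rather than fixed by hand as here.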
arXiv Detail & Related papers (2025-11-26T10:54:01Z) - Towards a Unified Analysis of Neural Networks in Nonparametric Instrumental Variable Regression: Optimization and Generalization [66.08522228989634]
We establish the first global convergence result for neural networks in the two-stage least squares (2SLS) approach to nonparametric instrumental variable regression (NPIV). This is achieved by adopting a lifted perspective through mean-field Langevin dynamics (MFLD).
arXiv Detail & Related papers (2025-11-18T17:51:17Z) - Near-optimal Linear Predictive Clustering in Non-separable Spaces via Mixed Integer Programming and Quadratic Pseudo-Boolean Reductions [21.80447518126464]
Linear Predictive Clustering (LPC) partitions samples based on shared linear relationships between feature and target variables. Greedy optimization methods, commonly used for LPC, alternate between clustering and linear regression but lack global optimality. This work builds on the constrained optimization paradigm to introduce two novel approaches that improve the efficiency of global optimization for LPC.
arXiv Detail & Related papers (2025-11-13T21:22:47Z) - 1+1>2: A Synergistic Sparse and Low-Rank Compression Method for Large Language Models [15.798945727818753]
We introduce a Synergistic Sparse and Low-Rank Compression (SSLC) method for Large Language Models (LLMs). Low-rank approximation compresses the model by retaining its essential structure with minimal information loss, whereas sparse optimization eliminates non-essential weights, preserving those crucial for generalization. Experiments on LLaMA and Qwen2.5 models (7B-70B) show that SSLC, without any additional training steps, consistently surpasses standalone methods, achieving state-of-the-art results.
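As a rough illustration of composing sparse and low-rank parts, here is a simple decoupled baseline: a truncated-SVD low-rank term plus a sparse correction that keeps only the largest-magnitude residual entries. The actual SSLC method couples the two terms synergistically; this sketch is only the naive additive variant.

```python
import numpy as np

def sparse_plus_low_rank(w, rank, sparsity):
    """Approximate W as L + S: truncated-SVD low-rank part plus a sparse
    correction keeping only the largest-magnitude entries of the residual."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    l_mat = (u[:, :rank] * s[:rank]) @ vt[:rank]          # rank-r approximation
    residual = w - l_mat
    k = int(sparsity * residual.size)                     # number of entries to keep
    threshold = np.partition(np.abs(residual).ravel(), -k)[-k]
    s_mat = np.where(np.abs(residual) >= threshold, residual, 0.0)
    return l_mat, s_mat

w = np.random.default_rng(2).standard_normal((512, 256))
l_mat, s_mat = sparse_plus_low_rank(w, rank=32, sparsity=0.05)
print(np.linalg.norm(w - l_mat - s_mat) / np.linalg.norm(w))
```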
arXiv Detail & Related papers (2025-10-30T12:50:30Z) - Discrete-Guided Diffusion for Scalable and Safe Multi-Robot Motion Planning [56.240199425429445]
Multi-Robot Motion Planning (MRMP) involves generating trajectories for multiple robots operating in a shared continuous workspace. While discrete multi-agent path finding (MAPF) methods are broadly adopted due to their scalability, their coarse discretization limits trajectory quality. This paper tackles the limitations of both approaches by combining discrete MAPF solvers with constrained generative diffusion models.
arXiv Detail & Related papers (2025-08-27T17:59:36Z) - L-SR1: Learned Symmetric-Rank-One Preconditioning [5.421390145168128]
End-to-end deep learning has achieved impressive results but remains limited by its reliance on large labeled datasets. In contrast, classical optimization methods are data-efficient and lightweight but often suffer from slow convergence. We propose a novel learned second-order method that introduces a trainable preconditioning unit to enhance the classical Symmetric-Rank-One algorithm.
arXiv Detail & Related papers (2025-08-17T07:37:29Z) - LLM4CMO: Large Language Model-aided Algorithm Design for Constrained Multiobjective Optimization [54.35609820607923]
Large language models (LLMs) offer new opportunities for assisting with algorithm design. We propose LLM4CMO, a novel constrained multiobjective evolutionary algorithm (CMOEA) based on a dual-population, two-stage framework. LLMs can serve as efficient co-designers in the development of complex evolutionary optimization algorithms.
arXiv Detail & Related papers (2025-08-16T02:00:57Z) - MLoRQ: Bridging Low-Rank and Quantization for Transformer Compression [2.9907287985468924]
Mixed Low-Rank and Quantization (MLoRQ) is a novel method that integrates low-rank approximation and mixed-precision quantization. MLoRQ shows state-of-the-art results with up to 15% performance improvement.
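A minimal sketch of the building block such a method combines: factor a weight matrix with a truncated SVD and apply uniform fake-quantization to each factor. The per-layer rank and mixed bit-width allocation that MLoRQ actually optimizes is not shown; `rank` and `bits` here are fixed, illustrative inputs.

```python
import numpy as np

def quantize_uniform(x, bits):
    """Symmetric uniform fake-quantization: round to the grid, return dequantized values."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale).clip(-qmax, qmax) * scale

def low_rank_quantized(w, rank, bits):
    """Factor W ~= A @ B with a truncated SVD, then quantize each factor."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * np.sqrt(s[:rank])
    b = np.sqrt(s[:rank])[:, None] * vt[:rank]
    return quantize_uniform(a, bits), quantize_uniform(b, bits)

w = np.random.default_rng(3).standard_normal((256, 256))
a_q, b_q = low_rank_quantized(w, rank=64, bits=4)
print(np.linalg.norm(w - a_q @ b_q) / np.linalg.norm(w))
```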
arXiv Detail & Related papers (2025-07-13T12:48:46Z) - MGAA: Multi-Granular Adaptive Allocation fof Low-Rank Compression of LLMs [9.244526043014098]
The Multi-Granular Adaptive Allocation (MGAA) method can adaptively allocate parameters between and within sublayers without task-specific evaluations in the compression process. Comprehensive evaluations of MGAA across multiple LLM backbones and benchmark datasets demonstrate its superior performance.
arXiv Detail & Related papers (2025-07-04T04:54:01Z) - A Gradient Meta-Learning Joint Optimization for Beamforming and Antenna Position in Pinching-Antenna Systems [63.213207442368294]
We consider a novel optimization design for multi-waveguide pinching-antenna systems. The proposed GML-JO algorithm is robust to different choices and achieves better performance than existing optimization methods.
arXiv Detail & Related papers (2025-06-14T17:35:27Z) - NDCG-Consistent Softmax Approximation with Accelerated Convergence [67.10365329542365]
We propose novel loss formulations that align directly with ranking metrics. We integrate the proposed RG losses with the highly efficient Alternating Least Squares (ALS) optimization method. Empirical evaluations on real-world datasets demonstrate that our approach achieves comparable or superior ranking performance.
arXiv Detail & Related papers (2025-06-11T06:59:17Z) - Highly Efficient and Effective LLMs with Multi-Boolean Architectures [1.4195677954898822]
Weight binarization has emerged as a promising strategy to drastically reduce the complexity of large language models (LLMs). We introduce a novel framework that effectively transforms LLMs into multi-kernel Boolean parameters and, for the first time, finetunes them directly in the Boolean domain, eliminating the need for expensive latent weights. Our method outperforms recent ultra-low-bit quantization and binarization methods.
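The summary does not detail the multi-kernel Boolean parameterization or the Boolean-domain finetuning, so the sketch below shows only the classical greedy residual binarization idea it builds on: approximate W as a sum of scaled sign matrices, W ~ sum_k alpha_k * B_k with B_k in {-1, +1}.

```python
import numpy as np

def residual_binarize(w, num_bases):
    """Greedy multi-base binarization: W ~= sum_k alpha_k * B_k with B_k in {-1, +1}."""
    residual = w.copy()
    alphas, bases = [], []
    for _ in range(num_bases):
        b = np.sign(residual)
        b[b == 0] = 1.0
        alpha = np.abs(residual).mean()   # least-squares optimal scale for a sign base
        alphas.append(alpha)
        bases.append(b)
        residual = residual - alpha * b
    return np.array(alphas), np.stack(bases)

w = np.random.default_rng(5).standard_normal((128, 128))
alphas, bases = residual_binarize(w, num_bases=3)
approx = np.tensordot(alphas, bases, axes=1)
print(np.linalg.norm(w - approx) / np.linalg.norm(w))
```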
arXiv Detail & Related papers (2025-05-28T19:40:34Z) - Scalable Min-Max Optimization via Primal-Dual Exact Pareto Optimization [66.51747366239299]
We propose a smooth variant of the min-max problem based on the augmented Lagrangian. The proposed algorithm scales better with the number of objectives than subgradient-based strategies.
arXiv Detail & Related papers (2025-03-16T11:05:51Z) - COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs [81.01082659623552]
Large Language Models (LLMs) have demonstrated remarkable success across various domains. However, their optimization remains a significant challenge due to the complex and high-dimensional loss landscapes they inhabit.
arXiv Detail & Related papers (2025-02-24T18:42:19Z) - A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth probability distribution, and the outer loss becomes an expected loss over the inner distribution.
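A 1-D toy sketch of that reformulation, assuming a Gibbs smoothing of the inner loss (the paper's exact smoothing and estimator may differ): the inner loss induces a temperature-controlled distribution over the inner variable, and the outer objective becomes its expectation under that distribution.

```python
import numpy as np

def expected_outer_loss(lam, temperature=0.1, num_points=2001):
    """Smoothed bi-level objective on a 1-D toy problem.
    Inner loss  g(theta; lam) = (theta - lam)^2  -> Gibbs distribution over theta.
    Outer loss  f(theta)      = (theta - 1.0)^2  -> expectation under that distribution."""
    grid = np.linspace(-3.0, 3.0, num_points)
    log_p = -(grid - lam) ** 2 / temperature     # unnormalized log Gibbs density
    p = np.exp(log_p - log_p.max())
    p /= p.sum()
    return float(np.sum(p * (grid - 1.0) ** 2))

# The smoothed objective is smallest near lam = 1.0, matching the exact bi-level optimum.
for lam in (0.0, 0.5, 1.0, 1.5):
    print(lam, round(expected_outer_loss(lam), 4))
```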
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Accelerating Distributed Optimization: A Primal-Dual Perspective on Local Steps [4.471962177124311]
In distributed machine learning, coordinating variables across multiple agents with different data poses significant challenges.
In this paper, we show that a framework achieving Lagrangian convergence on the primal variables requires no inter-agent communication.
arXiv Detail & Related papers (2024-07-02T22:14:54Z) - Adaptive Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization [42.53133823994923]
Low-rank compression is a promising technique to reduce non-essential parameters in large language models. We conduct empirical research on the low-rank characteristics of large models. We propose a low-rank compression method suitable for large language models.
arXiv Detail & Related papers (2024-05-17T08:27:12Z) - Low-Rank Prune-And-Factorize for Language Model Compression [18.088550230146247]
Matrix factorization fails to retain satisfactory performance under moderate to high compression rates.
We propose two techniques: sparsity-aware SVD and mixed-rank fine-tuning.
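Sketching one plausible reading of sparsity-aware SVD (the paper's exact formulation may differ): fit a low-rank factorization only to the weights kept by a pruning mask, imputing the pruned positions with the current reconstruction so they do not constrain the fit.

```python
import numpy as np

def truncated_svd(w, rank):
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

def sparsity_aware_svd(w, mask, rank, n_iter=30):
    """Low-rank approximation that only fits the unpruned weights (mask == 1);
    pruned positions are iteratively imputed with the current reconstruction."""
    approx = truncated_svd(w * mask, rank)
    for _ in range(n_iter):
        filled = np.where(mask.astype(bool), w, approx)   # impute pruned entries
        approx = truncated_svd(filled, rank)
    return approx

rng = np.random.default_rng(4)
w = rng.standard_normal((256, 128))
mask = (rng.random(w.shape) > 0.5).astype(w.dtype)        # keep ~50% of weights
approx = sparsity_aware_svd(w, mask, rank=32)
print(np.linalg.norm((w - approx) * mask) / np.linalg.norm(w * mask))
```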
arXiv Detail & Related papers (2023-06-25T07:38:43Z) - Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while balancing exploration and exploitation automatically.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z) - A consistent and flexible framework for deep matrix factorizations [17.49766938060264]
We introduce two meaningful loss functions for deep MF and present a generic framework to solve the corresponding optimization problems.
The models are successfully applied on both synthetic and real data, namely for hyperspectral unmixing and extraction of facial features.
arXiv Detail & Related papers (2022-06-21T19:20:35Z) - EOS: a Parallel, Self-Adaptive, Multi-Population Evolutionary Algorithm for Constrained Global Optimization [68.8204255655161]
EOS is a global optimization algorithm for constrained and unconstrained problems of real-valued variables.
It implements a number of improvements to the well-known Differential Evolution (DE) algorithm.
Results prove that EOS is capable of achieving increased performance compared to state-of-the-art single-population self-adaptive DE algorithms.
arXiv Detail & Related papers (2020-07-09T10:19:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.