Two Sparse Matrices are Better than One: Sparsifying Neural Networks with Double Sparse Factorization
- URL: http://arxiv.org/abs/2409.18850v1
- Date: Fri, 27 Sep 2024 15:48:39 GMT
- Title: Two Sparse Matrices are Better than One: Sparsifying Neural Networks with Double Sparse Factorization
- Authors: Vladimír Boža, Vladimír Macko
- Abstract summary: We present Double Sparse Factorization (DSF), where we factorize each weight matrix into two sparse matrices.
Our method achieves state-of-the-art results, enabling unprecedented sparsification of neural networks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Neural networks are often challenging to work with due to their large size and complexity. To address this, various methods aim to reduce model size by sparsifying or decomposing weight matrices, such as magnitude pruning and low-rank or block-diagonal factorization. In this work, we present Double Sparse Factorization (DSF), where we factorize each weight matrix into two sparse matrices. Although solving this problem exactly is computationally infeasible, we propose an efficient heuristic based on alternating minimization via ADMM that achieves state-of-the-art results, enabling unprecedented sparsification of neural networks. For instance, in a one-shot pruning setting, our method can reduce the size of the LLaMA2-13B model by 50% while maintaining better performance than the dense LLaMA2-7B model. We also compare favorably with Optimal Brain Compression, the state-of-the-art layer-wise pruning approach for convolutional neural networks. Furthermore, accuracy improvements of our method persist even after further model fine-tuning. Code available at: https://github.com/usamec/double_sparse.
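As a rough, hedged illustration of the factorization described in the abstract, the sketch below uses plain alternating minimization with magnitude-based hard thresholding instead of the paper's ADMM-based heuristic; the function names and the `keep_ratio` parameter are assumptions made for this example, not the authors' API (the actual implementation is in the linked repository).

```python
# A minimal sketch of double sparse factorization, assuming a plain
# alternating-minimization heuristic: each factor is refit by least squares
# and then projected back onto a sparse set by magnitude thresholding.
# This is NOT the paper's ADMM procedure; names and keep_ratio are illustrative.
import numpy as np

def sparsify(M, keep_ratio):
    """Keep only the largest-magnitude entries of M, zeroing the rest."""
    k = max(1, int(keep_ratio * M.size))
    thresh = np.sort(np.abs(M), axis=None)[-k]
    return np.where(np.abs(M) >= thresh, M, 0.0)

def double_sparse_factorize(W, inner_dim, keep_ratio=0.25, iters=50, seed=0):
    """Approximate W by A @ B with both A and B sparse."""
    rng = np.random.default_rng(seed)
    out_dim, in_dim = W.shape
    A = sparsify(rng.standard_normal((out_dim, inner_dim)) / np.sqrt(inner_dim), keep_ratio)
    B = sparsify(rng.standard_normal((inner_dim, in_dim)) / np.sqrt(inner_dim), keep_ratio)
    for _ in range(iters):
        A = sparsify(W @ np.linalg.pinv(B), keep_ratio)  # refit A with B fixed, then prune
        B = sparsify(np.linalg.pinv(A) @ W, keep_ratio)  # refit B with A fixed, then prune
    return A, B

if __name__ == "__main__":
    W = np.random.default_rng(1).standard_normal((64, 64))
    A, B = double_sparse_factorize(W, inner_dim=64)
    err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
    print(f"relative reconstruction error: {err:.3f}")
```

The alternate-and-project structure is what the sketch is meant to convey; at a comparable nonzero budget, the product of two sparse factors can express richer connectivity than a single pruned matrix, which is in line with the paper's claim that two sparse matrices outperform one.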
Related papers
- Layer-Specific Optimization: Sensitivity Based Convolution Layers Basis Search [0.0]
We propose a new way of applying matrix decomposition to the weights of convolutional layers.
The essence of the method is to train only a subset of the convolutions (the basis convolutions) and to represent the rest as linear combinations of the basis ones.
Experiments on models from the ResNet family and the CIFAR-10 dataset demonstrate that basis convolutions can not only reduce the size of the model but also accelerate the forward and backward passes of the network.
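A minimal sketch of the basis-convolution idea, assuming arbitrary shapes and names rather than the authors' implementation: only a small set of basis filters is stored and trained, and every remaining filter is reconstructed as a linear combination of them via a small coefficient matrix.

```python
# Reconstruct a full filter bank from a few basis filters and mixing
# coefficients (illustrative shapes; not the paper's code).
import numpy as np

def expand_filters(basis_filters, coeffs):
    """basis_filters: (n_basis, in_ch, k, k); coeffs: (n_filters, n_basis)."""
    n_basis, in_ch, k, _ = basis_filters.shape
    flat = basis_filters.reshape(n_basis, -1)   # (n_basis, in_ch*k*k)
    full = coeffs @ flat                        # mix basis filters linearly
    return full.reshape(-1, in_ch, k, k)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    basis = rng.standard_normal((8, 16, 3, 3))  # 8 trainable basis filters
    mix = rng.standard_normal((64, 8))          # coefficients for 64 output filters
    filters = expand_filters(basis, mix)
    # Only the basis and the coefficients are trained, instead of all 64 filters.
    print(filters.shape, basis.size + mix.size, "trained values vs", filters.size)
```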
arXiv Detail & Related papers (2024-08-12T09:24:48Z)
- Compute Better Spent: Replacing Dense Layers with Structured Matrices [77.61728033234233]
We identify more efficient alternatives to dense matrices, as exemplified by the success of convolutional networks in the image domain.
We show that different structures often require drastically different initialization scales and learning rates, which are crucial to performance.
We propose a novel matrix family containing Monarch matrices, the Block Tensor-Train (BTT), which we show performs better than dense matrices for the same compute on multiple tasks.
arXiv Detail & Related papers (2024-06-10T13:25:43Z)
- Combinatorial optimization for low bit-width neural networks [23.466606660363016]
Low bit-width neural networks have been extensively explored for deployment on edge devices to reduce computational resources.
Existing approaches have focused on gradient-based optimization in a two-stage train-and-compress setting.
We show that a combination of greedy coordinate descent and this novel approach can attain competitive accuracy on binary classification tasks.
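As a toy picture of greedy coordinate descent over binary weights (an assumed illustration on a linear least-squares model, not the paper's neural-network formulation), each sweep flips individual {-1, +1} weights and keeps a flip only if it lowers the loss.

```python
# Toy greedy coordinate descent over binary (+1/-1) weights; illustrative only.
import numpy as np

def greedy_coordinate_descent_binary(X, y, max_sweeps=50, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.choice([-1.0, 1.0], size=X.shape[1])
    loss = lambda v: np.mean((X @ v - y) ** 2)
    best = loss(w)
    for _ in range(max_sweeps):
        improved = False
        for j in range(len(w)):
            w[j] = -w[j]                 # tentatively flip coordinate j
            new = loss(w)
            if new < best:
                best, improved = new, True
            else:
                w[j] = -w[j]             # revert: the flip did not help
        if not improved:
            break
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 16))
    w_true = rng.choice([-1.0, 1.0], size=16)
    y = X @ w_true
    w_hat = greedy_coordinate_descent_binary(X, y)
    print("recovered all signs:", bool(np.all(w_hat == w_true)))
```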
arXiv Detail & Related papers (2022-06-04T15:02:36Z)
- Monarch: Expressive Structured Matrices for Efficient and Accurate Training [64.6871423399431]
Large neural networks excel in many domains, but they are expensive to train and fine-tune.
A popular approach to reduce their compute or memory requirements is to replace dense weight matrices with structured ones.
We propose a class of matrices (Monarch) that is hardware-efficient.
arXiv Detail & Related papers (2022-04-01T17:37:29Z)
- Algorithms for Efficiently Learning Low-Rank Neural Networks [12.916132936159713]
We study algorithms for learning low-rank neural networks.
We present a provably efficient algorithm which learns an optimal low-rank approximation to a single-hidden-layer ReLU network.
We propose a novel low-rank framework for training low-rank deep networks.
arXiv Detail & Related papers (2022-02-02T01:08:29Z)
- Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.
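A generic sketch of algorithm unfolding in the sense described above, assuming a much simpler solver than the paper's GDPA-based SDP relaxation: each iteration of gradient descent on a least-squares objective becomes one "layer", and the per-layer step sizes are the parameters that would be learned from data.

```python
# Generic algorithm unfolding: one unrolled solver iteration == one layer.
# Illustrative only; not the paper's GDPA/SDP unrolling.
import numpy as np

def unrolled_least_squares(A, y, alphas):
    """Run one 'layer' of gradient descent per entry of `alphas`."""
    x = np.zeros(A.shape[1])
    for alpha in alphas:              # each unrolled iteration is a layer
        grad = A.T @ (A @ x - y)      # gradient of 0.5 * ||A x - y||^2
        x = x - alpha * grad          # layer parameter: its own step size
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((30, 10))
    y = A @ rng.standard_normal(10)
    alphas = np.full(8, 0.01)         # in practice these would be trained
    x_hat = unrolled_least_squares(A, y, alphas)
    print("residual:", np.linalg.norm(A @ x_hat - y))
```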
arXiv Detail & Related papers (2021-09-10T07:01:15Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- Binarization Methods for Motor-Imagery Brain-Computer Interface Classification [18.722731794073756]
We propose methods for transforming real-valued weights to binary numbers for efficient inference.
By tuning the dimension of the binary embedding, we achieve almost the same accuracy in 4-class MI (≤1.27% lower) compared to models with float16 weights.
Our method replaces the fully connected layer of CNNs with a binary augmented memory using bipolar random projection.
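A minimal sketch of the bipolar-random-projection idea, assuming synthetic data and nearest-prototype classification rather than the authors' pipeline: features are projected with a random {-1, +1} matrix and only the sign of the result is kept as the binary embedding.

```python
# Bipolar random projection + sign binarization; illustrative sketch only.
import numpy as np

def bipolar_projection_matrix(in_dim, embed_dim, seed=0):
    """Random projection with entries in {-1, +1}."""
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=(in_dim, embed_dim))

def binarize(features, R):
    """Project features and keep only the sign: a binary embedding."""
    return np.sign(features @ R)

def classify(query, class_prototypes):
    """Pick the class whose binary prototype best matches the query code."""
    # With +/-1 codes, maximizing the dot product minimizes Hamming distance.
    return int(np.argmax(class_prototypes @ query))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    R = bipolar_projection_matrix(in_dim=64, embed_dim=256)
    class_means = 2.0 * rng.standard_normal((4, 64))          # 4 synthetic MI classes
    feats = class_means[:, None, :] + rng.standard_normal((4, 20, 64))
    prototypes = binarize(feats.mean(axis=1), R)               # one binary code per class
    query = binarize(feats[2, 0], R)                           # a trial from class 2
    print("predicted class:", classify(query, prototypes))
```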
arXiv Detail & Related papers (2020-10-14T12:28:18Z)
- Steepest Descent Neural Architecture Optimization: Escaping Local Optimum with Signed Neural Splitting [60.97465664419395]
We develop a significant and surprising extension of the splitting descent framework that addresses the local optimality issue.
By simply allowing both positive and negative weights during splitting, we can eliminate the splitting-stability issue that appears in S2D.
We verify our method on various challenging benchmarks such as CIFAR-100, ImageNet and ModelNet40, on which we outperform S2D and other advanced methods at learning accurate and energy-efficient neural networks.
arXiv Detail & Related papers (2020-03-23T17:09:27Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.