Related papers: Towards Universal & Efficient Model Compression via Exponential Torque Pruning

URL: http://arxiv.org/abs/2506.22015v3
Date: Thu, 03 Jul 2025 07:20:35 GMT
Title: Towards Universal & Efficient Model Compression via Exponential Torque Pruning
Authors: Sarthak Ketanbhai Modi, Zi Pong Lim, Shourya Kuchhal, Yushi Cao, Yupeng Cheng, Yon Shin Teo, Shang-Wei Lin, Zhiming Li,
Abstract summary: We propose Exponential Torque Pruning (ETP), which adopts an exponential force application scheme for regularization. ETP achieves a significantly higher compression rate than previous state-of-the-art pruning strategies with negligible accuracy drop.
Score: 2.597821418338574
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The rapid growth in the complexity and size of modern deep neural networks (DNNs) has increased challenges related to computational cost and memory usage, spurring growing interest in efficient model compression techniques. A previous state-of-the-art approach proposes a Torque-inspired regularization that forces the weights of neural modules around a selected pivot point. However, we observe that the pruning effect of this approach is far from perfect: the post-trained network is still dense and also suffers a high accuracy drop. In this work, we attribute this ineffectiveness to the default linear force application scheme, which imposes inappropriate force on neural modules at different distances. To efficiently prune the redundant and distant modules while retaining those that are close and necessary for effective inference, we propose Exponential Torque Pruning (ETP), which adopts an exponential force application scheme for regularization. Experimental results on a broad range of domains demonstrate that, despite being extremely simple, ETP achieves a significantly higher compression rate than previous state-of-the-art pruning strategies with negligible accuracy drop.
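The abstract contrasts a linear and an exponential force scheme but gives no formula, so the following is only a minimal sketch under stated assumptions. It assumes each output channel (row) of a layer is one "neural module", that its distance to the pivot is the index distance |i - pivot|, that the Torque-style penalty is the sum over channels of force(distance) times the channel's L2 norm, that the linear scheme uses force(d) = d, and that the exponential scheme uses the hypothetical force(d) = exp(beta * d) - 1. The names `torque_penalty`, `linear_force`, `exponential_force`, `prox_step` and the parameter `beta` are illustrative and not taken from the paper. `prox_step` applies one group soft-thresholding step, a standard proximal update for group-norm penalties, to show how channels whose norm is driven to zero would be pruned.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def channel_norm(row: Sequence[float]) -> float:
    """L2 norm of one output channel's weights (one row per channel)."""
    return math.sqrt(sum(w * w for w in row))


def linear_force(d: float) -> float:
    """Assumed linear scheme: force grows in proportion to distance from the pivot."""
    return d


def exponential_force(d: float, beta: float = 1.0) -> float:
    """Hypothetical exponential scheme: zero at the pivot, grows like exp(beta * d)."""
    return math.exp(beta * d) - 1.0


def torque_penalty(weights: Sequence[Sequence[float]],
                   force: Callable[[float], float],
                   pivot: int = 0) -> float:
    """Assumed penalty: sum over channels of force(|i - pivot|) * ||W_i||_2."""
    return sum(force(abs(i - pivot)) * channel_norm(row)
               for i, row in enumerate(weights))


def prox_step(weights: Sequence[Sequence[float]],
              force: Callable[[float], float],
              step: float,
              pivot: int = 0) -> Matrix:
    """One group soft-thresholding (proximal) step on the penalty above.

    Each channel's norm shrinks by step * force(distance); a channel whose
    norm would drop to zero or below is set to all zeros, i.e. pruned.
    """
    out: Matrix = []
    for i, row in enumerate(weights):
        threshold = step * force(abs(i - pivot))
        norm = channel_norm(row)
        if norm <= threshold:
            out.append([0.0] * len(row))  # whole channel removed
        else:
            scale = 1.0 - threshold / norm  # shrink the channel toward zero
            out.append([scale * w for w in row])
    return out


if __name__ == "__main__":
    w = [[1.0, 0.0] for _ in range(4)]  # four channels, each with norm 1
    for name, f in [("linear", linear_force), ("exponential", exponential_force)]:
        shrunk = prox_step(w, f, step=0.5)
        print(name, round(torque_penalty(w, f), 3),
              [round(channel_norm(r), 3) for r in shrunk])
```

Because exp(beta * d) - 1 is close to beta * d near the pivot but eventually grows faster than any linear force, this kind of scheme concentrates shrinkage on distant channels. How ETP actually defines modules, distances and the force function may differ from these assumptions.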

Related papers

Causal Context Adjustment Loss for Learned Image Compression [72.7300229848778]
In recent years, learned image compression (LIC) technologies have notably surpassed conventional methods in rate-distortion (RD) performance. Most current techniques are VAE-based with an autoregressive entropy model, which improves RD performance by exploiting the decoded causal context. In this paper, we make the first attempt to explicitly adjust the causal context with our proposed Causal Context Adjustment loss.
arXiv Detail & Related papers (2024-10-07T09:08:32Z)
Adaptive Error-Bounded Hierarchical Matrices for Efficient Neural Network Compression [0.0]
This paper introduces a dynamic, error-bounded hierarchical matrix (H-matrix) compression method tailored for Physics-Informed Neural Networks (PINNs). The proposed approach reduces the computational complexity and memory demands of large-scale physics-based models while preserving the essential properties of the Neural Tangent Kernel (NTK). Empirical results demonstrate that this technique outperforms traditional compression methods, such as Singular Value Decomposition (SVD), pruning, and quantization, by maintaining high accuracy and improving generalization capabilities.
arXiv Detail & Related papers (2024-09-11T05:55:51Z)
Convolutional Neural Network Compression Based on Low-Rank Decomposition [3.3295360710329738]
This paper proposes a model compression method that integrates Variational Bayesian Matrix Factorization (VBMF), which is employed to estimate the rank of the weight tensor at each layer. Experimental results show that, for both high and low compression ratios, our compression model performs strongly.
arXiv Detail & Related papers (2024-08-29T06:40:34Z)
Towards Meta-Pruning via Optimal Transport [64.6060250923073]
This paper introduces a novel approach named Intra-Fusion, challenging the prevailing pruning paradigm. We leverage the concepts of model fusion and Optimal Transport to arrive at a more effective sparse model representation. We benchmark our results for various networks on commonly used datasets such as CIFAR-10, CIFAR-100, and ImageNet.
arXiv Detail & Related papers (2024-02-12T17:50:56Z)
CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor Networks [1.5199992713356987]
This paper introduces CompactifAI, an innovative compression approach using quantum-inspired tensor networks. Our method is versatile and can be implemented alongside, or on top of, other compression techniques. As a benchmark, we demonstrate that combining CompactifAI with quantization reduces the memory size of LLaMA 7B by 93%.
arXiv Detail & Related papers (2024-01-25T11:45:21Z)
Accelerating Scalable Graph Neural Network Inference with Node-Adaptive Propagation [80.227864832092]
Graph neural networks (GNNs) have exhibited exceptional efficacy in a diverse array of applications. The sheer size of large-scale graphs presents a significant challenge to real-time inference with GNNs. We propose an online propagation framework and two novel node-adaptive propagation methods.
arXiv Detail & Related papers (2023-10-17T05:03:00Z)
Pruning Deep Neural Networks from a Sparsity Perspective [34.22967841734504]
Pruning is often achieved by dropping redundant weights, neurons, or layers of a deep network while attempting to retain a comparable test performance. We propose PQ Index (PQI) to measure the potential compressibility of deep neural networks and use this to develop a Sparsity-informed Adaptive Pruning (SAP) algorithm.
arXiv Detail & Related papers (2023-02-11T04:52:20Z)
Efficient Graph Neural Network Inference at Large Scale [54.89457550773165]
Graph neural networks (GNNs) have demonstrated excellent performance in a wide range of applications. Existing scalable GNNs leverage linear propagation to preprocess the features and accelerate the training and inference procedure. We propose a novel adaptive propagation order approach that generates the personalized propagation order for each node based on its topological information.
arXiv Detail & Related papers (2022-11-01T14:38:18Z)
Hardening DNNs against Transfer Attacks during Network Compression using Greedy Adversarial Pruning [0.1529342790344802]
We investigate the adversarial robustness of models produced by several irregular pruning schemes and by 8-bit quantization. We find that greedy adversarial pruning results in models that are resistant to transfer attacks from their uncompressed counterparts.
arXiv Detail & Related papers (2022-06-15T09:13:35Z)
Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight parameterisation for neural networks that leads to inherently sparse models. Models trained in this manner exhibit similar performance but have a weight distribution with markedly higher density at zero, allowing more parameters to be pruned safely. Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
A Unified DNN Weight Compression Framework Using Reweighted Optimization Methods [31.869228048294445]
We propose a unified DNN weight pruning framework with dynamically updated regularization terms bounded by the designated constraint. We also extend our method to an integrated framework for the combination of different DNN compression tasks.
arXiv Detail & Related papers (2020-04-12T02:59:06Z)
Structured Sparsification with Joint Optimization of Group Convolution and Channel Shuffle [117.95823660228537]
We propose a novel structured sparsification method for efficient network compression. The proposed method automatically induces structured sparsity on the convolutional weights. We also address the problem of inter-group communication with a learnable channel shuffle mechanism.
arXiv Detail & Related papers (2020-02-19T12:03:10Z)
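The Powerpropagation entry in the list above describes a reparameterisation that pushes weights toward zero so that more of them can be pruned. A minimal sketch, assuming the sign-preserving power form w = theta * |theta|^(alpha - 1) with alpha >= 1 described in the Powerpropagation paper, shows why: by the chain rule the gradient reaching each parameter is scaled by alpha * |theta|^(alpha - 1), so small parameters receive small updates and tend to stay small. The function names and the simple top-k `magnitude_prune` step are illustrative, not the paper's code.

```python
from typing import List, Sequence


def powerprop_weights(theta: Sequence[float], alpha: float) -> List[float]:
    """Effective weights w = theta * |theta|**(alpha - 1) (sign preserved)."""
    if alpha < 1.0:
        raise ValueError("this sketch assumes alpha >= 1")
    return [t * abs(t) ** (alpha - 1.0) for t in theta]


def powerprop_grad(theta: Sequence[float], grad_w: Sequence[float],
                   alpha: float) -> List[float]:
    """Chain rule: dL/dtheta = dL/dw * alpha * |theta|**(alpha - 1).

    Parameters near zero get proportionally smaller updates.
    """
    if alpha < 1.0:
        raise ValueError("this sketch assumes alpha >= 1")
    return [g * alpha * abs(t) ** (alpha - 1.0) for t, g in zip(theta, grad_w)]


def magnitude_prune(w: Sequence[float], keep: int) -> List[float]:
    """Zero every weight except the `keep` largest in magnitude."""
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    kept = set(order[:keep])
    return [x if i in kept else 0.0 for i, x in enumerate(w)]


if __name__ == "__main__":
    theta = [0.1, -0.5, 1.0, 0.05]
    w = powerprop_weights(theta, alpha=2.0)
    print("weights:", w)
    print("grads for dL/dw = 1:", powerprop_grad(theta, [1.0] * 4, alpha=2.0))
    print("pruned (keep 2):", magnitude_prune(w, keep=2))
```

With alpha = 1 the reparameterisation reduces to ordinary training (w = theta); larger alpha widens the gap between the updates that large and small parameters receive.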

This list is automatically generated from the titles and abstracts of the papers in this site.