Related papers: Unified Framework for Neural Network Compression via Decomposition and Optimal Rank Selection

Unified Framework for Neural Network Compression via Decomposition and Optimal Rank Selection

URL: http://arxiv.org/abs/2409.03555v1
Date: Thu, 5 Sep 2024 14:15:54 GMT
Title: Unified Framework for Neural Network Compression via Decomposition and Optimal Rank Selection
Authors: Ali Aghababaei-Harandi, Massih-Reza Amini,
Abstract summary: We present a unified framework that applies decomposition and optimal rank selection, employing a composite compression loss within defined rank constraints. Our approach includes an automatic rank search in a continuous space, efficiently identifying optimal rank configurations without the use of training data. Using various benchmark datasets, we demonstrate the efficacy of our method through a comprehensive analysis.
Score: 3.3454373538792552
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite their high accuracy, complex neural networks demand significant computational resources, posing challenges for deployment on resource-constrained devices such as mobile phones and embedded systems. Compression algorithms have been developed to address these challenges by reducing model size and computational demands while maintaining accuracy. Among these approaches, factorization methods based on tensor decomposition are theoretically sound and effective. However, they face difficulties in selecting the appropriate rank for decomposition. This paper tackles this issue by presenting a unified framework that simultaneously applies decomposition and optimal rank selection, employing a composite compression loss within defined rank constraints. Our approach includes an automatic rank search in a continuous space, efficiently identifying optimal rank configurations without the use of training data, making it computationally efficient. Combined with a subsequent fine-tuning step, our approach maintains the performance of highly compressed models on par with their original counterparts. Using various benchmark datasets, we demonstrate the efficacy of our method through a comprehensive analysis.

Related papers

You Only Train Once [11.97836331714694]
You Only Train Once (YOTO) contributes to limiting training to one shot for the latter aspect of losses selection and weighting.<n>We leverage the differentiability of the composite loss formulation which is widely used for optimizing multiple empirical losses simultaneously.<n>We show that YOTO consistently outperforms the best grid-search model on unseen test data.
arXiv Detail & Related papers (2025-06-04T18:04:58Z)
Structure-Preserving Network Compression Via Low-Rank Induced Training Through Linear Layers Composition [11.399520888150468]
We present a theoretically-justified technique termed Low-Rank Induced Training (LoRITa) LoRITa promotes low-rankness through the composition of linear layers and compresses by using singular value truncation. We demonstrate the effectiveness of our approach using MNIST on Fully Connected Networks, CIFAR10 on Vision Transformers, and CIFAR10/100 and ImageNet on Convolutional Neural Networks.
arXiv Detail & Related papers (2024-05-06T00:58:23Z)
Robust Stochastically-Descending Unrolled Networks [85.6993263983062]
Deep unrolling is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network. We show that convergence guarantees and generalizability of the unrolled networks are still open theoretical problems. We numerically assess unrolled architectures trained under the proposed constraints in two different applications.
arXiv Detail & Related papers (2023-12-25T18:51:23Z)
Learning Accurate Performance Predictors for Ultrafast Automated Model Compression [86.22294249097203]
We propose an ultrafast automated model compression framework called SeerNet for flexible network deployment. Our method achieves competitive accuracy-complexity trade-offs with significant reduction of the search cost.
arXiv Detail & Related papers (2023-04-13T10:52:49Z)
Towards Optimal Compression: Joint Pruning and Quantization [1.191194620421783]
This paper introduces FITCompress, a novel method integrating layer-wise mixed-precision quantization and unstructured pruning. Experiments on computer vision and natural language processing benchmarks demonstrate that our proposed approach achieves a superior compression-performance trade-off.
arXiv Detail & Related papers (2023-02-15T12:02:30Z)
Support Vector Machines with the Hard-Margin Loss: Optimal Training via Combinatorial Benders' Cuts [8.281391209717105]
We show how to train the hard-margin SVM model to global optimality. We introduce an iterative sampling and sub decomposition algorithm that solves the problem.
arXiv Detail & Related papers (2022-07-15T18:21:51Z)
Contextual Model Aggregation for Fast and Robust Federated Learning in Edge Computing [88.76112371510999]
Federated learning is a prime candidate for distributed machine learning at the network edge. Existing algorithms face issues with slow convergence and/or robustness of performance. We propose a contextual aggregation scheme that achieves the optimal context-dependent bound on loss reduction.
arXiv Detail & Related papers (2022-03-23T21:42:31Z)
On Accelerating Distributed Convex Optimizations [0.0]
This paper studies a distributed multi-agent convex optimization problem. We show that the proposed algorithm converges linearly with an improved rate of convergence than the traditional and adaptive gradient-descent methods. We demonstrate our algorithm's superior performance compared to prominent distributed algorithms for solving real logistic regression problems.
arXiv Detail & Related papers (2021-08-19T13:19:54Z)
Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices. Previous unstructured or structured weight pruning methods can hardly truly accelerate inference. We propose a generalized weight unification framework at a hardware compatible micro-structured level to achieve high amount of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
Decentralized Statistical Inference with Unrolled Graph Neural Networks [26.025935320024665]
We propose a learning-based framework, which unrolls decentralized optimization algorithms into graph neural networks (GNNs) By minimizing the recovery error via end-to-end training, this learning-based framework resolves the model mismatch issue. Our convergence analysis reveals that the learned model parameters may accelerate the convergence and reduce the recovery error to a large extent.
arXiv Detail & Related papers (2021-04-04T07:52:34Z)
Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems. Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs. This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks. With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.