MaxQ: Multi-Axis Query for N:M Sparsity Network
- URL: http://arxiv.org/abs/2312.07061v2
- Date: Sun, 17 Mar 2024 03:17:47 GMT
- Title: MaxQ: Multi-Axis Query for N:M Sparsity Network
- Authors: Jingyang Xiang, Siqi Li, Junhao Chen, Zhuangzhi Chen, Tianxin Huang, Linpeng Peng, Yong Liu,
- Abstract summary: MaxQ achieves consistent improvements across diverse CNN architectures in various computer vision tasks.
Experiments show that MaxQ can achieve 74.6% top-1 accuracy on ImageNet and improve by over 2.8% over the state-of-the-art.
- Score: 16.033223841268747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: N:M sparsity has received increasing attention due to its remarkable performance and latency trade-off compared with structured and unstructured sparsity. However, existing N:M sparsity methods do not differentiate the relative importance of weights among blocks and leave important weights underappreciated. Besides, they directly apply N:M sparsity to the whole network, which will cause severe information loss. Thus, they are still sub-optimal. In this paper, we propose an efficient and effective Multi-Axis Query methodology, dubbed as MaxQ, to rectify these problems. During the training, MaxQ employs a dynamic approach to generate soft N:M masks, considering the weight importance across multiple axes. This method enhances the weights with more importance and ensures more effective updates. Meanwhile, a sparsity strategy that gradually increases the percentage of N:M weight blocks is applied, which allows the network to heal from the pruning-induced damage progressively. During the runtime, the N:M soft masks can be precomputed as constants and folded into weights without causing any distortion to the sparse pattern and incurring additional computational overhead. Comprehensive experiments demonstrate that MaxQ achieves consistent improvements across diverse CNN architectures in various computer vision tasks, including image classification, object detection and instance segmentation. For ResNet50 with 1:16 sparse pattern, MaxQ can achieve 74.6\% top-1 accuracy on ImageNet and improve by over 2.8\% over the state-of-the-art. Codes and checkpoints are available at \url{https://github.com/JingyangXiang/MaxQ}.
Related papers
- Post-Training Quantization for Re-parameterization via Coarse & Fine
Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce quantization error of weight.
We develop an improved KL metric to determine optimal quantization scales for activation.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
arXiv Detail & Related papers (2023-12-17T02:31:20Z) - Multicoated and Folded Graph Neural Networks with Strong Lottery Tickets [3.0894823679470087]
This paper introduces the Multi-Stage Folding and Unshared Masks methods to expand the search space in terms of both architecture and parameters.
By achieving high sparsity, competitive performance, and high memory efficiency with up to 98.7% reduction, it demonstrates suitability for energy-efficient graph processing.
arXiv Detail & Related papers (2023-12-06T02:16:44Z) - Learning to Compose SuperWeights for Neural Parameter Allocation Search [61.078949532440724]
We show that our approach can generate parameters for many network using the same set of weights.
This enables us to support tasks like efficient ensembling and anytime prediction.
arXiv Detail & Related papers (2023-12-03T04:20:02Z) - Weight Fixing Networks [0.0]
We look to whole-network quantisation to minimise the entropy and number of unique parameters in a network.
We propose a new method, which we call Weight Fixing Networks (WFN) that we design to realise four model outcome objectives.
arXiv Detail & Related papers (2022-10-24T19:18:02Z) - Parameter-Efficient Masking Networks [61.43995077575439]
Advanced network designs often contain a large number of repetitive structures (e.g., Transformer)
In this study, we are the first to investigate the representative potential of fixed random weights with limited unique values by learning masks.
It leads to a new paradigm for model compression to diminish the model size.
arXiv Detail & Related papers (2022-10-13T03:39:03Z) - BiTAT: Neural Network Binarization with Task-dependent Aggregated
Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z) - Optimizing Gradient-driven Criteria in Network Sparsity: Gradient is All
You Need [74.58939318994746]
gradient-driven sparsity is used to reduce network complexity.
Weight independence is contrary to the fact that weights are mutually influenced.
We propose to further optimize gradient-driven sparsity (OptG) by solving this independence paradox.
arXiv Detail & Related papers (2022-01-30T14:15:49Z) - KSM: Fast Multiple Task Adaption via Kernel-wise Soft Mask Learning [49.77278179376902]
Deep Neural Networks (DNN) could forget the knowledge about earlier tasks when learning new tasks, and this is known as textitcatastrophic forgetting.
Recent continual learning methods are capable of alleviating the catastrophic problem on toy-sized datasets.
We propose a new training method called textit- Kernel-wise Soft Mask (KSM), which learns a kernel-wise hybrid binary and real-value soft mask for each task.
arXiv Detail & Related papers (2020-09-11T21:48:39Z) - Ternary Feature Masks: zero-forgetting for task-incremental learning [68.34518408920661]
We propose an approach without any forgetting to continual learning for the task-aware regime.
By using ternary masks we can upgrade a model to new tasks, reusing knowledge from previous tasks while not forgetting anything about them.
Our method outperforms current state-of-the-art while reducing memory overhead in comparison to weight-based approaches.
arXiv Detail & Related papers (2020-01-23T18:08:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.